IoT source, streaming, bronze table correct setup?

Alex 1 Reputation point
2021-03-25T07:29:40.983+00:00

I have a general question on whether my setup is a good fit for the scenario below:

  • 1 streaming IoT source into Azure Event Hubs
  • The raw format is byte-encoded and needs to be decoded.
  • I need to build a scalable and reusable data structure.
  • Output goes to 2 Azure Cosmos DB collections: 1 streaming and 1 batch, each with a different set of data.

I have the following in mind

  • 1 streaming job 24/7 - data from Azure Event Hubs --> append to Bronze Delta table (a rough sketch of this job follows the list)
  • 1 streaming job 24/7 - select from the Bronze Delta table, decode the data and merge into --> Silver Delta table
  • 1 streaming job 24/7 - select specifics from the Silver Delta table, transform, aggregate and upsert into --> Azure Cosmos DB collection 1
  • 1 batch job once per day - select specifics from the Silver Delta table, transform, aggregate and upsert into --> Azure Cosmos DB collection 2
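
For the first job, something along these lines is what I have in mind (a minimal sketch assuming the azure-event-hubs-spark connector; the secret scope, key and storage paths are placeholders, not my real setup):

```python
# Job 1 sketch: Azure Event Hubs --> append-only Bronze Delta table.
# Secret scope/key and /mnt/lake paths are placeholders.
conn_str = dbutils.secrets.get(scope="iot", key="eventhub-connection-string")

eh_conf = {
    # The azure-event-hubs-spark connector expects the connection string encrypted.
    "eventhubs.connectionString":
        sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn_str)
}

raw_stream = (
    spark.readStream
         .format("eventhubs")
         .options(**eh_conf)
         .load()  # the 'body' column holds the raw byte-encoded payload
)

(raw_stream.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/mnt/lake/checkpoints/bronze_iot")
    .start("/mnt/lake/delta/bronze_iot"))
```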

Is this the correct way of doing it?

Should I save the raw, still-encoded data in Bronze, or should I decode it first and store the decoded data as my raw layer?

What about schema and decode-logic changes from the IoT source?

Can/should you co-host the first 2 streaming jobs in 1 job to save cost? Or is the merge between Bronze and Silver too heavy to have in 1 job?


1 answer

  1. PRADEEPCHEEKATLA-MSFT 77,081 Reputation points Microsoft Employee
    2021-03-26T10:55:43.393+00:00

    Hello @Alex ,

    • Yes, this is a good and correct way of doing it.
    • You can either decode the stream from the IoT source and then write to Bronze, or write the raw payload as-is. The real question is whether you want to be able to query the data directly in Bronze, or whether you are OK with decoding the payload at query time. If the data is large I wouldn't decode at query time, as it is likely to have a significant impact on query performance: you would have to call your decode function (for example casting the binary body to string and applying from_json()) for each record. Better to take that hit during the stream read, since the data flow is regulated by the rate of data coming from the Event Hub.
    • I would define the schema up front and pass it as part of the spark.readStream() / decoding step rather than relying on inference, so schema and decode-logic changes from the IoT source are handled explicitly in code. A rough sketch of both points follows below.
    • I don't think so - you will probably want to select F-series or other compute-optimized VMs for the cluster.
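
    As an illustration of the second and third points, a minimal sketch of the Bronze-to-Silver job could look like the following. This assumes the payload is JSON; the schema fields, table paths and merge key are placeholders, not a definitive implementation:

    ```python
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType
    from delta.tables import DeltaTable

    # Explicit schema for the decoded payload; evolve it in code when the
    # IoT source changes instead of relying on schema inference.
    payload_schema = StructType([
        StructField("deviceId", StringType()),
        StructField("eventTime", TimestampType()),
        StructField("temperature", DoubleType()),
    ])

    bronze = spark.readStream.format("delta").load("/mnt/lake/delta/bronze_iot")

    decoded = (
        bronze
        .withColumn("json", col("body").cast("string"))            # decode the byte payload
        .withColumn("data", from_json(col("json"), payload_schema))
        .select("data.*")
    )

    def upsert_to_silver(microbatch_df, batch_id):
        # De-duplicate the micro-batch on the merge key, then MERGE it into
        # the (pre-created) Silver Delta table.
        batch = microbatch_df.dropDuplicates(["deviceId", "eventTime"])
        silver = DeltaTable.forPath(spark, "/mnt/lake/delta/silver_iot")
        (silver.alias("t")
            .merge(batch.alias("s"),
                   "t.deviceId = s.deviceId AND t.eventTime = s.eventTime")
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute())

    (decoded.writeStream
        .foreachBatch(upsert_to_silver)
        .option("checkpointLocation", "/mnt/lake/checkpoints/silver_iot")
        .start())
    ```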

    Hope this helps. Do let us know if you have any further queries.

    ------------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.