Blob Storage Partitioning of Device Data From IoT Hub

Stefan Klocke 11 Reputation points
2022-07-13T20:54:09.073+00:00

Hi everybody,

I have configured an IoT Hub endpoint that stores IoT device messages in Azure Blob Storage. Data is being stored under {container}/{iothub}/{partition_id}/{YYYY}/{MM}/{dd}/{hh}/{mm}/{filename}. As far as I understand, partition_id is assigned automatically, and users are not meant to choose it when configuring devices. As a result, I do not know in advance under which partition_id a given device's data will be stored.

I am consuming the data from Azure Databricks, with the Blob Storage container mounted to the Databricks File System. Because of the above, I currently specify the partition_id within the mounted filesystem manually for every device (e.g. device A is stored under partition 03, so before importing data I have to state that I will import from this partition). Is there a way to automatically find the partition under which a specific device's data is stored, so that I do not have to import data from all partitions and then filter for the specific device?

Thanks in advance and best regards
Stefan

Azure Internet of Things

2 answers

  1. QuantumCache 20,366 Reputation points Moderator
    2022-07-19T22:26:55.297+00:00

    Hello @Stefan Klocke Did you get a chance to refer to the solution suggested by Sander in the above comment?

    One of the product team members also suggested trying Azure Data Explorer (ADX) in your scenario.

    Consider using ADX to split the data and make it queryable. ADX can route the data and save it in efficient Parquet tables, in addition to making the data immediately useful for real-time dashboarding and ad hoc analytics.

    Ingest data from IoT Hub into Azure Data Explorer


  2. Sander van de Velde | MVP 36,776 Reputation points MVP Volunteer Moderator
    2022-07-15T08:15:07.073+00:00

    Hello @Stefan Klocke ,

    The 'uncontrollable' partition ID seems to make the path unpredictable.

    Unfortunately, the partition must be part of the blob storage path:

    221065-image.png

    Usually, the logic handling incoming blobs is not interested in the paths, only in new blobs, so this is not really a problem.

    Perhaps this is too simplistic a solution in your case, but what if you rearrange the path and move the partition to the end, like this:

    221064-image.png

    I expect you have some flexibility in the path, so experiment with formats such as:

    {iothub}/{YYYY}/{MM}/{DD}/{HH}/{mm}{partition}  
    {iothub}/{YYYY}/{MM}/{DD}/{HH}/{mm}/{partition}  
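
With the partition moved to the end of the path, a reader can select a time range first and wildcard the rest. The following is only a minimal sketch of that idea: `glob_for` is a hypothetical helper, the hub name `myhub` is made up, and the token names are taken from the path formats above.

```python
import re

# Tokens that appear in the path formats above.
TOKENS = ("iothub", "partition", "YYYY", "MM", "DD", "HH", "mm")


def glob_for(template, **known):
    """Fill in the tokens we know and turn every other token into '*'."""

    def substitute(match):
        token = match.group(1)
        if token not in TOKENS:
            raise ValueError(f"unknown token: {token}")
        return known.get(token, "*")

    pattern = re.sub(r"\{(\w+)\}", substitute, template)
    # Adjacent unknown tokens (e.g. "{mm}{partition}") would give "**";
    # collapse them into a single wildcard.
    return re.sub(r"\*{2,}", "*", pattern)


if __name__ == "__main__":
    # Hypothetical hub name and date; every partition of that day is matched.
    print(glob_for("{iothub}/{YYYY}/{MM}/{DD}/{HH}/{mm}/{partition}",
                   iothub="myhub", YYYY="2022", MM="07", DD="13"))
```

The resulting pattern could then be appended to the mount point and handed to whatever reader is in use, assuming that reader accepts glob patterns in paths.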
    

    Off-topic: each incoming message should be handled independently, so the device ID should be part of it. I expect the system properties to contain the device ID. Why is your logic device-dependent?
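
If the device ID is indeed carried in each message's system properties, the consumer can read every partition and filter per record instead of per path. A minimal sketch follows, under loudly stated assumptions: the route writes JSON with one record per line, and each record has a `SystemProperties` object with a `connectionDeviceId` field. Check a real blob before relying on these names; `records_for_device` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def records_for_device(lines: Iterable[str], device_id: str) -> Iterator[dict]:
    """Yield only the records whose system properties name the given device.

    Assumes one JSON record per line with the device ID stored under
    SystemProperties.connectionDeviceId (an assumption, not confirmed
    by this thread).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        system = record.get("SystemProperties", {})
        if system.get("connectionDeviceId") == device_id:
            yield record


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample = [
        '{"SystemProperties": {"connectionDeviceId": "device-a"}, "Body": {"t": 21.5}}',
        '{"SystemProperties": {"connectionDeviceId": "device-b"}, "Body": {"t": 19.0}}',
    ]
    for rec in records_for_device(sample, "device-a"):
        print(rec["Body"])
```

In Databricks the same filter would more likely be expressed on a DataFrame column after reading all partitions with a wildcard path, but the principle is the same: select by the device ID in the record, not by the partition in the path.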

