I am using Azure (Azure Databricks, IoT Hub) to stream unstructured data from IoT devices (e.g. wind turbines). The data arrives as thousands of files containing millions of data points captured over a period of 10 years. How do I extract the full variety of metadata fields directly from these unstructured files, rather than from a structured table?
The reason is that these devices generate the same metadata fields, such as temperature and humidity, most of the time, but a particular device may start generating new metadata fields that I am not aware of. I would like to detect this early, so that I can address it before it becomes a problem.
In particular, I would like to see: the file name (e.g. windTurbine14), the metadata field names (e.g. temperature, humidity, newMetadataFieldX), and the data type of each metadata field (e.g. double, double, double). Once I have this information, I can run analytics on it to better visualize the new metadata fields from each file.
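To make the desired output concrete, here is a minimal sketch in plain Python of the kind of per-file inventory I have in mind. It rests on assumptions that may not hold for my real data: that each file contains one JSON object per line, and that the file contents are already available in memory. The names `field_types`, `inventory`, `TYPE_NAMES`, and the sample values are hypothetical and only for illustration.

```python
import json
from collections import defaultdict

# Hypothetical mapping from Python types to the type labels used above.
TYPE_NAMES = {bool: "boolean", int: "long", float: "double", str: "string",
              dict: "struct", list: "array", type(None): "null"}

def field_types(json_lines):
    """Return {field name: sorted type labels seen} for one file's records.

    Assumes every non-empty line is a top-level JSON object.
    """
    seen = defaultdict(set)
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for name, value in record.items():
            seen[name].add(TYPE_NAMES.get(type(value), type(value).__name__))
    return {name: sorted(types) for name, types in seen.items()}

def inventory(files):
    """Flatten {file name: lines} into sorted (file, field, types) rows."""
    rows = []
    for file_name in sorted(files):
        for field, types in sorted(field_types(files[file_name]).items()):
            # A field seen with several types is reported as e.g. "double/string".
            rows.append((file_name, field, "/".join(types)))
    return rows

# Made-up sample data standing in for one device file.
sample = {
    "windTurbine14": [
        '{"temperature": 21.5, "humidity": 0.43}',
        '{"temperature": 22.1, "humidity": 0.41, "newMetadataFieldX": 3.7}',
    ],
}

for row in inventory(sample):
    print(row)
```

Because every record in every file is scanned, a field that appears only once still shows up in the result, which is the "100% extraction" property I am after. What I do not know is how to express this efficiently at scale on Databricks.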
I would really appreciate any help you can provide. Specifically, what queries should I run on these files to ensure that every metadata field is extracted from every file?
Thanks in advance!