In the JSON you've provided, the data appears to be laid out column-wise, as a set of parallel arrays rather than an array of row objects. That isn't directly convertible to a traditional columnar table, which is likely why Copy Activity doesn't infer the schema correctly.
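For illustration, that layout presumably looks roughly like this (the array names come from the fields you list; the values here are invented):

```json
{
  "Name":  ["FlowRate", "Temperature"],
  "Unit":  ["m3/h", "degC"],
  "ID":    [101, 102],
  "Index": [0, 1],
  "Data":  [12.5, 73.2]
}
```

Each logical row is spread across the arrays at the same position, so there is no record structure for Copy Activity to map onto columns.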
Here's an approach using Azure Data Factory (ADF) Data Flow to process this file:
1. Flatten the JSON: First, flatten the JSON file. In Data Flow, use the 'Parse' transformation for this. The output will be a single row with multiple array columns.
2. Split into Multiple Rows: After flattening, split that single row into multiple rows using the Unpivot transformation. You'll have to do this separately for each of the arrays ("Name", "Unit", "ID", "Index", "Data").
3. Combine Data: Once you have the separate streams (one per original array), join them back together on the position of each element in its original array, using the Join transformation.
4. Save as Parquet: Now that you have a conventional tabular shape, write it out as a Parquet file using a 'Sink' transformation. (The pandas sketch below mirrors these four steps if you'd like to verify the logic first.)
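If you want to sanity-check the logic outside ADF before building the data flow, here is a minimal pandas sketch of the same four steps. The file names are placeholders; only the array names are taken from your structure:

```python
import json
from functools import reduce

import pandas as pd

# Step 1 equivalent: load the JSON document, one object whose values
# are parallel arrays ("input.json" is a placeholder name).
with open("input.json") as f:
    doc = json.load(f)

# Step 2 equivalent: turn each array into its own two-column frame,
# keeping the element position so rows can be matched up again later.
arrays = ["Name", "Unit", "ID", "Index", "Data"]
frames = [
    pd.DataFrame({"pos": range(len(doc[col])), col: doc[col]})
    for col in arrays
]

# Step 3 equivalent: join the per-array frames on element position,
# reassembling one logical row per position.
table = reduce(lambda left, right: left.merge(right, on="pos"), frames)
table = table.drop(columns="pos")

# Step 4 equivalent: write the tabular result out as Parquet
# (needs pyarrow or fastparquet installed).
table.to_parquet("output.parquet", index=False)
```

The 'pos' column here plays the role of the element index that the Join in step 3 relies on.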
Please note that this is just one way to process this kind of data in ADF; depending on the exact shape and size of your data, other approaches may be more efficient or appropriate. Also be aware that developing and debugging data flows in ADF can be time-consuming, since each debug session has to spin up a managed Spark cluster, so plan accordingly and test each step thoroughly. If you have further questions or if something is not clear, feel free to ask!