Optimal Json format for Azure Cosmos DB
In my company there are machines that return data for every action they take. I have a huge amount of data and I want to upload it to Cosmos DB
to take advantage of the horizontal scalability of a non-relational database.
Based on my very basic knowledge of Cosmos DB, my idea is to use a composite partition key
to organize my data structure as efficiently as possible. Currently, to identify an action, I use the id of the machine that performed the action, the id of the product on which the action was performed and the id of the action itself.
So the key would be something like this: `'product_machine_action'`
.
I am open to any suggestion about the structure of the data in Cosmos DB. Mainly, I want to store all the data as the machine sends it and query the data to compare graphs of several products at the same time.
The question is that I don't know which is the most optimal Json format for my case. The machines return several csv with different types of data.
As each csv has columns with unique values, i.e. the column name is not repeated, I had thought of creating a single large dictionary and use the column names as keys. In case there are more than one line in a file I had thought of making a list. The only files that have more than one line would be the data of the graphs and I would like to consult them at the same time.
I have the following doubts:
- Is it optimal to use dictionaries with lists?
- Is it optimal to have only one dictionary with the information of all the files in a Json?
- Would it be better to make one dictionary per file in a Json?
I imagine that the partition key must be in a separated dictionary.
Link to this question in stackoverflow
A data sample:
[
{
"id": "product_machine_action"
},
{
"BOX Type": "data",
"BOX Release": "data",
"BOX Firmware Version": "data",
"BOX Production Date": "data",
"BOX Maintenance Date": "data",
"BOX Operation Time": "data",
"BOX Power Supply Limit": "data",
"BOX Lub Pump Coef.": "data",
"Pset Default Selection": "data"
},
{
"Max cycles Limit 1": "data",
"Max cycles Limit 2": "data",
"CC Pressure": "data",
"Max Stroke (mm)": "data",
"Vacuum Delay (ms)": "data"
},
{
"position": "data",
"torque_intensity": "data",
"thrust_intensity": "data",
"no_load_torque": "data",
"no_load_thrust": "data",
"step": "data",
"stop_code": "data",
"torque_min": "data",
"thrust_min": "data",
"torque_power": "data",
"gap_length": "data"
},
{
"position": "data",
"torque_intensity": "data",
"thrust_intensity": "data",
"no_load_torque": "data",
"no_load_thrust": "data",
"step": "data",
"stop_code": "data",
"torque_min": "data",
"thrust_min": "data",
"torque_power": "data",
"gap_length": "data"
},
{
"position": "data",
"torque_intensity": "data",
"thrust_intensity": "data",
"no_load_torque": "data",
"no_load_thrust": "data",
"step": "data",
"stop_code": "data",
"torque_min": "data",
"thrust_min": "data",
"torque_power": "data",
"gap_length": "data"
},
{
"position": "data",
"torque_intensity": "data",
"thrust_intensity": "data",
"no_load_torque": "data",
"no_load_thrust": "data",
"step": "data",
"stop_code": "data",
"torque_min": "data",
"thrust_min": "data",
"torque_power": "data",
"gap_length": "data"
},
{
"position": "data",
"torque_intensity": "data",
"thrust_intensity": "data",
"no_load_torque": "data",
"no_load_thrust": "data",
"step": "data",
"stop_code": "data",
"torque_min": "data",
"thrust_min": "data",
"torque_power": "data",
"gap_length": "data"
},
{
"position": "data",
"torque_intensity": "data",
"thrust_intensity": "data",
"no_load_torque": "data",
"no_load_thrust": "data",
"step": "data",
"stop_code": "data",
"torque_min": "data",
"thrust_min": "data",
"torque_power": "data",
"gap_length": "data"
},
[TRUNCATED]
{
"Version": "data",
"Date": "data",
"Sample Rate (Hz)": "data",
"Drilling Cycle ID": "data"
},
{
"Head Name": "data",
"Head Type": "data",
"Head TAG UID": "data",
"Head Global Counter": "data",
"Head Local Counter 1": "data",
"Head Local Counter 2": "data",
"Head M1 Ratio": "data",
"Head M2 Ratio": "data"
},
{
"Motor Name": "data",
"Motor Type": "data",
"Motor STSE": "data",
"Motor SN": "data",
"Motor Operation Time": "data",
"Motor M2 Ratio": "data",
"Motor M1 CW Ratio": "data",
"Motor M1 CCW Ratio": "data",
"Motor Winch Limit": "data"
},
{
"Step Nb": "data",
"Step On/Off": "data",
"DEP (mm)": "data",
"RPM": "data",
"Feed (mm/s)": "data",
"Feed (mm/tr)": "data",
"Thrust Max (A)": "data",
"Torque Max (A)": "data",
"Thrust Min (A)": "data",
"Torque Min (A)": "data",
"Thrust Safety (A)": "data",
"Torque Safety (A)": "data",
"Gap (mm)": "data",
"Peck (nb)": "data",
"Delay (ms)": "data",
"Stroke Limit (A)": "data",
"Thrust Limit (A)": "data",
"Torque Limit (A)": "data",
"LUB AIR": "data",
"LUB FLOW": "data",
"Vacuum": "data",
"Material": "data"
},
{
"Pset Type": "data",
"Pset n\u00b0": "data",
"Version": "data"
},
{
"Cycle Time (s)": "data",
"Distance (mm)": "data",
"Cycle OK": "data"
},
{
"Step Number": "data",
"Stop Code": "data",
"Duration (s)": "data",
"Distance M1": "data",
"Distance M2": "data",
"M1 Max Amp": "data",
"M2 Max Amp": "data",
"M1 No Load Amp": "data",
"M2 No Load Amp": "data",
"Gap Max (mm)": "data"
}
]
As you can see, I have a list of JSON objects, my doubt is that I don't know what is more optimal if having a single JSON object or keeping the list. In addition, as you can see there is a part of the code that is repeated and I had thought of putting it this way:
{
"position": ["data", "data", "data", "data", "data"...] ,
"torque_intensity": ["data", "data", "data", "data", "data"...],
"thrust_intensity": ["data", "data", "data", "data", "data"...],
"no_load_torque": " ["data", "data", "data", "data", "data"...],
"no_load_thrust": ["data", "data", "data", "data", "data"...],
"step": ["data", "data", "data", "data", "data"...],
"stop_code": ["data", "data", "data", "data", "data"...],
"torque_min": ["data", "data", "data", "data", "data"...],
"thrust_min": ["data", "data", "data", "data", "data"...],
"torque_power": ["data", "data", "data", "data", "data"...],
"gap_length": ["data", "data", "data", "data", "data"...]
}
Thanks in advance