Thanks for reaching out to Microsoft Q&A.
It seems like you’re trying to create a JSON array of arrays, but your current approach is creating an array of objects, each containing a single array. Here’s how you can modify your approach:
Select a table as a source: This step remains the same. You’re selecting your table with employee_id, gender, and salary columns.
Add Derived Column: Instead of creating a new column “Values”, you can directly convert your columns into string format. You can use the expression: array(t
oString(employee_id), toString(gender), toString(salary)
). Let’s call this column “tempValues”.
Add Aggregate Transformation: This is a new step. Here, you’ll group by some constant (you can create a derived column with a constant value if no such column exists) and make a collect() operation on “tempValues”. This will create a new column which is an array of “tempValues” arrays. Let’s call this column “Values”.
Select Column: Now, you just need to select the “Values” column.
Sink to JSON file: This step remains the same. You’re sinking your data into a JSON file.
This should give you a JSON in the following format:
{
"Values": [
["1", "F", "1000.000000"],
["2", "M", "2000.000000"],
["3", "F", "1500.000000"],
...
]
}
Please note that the values will be in string format because of the toString() function. If you want them in their original format, you would need a method to convert these values into their original format (integer for employee_id and salary, string for gender) after reading the JSON file.
I hope this helps! Let us know if you have any other questions.