Azure Data Factory data transformation

Question

Shinto Kodivalappil Anto 20

I have source data as below:

id,name,type,value
123,abc,type1,value1
123,abc,type2,value2

I want it to be transformed to:

{
	"id":123,
	"name": "abc",
	"typedata":{
		"type1":"value1",
		"type2":"value2"
	}
}

How can I do this with Azure Data Factory? I am new to ADF, so I tried a few things such as aggregate functions but couldn't find a solution.

Accepted answer

Answer 1

AnnuKumari-MSFT 34,556 Microsoft Employee Moderator

Hi Shinto Kodivalappil Anto,

Welcome to the Microsoft Q&A platform, and thanks for posting your query here.

As I understand your question, you want to transform your data from CSV to a nested JSON format using a mapping data flow in an ADF pipeline.

To achieve this, follow the steps below:

Add a pivot transformation after the source and group by the 'id' column. Use the 'type' column as the pivot key, and use max(value) as the pivoted column expression. If 'name' should appear in the output, as in the target JSON, add it to the group-by columns too, since the pivot only outputs the group-by and pivoted columns.

In the data preview tab of the pivot transformation, click the Map drifted option.

This redefines the schema so that the pivoted type1 and type2 columns become explicitly defined columns.

Use a derived column transformation to create a new column called 'typedata', and create subcolumns within typedata named type1 and type2.

Use a select transformation to deselect the top-level type1 and type2 columns.

Use a sink transformation with a JSON dataset to load the data into a JSON file, then call the data flow from a new ADF pipeline and execute it. The output has the nested shape requested in the question, with type1 and type2 placed under typedata.
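To make the logic of the steps above easier to follow, here is a minimal sketch in plain Python. It is not ADF data flow script; it only mimics what the pivot (group by, pivot key, max(value)) and the derived column (nesting under typedata) do to the sample rows from the question. The function name pivot_and_nest and the SOURCE_CSV constant are made up for this illustration, and grouping by both id and name is an assumption based on the target JSON.

```python
import csv
import io
import json

# Sample rows copied from the question.
SOURCE_CSV = """id,name,type,value
123,abc,type1,value1
123,abc,type2,value2
"""


def pivot_and_nest(csv_text):
    """Group rows by (id, name), pivot 'type' into keys, nest them under 'typedata'."""
    grouped = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Group-by columns of the pivot (assumption: both id and name).
        key = (int(row["id"]), row["name"])
        typedata = grouped.setdefault(key, {})
        # Mimic max(value): if a type repeats within a group, keep the largest value.
        current = typedata.get(row["type"])
        if current is None or row["value"] > current:
            typedata[row["type"]] = row["value"]
    # Derived column + select: the pivoted columns live only inside 'typedata'.
    return [
        {"id": id_, "name": name, "typedata": typedata}
        for (id_, name), typedata in grouped.items()
    ]


documents = pivot_and_nest(SOURCE_CSV)
print(json.dumps(documents[0], indent=2))
```

For the two sample rows this yields a single document with id, name, and a typedata object holding type1 and type2. Because the keys come from the data, a new type value in the source would show up as a new key without any code change, which is roughly what the pivot plus Map drifted combination gives you in the data flow.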

Hope it helps. Kindly accept the answer and take the survey if the answer was helpful. Thank you.

Shinto Kodivalappil Anto 20 Reputation points

2023-07-11T12:57:31.8166667+00:00

Thanks @AnnuKumari-MSFT. Exactly what I need, and a really quick response.

I tried to sink directly in the data flow, but it is failing. I have to write to blob storage and then use Copy Data to write to Cosmos.

Do you know if I can sink directly to Cosmos in a data flow?

AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator

2023-07-12T12:30:58.3633333+00:00

Hi Shinto Kodivalappil Anto,

I haven't tried it. I will check and let you know. Thanks.

Answer 2

Hi Shinto,

To transform your SQL data into YAML format using Azure Data Factory (ADF), you can use the Mapping Data Flow feature. Here are the basic steps:

Create a new Data Factory pipeline: In Azure Data Factory, create a new pipeline to define the data transformation process.
Add a Source data flow component: Within the pipeline, add a Source data flow component to read the SQL source data. Configure the source to connect to your SQL database and specify the table containing the data.

Add a Derived Column transformation: Add a Derived Column transformation within the data flow. This transformation allows you to create new columns or modify existing columns based on expressions. You'll use it to create the desired structure for the YAML output.

Configure the Derived Column transformation: In the Derived Column transformation, create the following expressions to generate the desired structure:

For the id and name fields, use the source columns directly.

For the typedata field, use the following expression: {'type1': type1, 'type2': type2}

Replace type1 and type2 with the respective source column names.

Add a Sink data flow component: Add a Sink data flow component to write the transformed data in YAML format. Configure the sink to save the data to your desired destination, such as a file in Azure Blob Storage.

Specify the YAML format in Sink settings: In the Sink settings, choose the appropriate format for the output file. For YAML, you can use the "DelimitedText" format and set the delimiter as appropriate for YAML (e.g., colon ":" as the delimiter).

Run the pipeline: Save and publish your pipeline in Azure Data Factory. You can then trigger the pipeline to execute the data transformation.

By following these steps, you can convert your SQL data into YAML format using Azure Data Factory's Mapping Data Flow feature.

I hope this helps.

Shinto Kodivalappil Anto 20 Reputation points

2023-07-11T10:20:25.7466667+00:00

The data transformation is from CSV/Parquet to JSON (to Cosmos DB).

If I just use a derived column, it wouldn't group the records. Also, type1/type2 etc. have to be created dynamically, not manually.

RevelinoB 3,675 Reputation points

2023-07-11T10:30:28.25+00:00

Hi Shinto, apologies for the confusion. If you want to transform your CSV or Parquet data to JSON dynamically and group the records based on certain criteria, you can use Azure Data Factory's Mapping Data Flow feature along with an Aggregate transformation. Here's what you could do:

Create a new pipeline in Azure Data Factory.

Add a Source data flow component to read the CSV or Parquet data.

Configure the Source to connect to your source file and define the schema.

Add an Aggregate transformation to group the records based on specific columns (e.g., "id" and "name").

Configure the Aggregate transformation to group by the desired column(s) and define the aggregate operations for other columns (e.g., "value") within each group.

Add a Derived Column transformation after the Aggregate transformation to create the dynamic "typedata" field.

Use expressions within the Derived Column transformation to generate the dynamic "typedata" field based on the grouped values.

You can use functions like collect, together with a map-building function such as keyValues, to generate the desired structure dynamically.

Add a Sink data flow component to write the transformed data to Cosmos DB in JSON format.

Configure the Sink to connect to your Cosmos DB instance and specify the target collection.

Run the pipeline to execute the data transformation.

If you follow these steps, you can dynamically transform your CSV or Parquet data to JSON format, group the records, and store them in Cosmos DB using Azure Data Factory's Mapping Data Flow feature.
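As a rough illustration of the aggregate-then-derive idea described above, the following plain Python sketch groups the sample rows by id and name, collects the type and value columns into arrays, and then zips them into a map. It is not ADF expression syntax and does not write to Cosmos DB; the function name aggregate_to_documents and the in-memory rows list are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Sample rows from the question, already parsed into dictionaries.
rows = [
    {"id": 123, "name": "abc", "type": "type1", "value": "value1"},
    {"id": 123, "name": "abc", "type": "type2", "value": "value2"},
]


def aggregate_to_documents(rows):
    """Group by (id, name), collect type/value arrays, then build the typedata map."""
    # Aggregate step: one entry per group, collecting every type and value seen.
    collected = defaultdict(lambda: {"types": [], "values": []})
    for row in rows:
        group = collected[(row["id"], row["name"])]
        group["types"].append(row["type"])
        group["values"].append(row["value"])
    # Derived column step: pair the collected keys and values into a map.
    return [
        {"id": id_, "name": name, "typedata": dict(zip(g["types"], g["values"]))}
        for (id_, name), g in collected.items()
    ]


cosmos_documents = aggregate_to_documents(rows)
```

Unlike a pivot with max(value), this variant keeps the last value when the same type appears twice in a group, because later pairs overwrite earlier ones when the map is built. Whether that is acceptable depends on the source data.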

I hope this helps with your query.
