Share via

Azure Data Factory data transformation

Shinto Kodivalappil Anto 20 Reputation points
Jul 11, 2023, 9:29 AM

i have source data as below :-

id,name,type,value
123,abc,type1,value1
123,abc,type2,value2

 

I want it to be transformed to : 

{
	"id":123,
	"name": "abc",
	"typedata":{
		"type1":"value1",
		"type2":"value2"
	}
}

how can i do it with azure data factory. I am new to ADF , hence tried few things like aggregate functions but couldn't find a solution for it.

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,107 questions
0 comments No comments
{count} votes

Accepted answer
  1. AnnuKumari-MSFT 33,986 Reputation points Microsoft Employee
    Jul 11, 2023, 10:52 AM

    Hi Shinto Kodivalappil Anto ,

    Welcome to Microsoft Q&A platform and thanks for posting your query here.

    As I understand your question, you want to transform your data from csv to nested json format using mapping dataflow in ADF pipeline.

    To achieve the same, you need to follow the below steps:

    1. Add a pivot transformation after source , and group by using 'Id' column . Use 'Type' Column for pivot key . Use max(value)for Pivoted column expression

    User's image

    1. In the data preview tab of pivot transformation, hit on Map drifted option

    User's image

    It will redefine the schema:

    User's image

    1. Use derived column transformation to create a new column called 'typedata' and create subcolumns within typedata as type1 and type2

    User's image

    1. Use select transformation to deselect type1 and type2 columns :User's image
    2. Use sink transformation with json dataset to load the data into json file and call the dataflow in a new ADF pipeline and execute it. Here is how the output would look like:

    User's image

    Hope it helps. Kindly accept the answer and take the survey if the answer was helpful. Thankyou


1 additional answer

Sort by: Most helpful
  1. RevelinoB 3,435 Reputation points
    Jul 11, 2023, 9:45 AM

    Hi Shinto,

    To transform your SQL data into YAML format using Azure Data Factory (ADF), you can use the Mapping Data Flow feature. Here are the basic steps:

    • Create a new Data Factory pipeline: In Azure Data Factory, create a new pipeline to define the data transformation process.
    • Add a Source data flow component: Within the pipeline, add a Source data flow component to read the SQL source data. Configure the source to connect to your SQL database and specify the table containing the data.

    Add a Derived Column transformation: Add a Derived Column transformation within the data flow. This transformation allows you to create new columns or modify existing columns based on expressions. You'll use it to create the desired structure for the YAML output.

    Configure the Derived Column transformation: In the Derived Column transformation, create the following expressions to generate the desired structure:

    For the id and name fields, use the source columns directly.

    For the typedata field, use the following expression: {'type1': type1, 'type2': type2}

    Replace type1 and type2 with the respective source column names.

    • Add a Sink data flow component: Add a Sink data flow component to write the transformed data in YAML format. Configure the sink to save the data to your desired destination, such as a file in Azure Blob Storage.

    Specify the YAML format in Sink settings: In the Sink settings, choose the appropriate format for the output file. For YAML, you can use the "DelimitedText" format and set the delimiter as appropriate for YAML (e.g., colon ":" as the delimiter).

    • Run the pipeline: Save and publish your pipeline in Azure Data Factory. You can then trigger the pipeline to execute the data transformation.

    By following these steps, you can convert your SQL data into YAML format using Azure Data Factory's Mapping Data Flow feature.

    I hope this helps?


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.