Hi @Madan M R
The Derived Column transformation you tried is meant for creating or modifying data columns, not for generating a separate schema file dynamically.
Two Ways to Solve This
Option 1: If Your Schema Doesn’t Change Often (Static Approach)
If your JSON file’s structure is pretty consistent, you can manually define the schema in your Data Flow:
- In your Derived Column transformation (derivedColumn1), create a new column (e.g., SchemaOutput) to hold the schema as a JSON string. You can write an expression like this: `'{"columns":[{"name":"Sch_ClientBrowser","dataType":"NVARCHAR","primaryKey":true},{"name":"Sch_ClientCity","dataType":"NVARCHAR","primaryKey":true}]}'` This hardcodes the schema you want (like Sch_ClientBrowser and Sch_ClientCity from your screenshot).
- Add a second Sink in your Data Flow: branch off a new stream (use the “+” after the Derived Column and choose New branch). In this new Sink, write the SchemaOutput column to AWS S3 in a separate directory (e.g., s3://your-bucket/schema/Schema.json). Set the Sink format to JSON or delimited text, and make sure it writes just one file (you can set the file name to Schema.json in the Sink settings); a sample of what that file should end up containing is shown after this list.
- Your existing Sink (SinkS3) can keep writing the actual data to its own directory as Parquet.
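For reference, the Schema.json you are aiming for is exactly the string from the hardcoded expression above, shown here pretty-printed for readability (the bucket path and file name are just examples, adjust them to your setup):

```json
{
  "columns": [
    { "name": "Sch_ClientBrowser", "dataType": "NVARCHAR", "primaryKey": true },
    { "name": "Sch_ClientCity", "dataType": "NVARCHAR", "primaryKey": true }
  ]
}
```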
This approach works if your schema is fixed, but if your JSON file’s structure changes often, it’s not ideal because you’d have to keep updating the hardcoded schema.
Option 2: If Your Schema Changes (Dynamic Approach)
If your JSON file’s schema might change, we need a more flexible solution. Mapping Data Flows alone can’t dynamically extract the schema as a JSON object, but we can use an extra step in your ADF pipeline:
Add an Azure Function to Extract the Schema
- Create a small Azure Function (or ask a developer to help) that reads your JSON file from Azure Blob Storage and generates the schema in the format you want. For example, it can list all columns and their data types.
- In your ADF pipeline, add an Azure Function activity before your Mapping Data Flow. This activity calls the function, gets the schema back as a JSON string, and you can store it in a pipeline variable (for example, with a Set variable activity that reads the function activity's output); a rough sketch of that expression is shown after this list.
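As a rough sketch of that step, assuming the Azure Function activity is named GetSchemaFn (a placeholder) and returns the schema JSON in its response body, the Set variable activity's value could use a pipeline expression along these lines. The exact output property can vary, so confirm it from the function activity's output in a debug run rather than taking .Response as given; also keep in mind the Azure Function activity expects the function to respond with a JSON object.

```
@string(activity('GetSchemaFn').output.Response)
```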
Pass the Schema to Your Data Flow
- In your Mapping Data Flow, add a parameter (e.g., schemaJson) to hold the schema string from the Azure Function.
- Use a Derived Column transformation to create a new column (e.g., SchemaOutput) and set its value to the schemaJson parameter (see the expression sketch after this list).
- Add a second Sink (like in Option 1) to write this SchemaOutput column to AWS S3 as Schema.json.
- Your main data stream can still write to AWS S3 as Parquet, just like you’re doing now.
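Inside the Data Flow itself, referencing the parameter is simple. Assuming schemaJson is defined as a string parameter, the Derived Column expression for SchemaOutput is just the parameter reference (Data Flow parameters are referenced with a leading $):

```
$schemaJson
```

The second Sink then writes that single column out as Schema.json, exactly as in Option 1.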
What You’ve Already Done
In your screenshot, I see you used the Derived Column transformation to create columns like Sch_ClientBrowser and Sch_ClientCity. That’s perfect for transforming your data, but for the schema file, we need a separate stream to write the schema JSON, as I described above.
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.