ADF : Formmating of the file in datafactory pipelines

Pavan Kumar Chakilam 126 Reputation points
2022-05-06T17:45:15.537+00:00

Hello all,
we have a requirement as below.

problem statement:
we have some records in Cosmos DB and get the records from cosmos db and create file for each record in some specific format.

consider the below records in cosmos dB collections.
{
"id": "100",
"piId": "***1234",
"Action": "NEW_JOINEE",
"accountId": "123456",
"caseId": "22034",
"salary": 30000,
"divId": "518",
"employeeName": "Pavan kumar",
"address": null,
"dob": null,
"additionalDetails": [
{
"fieldLabel": "when did he joined?",
"fieldValue": "Field 1"
},
{
"fieldLabel": "What is his prev company?",
"fieldValue": "xyz"
},
{
"fieldLabel": "What is his phone numer",
"fieldValue": "Field 2"
},
{
"fieldLabel": "Date Resigned in prv company:",
"fieldValue": "04/22/2022"
},
{
"fieldLabel": "Additional info/Explanation:",
"fieldValue": "Add Info"
}
],
"_ts": 1651831799
}
{
"id": "101",
"piId": "***12345",
"Action": "NEW_JOINEE",
"accountId": "1234567",
"caseId": "22035",
"salary": 30000,
"divId": "518",
"employeeName": "Randy",
"address": null,
"dob": null,
"additionalDetails": [
{
"fieldLabel": "when did he joined?",
"fieldValue": "Field 1"
},
{
"fieldLabel": "What is his prev company?",
"fieldValue": "abc"
},
{
"fieldLabel": "What is his phone numer",
"fieldValue": "Field 2"
},
{
"fieldLabel": "Date Resigned in prv company:",
"fieldValue": "04/22/2022"
},
{
"fieldLabel": "Previous address:",
"fieldValue": "Houston,TX"
},
{
"fieldLabel": "Additional info/Explanation:",
"fieldValue": "Add Info"
}
],
"_ts": 1651831800
}

ADF process needs to pull these records and create 2 different files based on the caseid. if collection has 100 items/records, ADF process needs to create 100 individual files and each file contain one record (one collection item).

we were able to fulfill this requirement without any issues. we were able to create case id files by fetching each item collection data by using Look up, Foreach, Copy activity (source : cosmos db, sink: blob)

issue is:
the data inside the file coming as below in single row
{"id":"100","piId":"***1234","Action": "NEW_JOINEE","accountId": "123456","caseId": "22034","salary": 30000,"divId": "518","employeeName": "Pavan kumar","address": null,"dob": null,"additionalDetails": [{"fieldLabel": "when did he joined?","fieldValue": "Field 1"},{"fieldLabel": "What is his prev company?","fieldValue": "xyz"},{"fieldLabel": "What is his phone numer","fieldValue": "Field 2"},{"fieldLabel": "Date Resigned in prv company:","fieldValue": "04/22/2022"},{"fieldLabel": "Additional info/Explanation:","fieldValue": "Add Info"}],"_ts": 1651831799}

The expected format in the file should come as below :

id: 100
piId: ***1234
Action: NEW_JOINEE
accountId: 123456
caseId: 22034
salary: 30000
divId: 518
employeeName: Pavan kumar
address: null
dob: null
when did he joined? : Field 1
What is his prev company? : xyz
What is his phone numer?:Field 2
Date Resigned in prv company? : 04/22/2022
Additional info/Explanation? : Add Info

tried solution:
we have tried to read the generated file from copy activity into dataflow and tried to format but getting the error to read the file in Dataflow source option.

Please try to check the problem and give your valuable suggestions

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,162 questions
0 comments No comments
{count} votes

Accepted answer
  1. AnnuKumari-MSFT 32,011 Reputation points Microsoft Employee
    2022-05-10T10:30:38.807+00:00

    Hi @Pavan Kumar Chakilam ,

    Thankyou for using Microsoft Q&A platform and posting your query.

    As I understand your ask, you want to convert JSON data into simple string. Please let me know if my understanding is incorrect.

    To achieve this requirement, you can make use of DataFlow in ADF.

    1. Store the JSON data in .json file in your ADLS . Create a dataset of type- Delimited text (csv) and point to the JSON file.
    2. In the Source transformation , use the above dataset , import projection and preview data. All your data should be appearing in single column _col0_ like the below screenshot:

    200558-image.png

    3. Use Derived column transformation and use the following expression to replace double quotes " with empty string, curly braces {} with empty string and comma , with new line \n to achieve the desired output: replace(replace(replace(replace({_col0_}, '"', ''),'}',''),'{',''),',','\n') . Check the below screenshot for more details:

    200604-image.png

    Hope this will help. Please let us know if any further queries.

    ------------------------------

    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you.
      Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful