How to build an Azure Data Factory pipeline that fetches a JSON file from an API and saves it to a Storage Account using its own JSON fields?

Patrick Wong 0 Reputation points
2023-03-17T15:10:37.7233333+00:00

Hello everyone,

I am currently working on building an ADF pipeline that fetches one JSON file from an API and saves it to an Azure Storage Account without any transformation.

input: longitude, latitude

output:

{
  "coord": {
    "lon": 6.3161,
    "lat": 51.081
  },
  "weather": [
    {
      "id": 803,
      "main": "Clouds",
      "description": "Überwiegend bewölkt",
      "icon": "04n"
    }
  ],
  "base": "stations",
  "main": {
    "temp": 2.96,
    "feels_like": 1.86,
    "temp_min": 1.89,
    "temp_max": 4.58,
    "pressure": 1031,
    "humidity": 75
  },
  "visibility": 10000,
  "wind": {
    "speed": 1.34,
    "deg": 155,
    "gust": 0.89
  },
  "clouds": {
    "all": 52
  },
  "dt": 1675970578,
  "sys": {
    "type": 2,
    "id": 78943,
    "country": "DE",
    "sunrise": 1675925969,
    "sunset": 1675960709
  },
  "timezone": 3600,
  "id": 2856210,
  "name": "Östrich",
  "cod": 200
}

All I want to do is save it under {container name}/weather/{city name}/{year}/{month}/{day}/{hour}.json

The city name is actually in the output (Östrich), and for the date and time I would like to use either the trigger time of the pipeline or the Unix timestamp in the output (dt).

Does anyone have any idea of how to do it? Thank you very much in advance!

Azure Data Factory

1 answer

  1. KranthiPakala-MSFT 46,422 Reputation points Microsoft Employee
    2023-03-20T23:39:53.0433333+00:00

    Hi @Patrick Wong ,

    Welcome to Microsoft Q&A forum and thanks for reaching out here.

    As per my understanding, you want to call an API and save the response as JSON in an Azure Storage account, using the city name from the API response as one of the subfolder names, with the subsequent folder structure {year}/{month}/{day}/{hour}.json derived from the pipeline trigger time. Please correct me if I misunderstood the requirement.

    To do so, you will need two Web activities. The first Web activity calls the API endpoint and gets the response. Once you have the API response as the Web activity output, you can extract the city name with a dynamic expression.
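
    For reference, here is a minimal JSON sketch of such a Web activity (the URL is a placeholder for your weather API endpoint; the activity name matches the one used in the sample expressions below):

    {
        "name": "web_GetWeatherDetailsForGivenCoodinates",
        "type": "WebActivity",
        "dependsOn": [],
        "typeProperties": {
            "url": "https://<your-weather-api>/weather?lat=51.081&lon=6.3161",
            "method": "GET"
        }
    }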

    Below is a sample dynamic expression, but you will have to modify it according to your response JSON payload:

    @activity('web_GetWeatherDetailsForGivenCoodinates').output.name
    

    You can get the YYYY, MM, DD, HH values from the built-in system variable @pipeline().TriggerTime (note that 'HH' is the 24-hour format; 'hh' would give a 12-hour clock):

    @concat(formatDateTime(pipeline().TriggerTime, 'yyyy/MM/dd/HH'),'.json')
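
    If you would rather derive the path from the dt Unix timestamp in the API response instead of the trigger time, you can convert it with addSeconds() against the Unix epoch. This is only a sketch, assuming the Web activity name from the sample above and the dt field shown in your payload:

    @concat(formatDateTime(addSeconds('1970-01-01T00:00:00Z', activity('web_GetWeatherDetailsForGivenCoodinates').output.dt), 'yyyy/MM/dd/HH'), '.json')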
    

    Here is a sample pipeline JSON payload for you to play around with in forming the required file path:

    {
        "name": "pipeline11",
        "properties": {
            "activities": [
                {
                    "name": "SetYYYYMMDDHHFilePath",
                    "type": "SetVariable",
                    "dependsOn": [],
                    "userProperties": [],
                    "typeProperties": {
                        "variableName": "varFilePath",
                        "value": {
                            "value": "@concat(formatDateTime(pipeline().TriggerTime, 'yyyy/MM/dd/HH'),'.json')",
                            "type": "Expression"
                        }
                    }
                }
            ],
            "variables": {
                "varFilePath": {
                    "type": "String"
                }
            },
            "annotations": []
        }
    }
    

    You can also use a single Set Variable activity to form the {city name}/{year}/{month}/{day}/{hour}.json path, as shown below.

    @concat(activity('web_GetWeatherDetailsForGivenCoodinates').output.name, '/', formatDateTime(pipeline().TriggerTime, 'yyyy/MM/dd/HH'), '.json')
    

    Then pass this variable's value to the Azure Blob REST API in the subsequent Web activity that writes the file, as shown in the below blog:

    Public article by a community member: Azure Data Factory – How to Save Web Activity to Blob Storage

    Please note that you can parameterize the REST connector to form the folder and file path, and then pass in the value of the variable created in the previous step.
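
    For example, the second Web activity could PUT the payload to the Blob service REST endpoint, appending the variable to the URL. This is only a sketch: <storageaccount> and the container name weather are assumptions, and authentication (for example a managed identity against the https://storage.azure.com/ resource) plus the required x-ms-version header are omitted for brevity:

    {
        "name": "web_WriteWeatherJsonToBlob",
        "type": "WebActivity",
        "typeProperties": {
            "url": {
                "value": "@concat('https://<storageaccount>.blob.core.windows.net/weather/', variables('varFilePath'))",
                "type": "Expression"
            },
            "method": "PUT",
            "headers": {
                "x-ms-blob-type": "BlockBlob"
            },
            "body": {
                "value": "@activity('web_GetWeatherDetailsForGivenCoodinates').output",
                "type": "Expression"
            }
        }
    }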

    Your high-level flow would be: Web activity (call the API) → Set Variable (form the file path) → Web activity (write the blob).

    Hope this helps. If you have further questions, please share additional information about the API endpoint, or share the endpoint link if it is public, so that I can provide more details.


    Please don’t forget to Accept Answer and mark Yes for "was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.