ADF grouped ExpertQuery

Mahlaule, Rirhadzu 0 Reputation points
2024-11-15T14:55:09.8766667+00:00

Hi,

The API receives an input ExpertQuery in JSON format and returns the related transitional data in JSON format. When data is transferred, a separate request with an ExpertQuery is executed for each combination with QueryTarget = self. As a result, a lot of small files are generated in the Raw Storage Container and processing times are long. To resolve this, I want to execute “grouped” ExpertQueries, one per combination covering the whole PMC hierarchy each time, so that each response JSON file contains data not just for a single node but for all related nodes. How can I best apply this change in my pipelines and data flows?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. phemanth 12,300 Reputation points Microsoft Vendor
    2024-11-15T17:41:04.02+00:00

    @Mahlaule, Rirhadzu

    Thanks for reaching out to Microsoft Q&A.

    To optimize your Azure Data Factory (ADF) pipeline for handling the expert queries more efficiently, you can follow these steps to implement a "grouped" expert query approach. This will allow you to retrieve data for multiple related nodes in a single request rather than making separate requests for each combination.

    Steps to Implement Grouped Expert Queries in ADF

    Modify the Expert Query Structure:

    • Update your expert query JSON structure to accept multiple nodes at once. Instead of sending a request for each node, you can send a single request that includes all the nodes in the PMC hierarchy you want to query.
    • Ensure that your API can handle this new format and return data for all the requested nodes in one response.
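
    For illustration only, a grouped request body might look like the sketch below. The field names (QueryTarget, Combination, Nodes) are assumptions; the real contract depends entirely on your API.

    ```json
    {
      "ExpertQuery": {
        "QueryTarget": "hierarchy",
        "Combination": "COMB-001",
        "Nodes": ["NODE-A", "NODE-B", "NODE-C"]
      }
    }
    ```

    The key change is that one request carries the whole PMC hierarchy for a combination, so the API returns one response covering all related nodes instead of one response per node. Depending on your API, you may list the nodes explicitly as above, or let the API resolve the full hierarchy from the combination id.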

    Create a Lookup Activity:

    • Use a Lookup activity in your ADF pipeline to retrieve the list of nodes from your source. This could be a database, a file, or another API call.
    • This activity should output a list of all the nodes you want to include in your grouped query.
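
    A minimal Lookup activity sketch, assuming the combinations are available in an Azure SQL table; the dataset name NodeListDataset and the table dbo.PmcNodes are placeholders:

    ```json
    {
      "name": "LookupCombinations",
      "type": "Lookup",
      "typeProperties": {
        "source": {
          "type": "AzureSqlSource",
          "sqlReaderQuery": "SELECT DISTINCT CombinationId FROM dbo.PmcNodes"
        },
        "dataset": {
          "referenceName": "NodeListDataset",
          "type": "DatasetReference"
        },
        "firstRowOnly": false
      }
    }
    ```

    Setting firstRowOnly to false returns the full result set so a downstream ForEach can iterate over every combination. Keep in mind that Lookup output is capped at 5,000 rows / 4 MB, so a very large combination list may need a different source pattern.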

    Use a ForEach Activity:

    • Instead of executing a separate request for each node, implement a ForEach activity that loops through the list of combinations returned by the Lookup activity, one iteration per grouped query.
    • Inside the ForEach activity, compose a single JSON request body that covers all the related nodes for that combination, as shown in the sketch below.
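
    A sketch of the ForEach shell, assuming the Lookup above returns one row per combination and the API can resolve the full PMC hierarchy from the combination id:

    ```json
    {
      "name": "ForEachCombination",
      "type": "ForEach",
      "typeProperties": {
        "items": {
          "value": "@activity('LookupCombinations').output.value",
          "type": "Expression"
        },
        "isSequential": false,
        "activities": []
      }
    }
    ```

    The activities array holds the Web activity from the next step; inside each iteration, item().CombinationId identifies the combination whose whole hierarchy is requested in a single call.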

    Execute the Grouped Query:

    • Within the ForEach activity, add a Web Activity (or another appropriate activity) to call your API with the grouped query JSON.
    • Ensure that the API call is correctly formatted to accept the grouped request and returns the expected JSON response.
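
    A hedged sketch of the Web activity that goes inside the ForEach; the URL is a placeholder and the request body mirrors the illustrative grouped-query shape from earlier:

    ```json
    {
      "name": "CallGroupedExpertQuery",
      "type": "WebActivity",
      "typeProperties": {
        "url": "https://<your-api-host>/expertquery",
        "method": "POST",
        "headers": { "Content-Type": "application/json" },
        "body": {
          "value": "@json(concat('{\"QueryTarget\":\"hierarchy\",\"Combination\":\"', item().CombinationId, '\"}'))",
          "type": "Expression"
        }
      }
    }
    ```

    Note that the Web activity response payload is limited to roughly 4 MB. If a grouped response can exceed that, call the API with a Copy activity using a REST source instead and write the response directly to storage.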

    Process the Response:

    • After receiving the response from the API, you can use additional activities (such as Data Flow or Copy Data) to process the response JSON.
    • Depending on your requirements, you may want to flatten the JSON structure or filter the data before storing it in your Raw Storage Container.
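
    If you flatten the grouped response with a Copy activity, a schema mapping of this shape (placed under the Copy activity's typeProperties) can unroll the node array; the JSON paths Nodes, NodeId, and Value are assumptions about the response layout:

    ```json
    "translator": {
      "type": "TabularTranslator",
      "collectionReference": "$['Nodes']",
      "mappings": [
        { "source": { "path": "$['Combination']" }, "sink": { "name": "CombinationId" } },
        { "source": { "path": "['NodeId']" }, "sink": { "name": "NodeId" } },
        { "source": { "path": "['Value']" }, "sink": { "name": "Value" } }
      ]
    }
    ```

    Alternatively, a Mapping Data Flow with a Flatten transformation achieves the same unrolling if you need more complex shaping or filtering before landing the data.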

    Store the Results:

    • Instead of creating multiple small files for each node, you can aggregate the results into a single file or a set of files based on your storage strategy (e.g., partitioning by date or node type).
    • Use a Copy Data activity to write the aggregated results to your Raw Storage Container.
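
    As one way to avoid many small files, the Copy activity sink (under its typeProperties) can merge the source files into a single output file per run. This is a sketch under the assumption that the Raw Storage Container is ADLS Gen2; for Blob Storage, use AzureBlobStorageWriteSettings instead:

    ```json
    "sink": {
      "type": "JsonSink",
      "storeSettings": {
        "type": "AzureBlobFSWriteSettings",
        "copyBehavior": "MergeFiles"
      },
      "formatSettings": {
        "type": "JsonWriteSettings",
        "filePattern": "arrayOfObjects"
      }
    }
    ```

    MergeFiles consolidates all files from the source folder into one file in the sink, and you can still partition the output folders by date or node type through the sink dataset path.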

    Error Handling and Monitoring:

    • Implement error handling in your pipeline to manage any failed requests or issues with data processing.
    • Use ADF monitoring features to keep track of the performance and success of your grouped queries.
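
    At a minimum, a retry policy on the Web (or Copy) activity catches transient API failures; the block below sits at the activity level alongside typeProperties, and the values are examples to tune rather than recommendations:

    ```json
    "policy": {
      "timeout": "0.01:00:00",
      "retry": 3,
      "retryIntervalInSeconds": 60,
      "secureInput": false,
      "secureOutput": false
    }
    ```

    You can also chain an activity on the failure dependency condition to log or alert, and use the ADF monitoring views to compare run durations before and after switching to grouped queries.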

    Hope this helps. Do let us know if you have any further queries.

    1 person found this answer helpful.
