GraphQL source, SQL Azure Database Destination, is Azure Data Factory the best choice?

G Goldspink 20 Reputation points
2024-11-11T22:10:24.1566667+00:00

Hi

I'm relatively new to the world of Data Factory and to pulling data via APIs, in this case GraphQL. The issue I'm having is that there are few examples out there of GraphQL queries in Azure Data Factory. And when it comes to pagination, which I can perform manually, I'm looking at ADF and wondering if it's the right tool to query the API and load the results into the database. At the moment the source is only accessible via GraphQL.

Example site I'm using is https://studio.apollographql.com/public/star-wars-swapi/variant/current/home

I've figured out the query format has to be:

{"query":"{ allFilms { films { title director speciesConnection {species { name } } } } "}

And I have it working in a "Copy Data" component. Alas, the same setup in a REST API source in a data flow throws the error below; I haven't gotten that working, only Copy Data (verified via Data Preview on the source component):

at Source 'source4': Failure to read most recent page request: DF-REST_001 - Error response from server: Some({"errors":[{"message":"Unsupported content-encoding "utf-8"."}]}), Status code: 415. Please check your request url and body. (url:https://swapi-graphql.netlify.app/.netlify/functions/index,request body: Some({"query":"{ allFilms (first:2){ edges { cursor node { title director speciesConnection {species { name } } } } pageInfo { hasNextPage hasPreviousPage } } }"}), request method: POST)

If it were just the copy component, I could dump the output into a JSON file, then in a data flow read the file and flatten it to work around my data flow source issue. However, GraphQL has other features, which is where I wonder if ADF is the right choice.

Tackling pagination: a query can take a cursor, which is likely going to be needed if I pull a large dataset. An example can be seen here: https://hygraph.com/learn/graphql/pagination

For the Star Wars example, I basically pulled just 2 records, then used the "after" cursor to pull the next records, etc.

{"query":"{ allFilms (first: 2, after: \"YXJyYXljb25uZWN0aW9uOjM=\"){ edges { cursor node { title director speciesConnection {species { name } } } } pageInfo { hasNextPage hasPreviousPage startCursor endCursor} } }"}

Picturing this in ADF at the moment, I think my query will have to be a variable: I sub in the values and pass it into a Copy Data component, which pumps the results out into... something. I'll then have to read hasNextPage to see if there are more records, substitute in the "after" cursor, and stitch the JSON output together for the later data flow and flattening.
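To make the loop I'm imagining concrete, here is a minimal sketch of that cursor-pagination logic in plain Python. The field names (allFilms, pageInfo, endCursor, hasNextPage) follow the SWAPI schema from the queries above; `fetch_page` is a hypothetical callable standing in for the HTTP POST that an ADF Web or Copy Data activity would perform, so the control flow can be read independently of any HTTP library.

```python
# Cursor pagination: keep requesting pages, feeding each endCursor back
# in as the "after" argument, until hasNextPage is false.

QUERY_TEMPLATE = (
    '{{ allFilms(first: {first}{after}) {{ '
    'edges {{ cursor node {{ title director }} }} '
    'pageInfo {{ hasNextPage endCursor }} }} }}'
)

def build_query(first, after=None):
    """Substitute the page size and optional cursor into the query body."""
    after_part = f', after: "{after}"' if after else ""
    return {"query": QUERY_TEMPLATE.format(first=first, after=after_part)}

def fetch_all(fetch_page, page_size=2):
    """Loop until hasNextPage is false, stitching the edges together."""
    edges, cursor = [], None
    while True:
        data = fetch_page(build_query(page_size, cursor))["data"]["allFilms"]
        edges.extend(data["edges"])
        if not data["pageInfo"]["hasNextPage"]:
            return edges
        cursor = data["pageInfo"]["endCursor"]
```

In ADF terms, `build_query` is the variable substitution step, the `while` loop is an Until (or ForEach) activity, and `edges.extend` is the stitching of the per-page JSON output.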

This seems complex, with potential gotchas, and it's only a proof of concept. Is it a case of hardening up and getting it done? Or does anyone know if I'm missing something, seeing as I'm in my early days with this tech?

The most informative post I've found is https://learn.microsoft.com/en-us/answers/questions/516288/is-there-a-way-to-pull-graphql-api-data-using-azur

Cheers

Azure Data Factory

Accepted answer
  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2024-11-12T10:33:23.74+00:00

    ADF can be used for your GraphQL-to-Azure SQL Database pipeline, but as you’ve observed, it's not inherently designed for GraphQL. ADF excels with REST APIs but requires workarounds for GraphQL, especially with pagination and cursor handling.

    As you mentioned, using a variable to manage the cursor pagination logic in ADF is viable. You'd initialize the cursor value, use it in each request, and update it iteratively based on the hasNextPage field of the response. This approach requires multiple Copy Data activities or Web activity requests inside an Until (or ForEach) loop, repeating until hasNextPage is false.
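    As a rough illustration of that loop, the Until activity in the pipeline JSON might look something like the sketch below. The activity names, the variables (hasNextPage, queryBody), and the expressions are illustrative assumptions, not a tested pipeline; a preceding Set Variable activity would be expected to build queryBody by concatenating the current cursor into the GraphQL request body.

    ```json
    {
      "name": "PaginateGraphQL",
      "type": "Until",
      "typeProperties": {
        "expression": {
          "value": "@equals(variables('hasNextPage'), 'false')",
          "type": "Expression"
        },
        "activities": [
          {
            "name": "CallGraphQL",
            "type": "WebActivity",
            "typeProperties": {
              "url": "https://swapi-graphql.netlify.app/.netlify/functions/index",
              "method": "POST",
              "headers": { "Content-Type": "application/json" },
              "body": { "value": "@variables('queryBody')", "type": "Expression" }
            }
          },
          {
            "name": "SetHasNextPage",
            "type": "SetVariable",
            "dependsOn": [
              { "activity": "CallGraphQL", "dependencyConditions": [ "Succeeded" ] }
            ],
            "typeProperties": {
              "variableName": "hasNextPage",
              "value": {
                "value": "@string(activity('CallGraphQL').output.data.allFilms.pageInfo.hasNextPage)",
                "type": "Expression"
              }
            }
          }
        ]
      }
    }
    ```

    One known quirk to watch for: a Set Variable activity cannot reference the variable it is setting, so carrying endCursor forward typically needs an intermediate variable (set a temporary cursor variable first, then copy it into the one used by the query).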

    ADF's data flow transformation will help flatten the JSON, but if handling the GraphQL structure becomes cumbersome, consider transforming JSON in a staging area (like Azure Storage) and running data flow transformations from there.

    The error you encountered (Unsupported content-encoding "utf-8") suggests the server is rejecting an encoding header that the data flow's REST source sends with the request. Try setting the Content-Type header explicitly to application/json in the Web activity or REST connector, and avoid any charset/Content-Encoding setting on the request. If that doesn't resolve it, another option is to preprocess the data outside ADF, or leverage a service like Azure Functions for custom handling.

    If ADF feels too complex for this, consider Azure Functions, which offers more flexibility to handle complex GraphQL queries, pagination, and custom error handling. Logic Apps can also orchestrate API calls with stateful handling of pagination and could output the final result to ADF for further processing.

    In my opinion, ADF can handle this with some effort, but if your proof of concept reveals significant issues, combining it with Azure Functions for the GraphQL API call handling might be more streamlined.

    1 person found this answer helpful.
