ADF: How to POST a large body to a REST connector?

Adrian Filipescu 30 Reputation points
2023-11-14T09:43:30.58+00:00

I am making an HTTP request to a web service that returns a CSV (>1 MB), which I convert row by row to line protocol for an InfluxDB using a Data Flow.
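Roughly, each CSV row becomes one line-protocol string. A minimal sketch of the idea, expressed in Python for illustration only (the measurement, tag, and column names below are placeholders, not my actual schema; the real conversion happens in the Data Flow):

```python
# Sketch: turn each CSV row into an InfluxDB line-protocol string.
# "sensor_data", "device_id", "value", and "timestamp_ns" are placeholder names.
import csv

def row_to_line(row: dict) -> str:
    # Line protocol: <measurement>,<tag_set> <field_set> <timestamp>
    return (
        f"sensor_data,device={row['device_id']} "
        f"value={float(row['value'])} "
        f"{int(row['timestamp_ns'])}"
    )

with open("export.csv", newline="") as f:
    lines = [row_to_line(r) for r in csv.DictReader(f)]
```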

The transformation works fine (~1 min) using the default integration runtime.

[Screenshot: data flow]

The problem comes when I try to "POST" that information to InfluxDB using a REST linked service (with None as the format).

[Screenshot: REST linked service]

[Screenshot: sink settings 1]

[Screenshot: sink settings 2]

It works, but really, really slowly (about 30 min for 60k rows), and it's not the Influx endpoint's problem: when I "POST" already-transformed rows to the same endpoint using Postman, ~500k rows take only a few seconds.

I think ADF "POSTs" them in small batches or something, because when I query the sink (InfluxDB) I can watch it grow really slowly.

Is there a batch size setting that can be increased for the "None" format?

What I tried (besides the method above):

  1. Saving a file with the transformed rows and using a pipeline Copy activity with that file as the source and the Influx REST linked service as the sink -> doesn't work.
  2. Saving a file with the transformed rows and using another data flow with that file as the source and the Influx REST linked service as the sink -> works, but really slowly (~30 min). My thought here was that having the lines already precalculated would help.

1 answer

  1. Amira Bedhiafi 23,096 Reputation points
    2023-11-14T14:55:28.11+00:00

    The Copy activity lets you specify a batch size when moving data to and from data stores. With the REST connector, however, the behavior can vary, because the target REST API may not support batch processing in the same way. If the InfluxDB API supports batched writes, you need to make sure the data is sent in larger batches; this is typically controlled in the body of the POST request. You can check this old thread: https://salesforce.stackexchange.com/questions/15431/rest-api-queryoptions-batchsize
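    For comparison, the kind of batched write you are getting with Postman looks roughly like this outside ADF (a sketch only; the URL, database name, and batch size are placeholders, and the exact write endpoint depends on your InfluxDB version):

    ```python
    # Sketch: POST line-protocol rows to InfluxDB in large batches instead of row by row.
    # INFLUX_URL and BATCH_SIZE are placeholders; adjust for your InfluxDB setup.
    import requests

    INFLUX_URL = "http://localhost:8086/write?db=mydb"  # placeholder, InfluxDB 1.x style
    BATCH_SIZE = 5000  # number of line-protocol rows per POST

    def post_in_batches(lines: list[str]) -> None:
        for i in range(0, len(lines), BATCH_SIZE):
            body = "\n".join(lines[i:i + BATCH_SIZE])
            resp = requests.post(INFLUX_URL, data=body.encode("utf-8"))
            resp.raise_for_status()  # InfluxDB returns 204 No Content on success
    ```

    The point of the sketch is that one request carries thousands of rows; if ADF is issuing one request per row (or per tiny batch), that alone explains the 30-minute runtime.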

    By default, ADF might not be using all the available DIUs for your task. You can increase parallelism in the pipeline settings, which should allow you to process more data simultaneously, provided that the InfluxDB endpoint can handle the increased load. (https://stackoverflow.com/questions/72902650/increase-parallel-copy-in-azure-data-factory)

