Newbie to DataFlow in Azure Data Factory - split data to multiple files

Scott Klein 161 Reputation points
2023-12-01T00:51:57.79+00:00

OK, I am new to Azure Data Factory and Data Flows, so I have some questions, but we'll start with this one first. I am trying to read data from SQL Server into Azure Blob Storage, and I have a copy activity in ADF working great. Well, I need to split the incoming data into separate files by column (for example, columns 1-3 into file1, columns 4-6 into file2), so splitting columns, not rows.

So, I built a data flow that reads data from SQL and splits it. However, I get the "Linked Service with self-hosted integration runtime is not supported in data flow." error. From looking around, what I am finding is that I need to pull the data from SQL into blob storage first (for example, as a CSV file), then use the data flow with the new blob file as the source to parse it into multiple files.

Is that really the case? That seems like a huge and unnecessary extra step. The source setting on my data flow points to my SQL dataset, so do I have something configured wrong?

[Screenshot: data flow source settings pointing at the SQL dataset]

Azure Data Factory

Accepted answer
  1. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2023-12-01T06:16:26.7966667+00:00

    Hi Scott Klein,

    Thanks for using Microsoft Q&A platform and thanks for posting your query here.

    As per my understanding, you are trying to split the columns from your on-premises SQL database and then load them into multiple files in Azure Blob Storage using a mapping data flow.

    Yes, your understanding is correct. Data flows don't support the self-hosted IR. However, you can use an inline dataset in the data flow, which allows you to connect to an on-premises SQL Server.
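    For illustration, here is a minimal sketch of what an inline SQL Server source could look like in the underlying data flow script. The store/format option names follow the inline-dataset pattern from the connector documentation, and the query text is a hypothetical placeholder, so verify the exact values against the linked blog post and your own schema:

    ```
    source(allowSchemaDrift: true,
        validateSchema: false,
        store: 'sqlserver',
        format: 'query',
        query: 'select * from dbo.MyTable') ~> SQLServerSource
    ```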

    [Screenshot: data flow source settings with the inline dataset option selected]

    Another workaround is to load the table into blob storage using a copy activity, then create multiple branches from the source transformation and use a select transformation in each branch to split your columns, attaching each branch to its own sink to load the data into different files, as sketched below.
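    Either way, the branch-and-select pattern looks the same in the data flow script. Below is a minimal sketch that reuses the SQLServerSource stream from above and assumes six hypothetical string columns, col1 through col6 (substitute your real column names):

    ```
    SQLServerSource select(mapColumn(
            col1,
            col2,
            col3
        ),
        skipDuplicateMapInputs: true,
        skipDuplicateMapOutputs: true) ~> SelectCols1to3

    SQLServerSource select(mapColumn(
            col4,
            col5,
            col6
        ),
        skipDuplicateMapInputs: true,
        skipDuplicateMapOutputs: true) ~> SelectCols4to6

    SelectCols1to3 sink(allowSchemaDrift: true,
        validateSchema: false) ~> SinkFile1

    SelectCols4to6 sink(allowSchemaDrift: true,
        validateSchema: false) ~> SinkFile2
    ```

    Each sink would point at its own delimited text dataset in Blob Storage, so file1 receives columns 1-3 and file2 receives columns 4-6.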

    Relevant document: https://techcommunity.microsoft.com/t5/azure-data-factory-blog/new-data-flow-connector-sql-server-as-source-and-sink/ba-p/2406213

    Hope it helps. Kindly accept the answer by clicking on the Accept answer button. Thank you.


2 additional answers

  1. Uwe Fuchs 150 Reputation points
    2023-12-01T01:55:51.07+00:00

    With regards to splitting data read from SQL Server into multiple files, maybe try using a copy activity instead. This can be achieved without a data flow, using just the copy activity. See the screenshot below for how you can set up the sink.

    [Screenshot: copy activity sink settings]

    Hope it helps. Kindly accept the answer by clicking on the Accept answer button - it would be greatly appreciated.


  2. Deleted

    This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.


