How do I make a dataset based on an SQL statement with multiple tables?

Chuck Roberts 130 Reputation points
2024-10-02T09:55:25.5133333+00:00

How do I make a dataset based on an SQL statement with multiple tables, not a single table?

  1. I'm using MS Azure Data Factory online via a web browser.
  2. I'm not a total beginner but I know a few things and have completed a tutorial on Azure.
  3. This dataset should work with a dataflow.
  4. The server I want to get data from is an MS SQL server v12, hosted on our own network via an Azure on-premise gateway.
  5. The linked service in Azure is set up and passes the connection test.
  6. I have an SQL statement that connects to multiple tables that I want to use in a Dataflow source.

My final goal is to make a Source in a Dataflow which gets data based on an SQL statement. But don't I need to set up one or more datasets first?

When looking at the Source on a dataflow, under Source Settings it's required I choose a dataset, which can only be from one table. Under Source Options I can enter an SQL query but I'm assuming that will only query the single table in the dataset.

Another thing I tried.

When I tried to get an answer from the Azure community I did these steps and had this problem.

  1. Connector: "SQL Server".
  2. Choose the linked service ls_printstream_live
  3. Azure community AI says now under table choose "Query" and there is no query.

Can someone guide me on this? This part is totally new to me.

Thank you.

EDIT: I'm actually getting an error in a dataflow when I set the Source to any dataset that uses this database and do Data Preview. The error is:

"Linked service with self-hosted integration runtime is not supported in data flow. Please utilize the Azure IR with managed vnet using this tutorial". And it has a link to this tutorial: https://learn.microsoft.com/en-us/azure/data-factory/tutorial-managed-virtual-network-on-premise-sql-server

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,623 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2024-10-02T19:48:38.0166667+00:00

    To create a dataset based on an SQL statement that queries multiple tables in Azure Data Factory, follow these steps:

    1. Create a Dataset:
      • In Azure Data Factory, go to the "Author" section and create a new dataset.
      • Choose the data store type as "SQL Server."
      • Select your linked service (in your case, the one connected to your on-premise SQL Server via the Azure gateway).
      • When prompted to select a table, choose None and instead opt to use a query.
    2. Use a Query Instead of a Single Table:
      • After creating the dataset, go to the Settings tab.
      • Here, you can enable the Query option to input your SQL query manually. This SQL query can include joins, subqueries, or any other SQL statement combining multiple tables. This dataset will then return the result of your query.
    3. Using the Dataset in a Dataflow:
      • Now that your dataset is based on a custom SQL query, you can use it in a dataflow.
      • Create or edit your dataflow and add a new source.
      • In the source settings, select the dataset you just created. The data will now be based on your SQL query combining multiple tables.
    4. Self-hosted Integration Runtime Limitation:
      • As you mentioned, the error you're encountering ("Linked service with self-hosted integration runtime is not supported in data flow") is due to the limitation of using self-hosted integration runtime in a dataflow.
      • To resolve this, you'll need to use Azure Integration Runtime (IR) with managed Virtual Network (VNet).
      • Follow this tutorial to set up managed VNet and Azure IR: Tutorial: On-premise SQL Server with Managed VNet.

    Let me know if you need more details or assistance with any step!

    0 comments No comments

  2. AnnuKumari-MSFT 34,556 Reputation points Microsoft Employee Moderator
    2024-10-08T10:23:45.38+00:00

    Hi Chuck Roberts ,

    Thankyou for posting your query on Microsoft Q&A platform .

    In case you want to deal with multiple tables in SQL and query using dataflow , you could do that using query option in 'source options' tab in source transformation of mapping dataflow as below, irrespective of the table selected in dataset level, it will run the query since you would select 'query' and not table in input property of source options as shown in the below image.

    Also, you can make your dataset dynamic by creating parameters for tablename instead of selecting a particular table.

    User's image

    To your query regarding SHIR, it is not yet supported in Dataflow, you could use Azure Integration Runtime (IR) with managed Virtual Network (VNet) as mentioned by Amira.

    Hope it helps. Kindly accept the answer by clicking on Accept answer button. Thankyou

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.