To create a dataset based on an SQL statement that queries multiple tables in Azure Data Factory, follow these steps:
- Create a Dataset:
- In Azure Data Factory, go to the "Author" section and create a new dataset.
- Choose the data store type as "SQL Server."
- Select your linked service (in your case, the one connected to your on-premise SQL Server via the Azure gateway).
- When prompted to select a table, choose None and instead opt to use a query.
- Use a Query Instead of a Single Table:
- After creating the dataset, go to the Settings tab.
- Here, you can enable the Query option to input your SQL query manually. This SQL query can include joins, subqueries, or any other SQL statement combining multiple tables. This dataset will then return the result of your query.
- Using the Dataset in a Dataflow:
- Now that your dataset is based on a custom SQL query, you can use it in a dataflow.
- Create or edit your dataflow and add a new source.
- In the source settings, select the dataset you just created. The data will now be based on your SQL query combining multiple tables.
- Self-hosted Integration Runtime Limitation:
- As you mentioned, the error you're encountering ("Linked service with self-hosted integration runtime is not supported in data flow") is due to the limitation of using self-hosted integration runtime in a dataflow.
- To resolve this, you'll need to use Azure Integration Runtime (IR) with managed Virtual Network (VNet).
- Follow this tutorial to set up managed VNet and Azure IR: Tutorial: On-premise SQL Server with Managed VNet.
Let me know if you need more details or assistance with any step!