Get Dataset Schema via ADF API

Jatinder Luthra 130 Reputation points
2023-07-21T20:58:38.67+00:00

Hello folks,

I am struggling to create ADF objects with the API in Python, including datasets and data flows for Azure SQL tables. In the UI, I can easily create a dataset, and ADF reads the table's schema.

Using the APIs, how can I create a dataset and read its schema, so that I can use that information in a data flow transformation?

Azure Data Factory

1 answer

  1. Vahid Ghafarpour 20,500 Reputation points
    2023-07-23T05:50:34.1666667+00:00

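    Here's a sample Python snippet that creates an Azure SQL table dataset and reads back its schema. Note that, unlike the UI's "Import schema" button, creating a dataset through the API does not read the column schema from the table: the schema you get back is only what was stored on the dataset.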

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        AzureSqlTableDataset,
        DatasetResource,
        LinkedServiceReference,
    )

    # Replace with your Azure subscription ID, resource group, and data factory names
    subscription_id = "YOUR_SUBSCRIPTION_ID"
    resource_group_name = "YOUR_RESOURCE_GROUP_NAME"
    data_factory_name = "YOUR_DATA_FACTORY_NAME"

    # Create the Data Factory management client
    credential = DefaultAzureCredential()
    adf_client = DataFactoryManagementClient(credential, subscription_id)

    # Define dataset properties. The connection string belongs on an Azure SQL
    # linked service (assumed to exist already); the dataset only references it.
    dataset_name = "your_dataset_name"
    linked_service_name = "YOUR_AZURE_SQL_LINKED_SERVICE_NAME"
    sql_schema_name = "dbo"
    table_name = "YOUR_AZURE_SQL_TABLE_NAME"

    dataset = DatasetResource(
        properties=AzureSqlTableDataset(
            linked_service_name=LinkedServiceReference(
                type="LinkedServiceReference",
                reference_name=linked_service_name,
            ),
            schema_type_properties_schema=sql_schema_name,  # SQL schema that owns the table
            table=table_name,
        )
    )

    # Create (or update) the dataset in the data factory
    adf_client.datasets.create_or_update(
        resource_group_name, data_factory_name, dataset_name, dataset
    )

    # Fetch the dataset back. properties.schema is the column list stored on the
    # dataset; the API does not import it from the table, so it is None unless set.
    dataset = adf_client.datasets.get(resource_group_name, data_factory_name, dataset_name)
    schema = dataset.properties.schema

    # If a schema was stored, each entry describes one column (name, type, etc.)
    print(schema)
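    Because the API stores only the schema you send, one way to obtain the column list is to read it from the database itself: Azure SQL exposes column metadata through INFORMATION_SCHEMA.COLUMNS. The following is a minimal sketch, assuming the pyodbc package and an ODBC connection string. The helper names (`rows_to_dataset_schema`, `fetch_table_schema`) are hypothetical, and the `name`/`type`/`precision`/`scale` entries are assumed to resemble what the UI's schema import stores on an Azure SQL dataset.

```python
from typing import Any, Iterable, Sequence

# Column metadata for one table, in column order. The "?" parameter markers
# follow the ODBC/pyodbc convention (an assumption about the driver used).
COLUMNS_QUERY = """
SELECT COLUMN_NAME, DATA_TYPE, NUMERIC_PRECISION, NUMERIC_SCALE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?
ORDER BY ORDINAL_POSITION
"""


def rows_to_dataset_schema(rows: Iterable[Sequence[Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: turn (name, type, precision, scale) rows into schema entries."""
    schema: list[dict[str, Any]] = []
    for name, data_type, precision, scale in rows:
        element: dict[str, Any] = {"name": name, "type": data_type}
        # Non-numeric columns report NULL (None) precision and scale, so skip them.
        if precision is not None:
            element["precision"] = int(precision)
        if scale is not None:
            element["scale"] = int(scale)
        schema.append(element)
    return schema


def fetch_table_schema(connection_string: str, sql_schema: str, table: str) -> list[dict[str, Any]]:
    """Hypothetical helper: read a table's columns from Azure SQL (assumes pyodbc is installed)."""
    import pyodbc  # imported lazily so rows_to_dataset_schema works without pyodbc

    with pyodbc.connect(connection_string) as conn:
        rows = conn.cursor().execute(COLUMNS_QUERY, sql_schema, table).fetchall()
    return rows_to_dataset_schema(rows)
```

    The resulting list could then be passed as `schema=` when constructing `AzureSqlTableDataset`, so the dataset returned by `datasets.get` carries the column names and types that a data flow transformation needs.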