Sync a GitHub repository in Workflow Orchestration Manager
APPLIES TO: Azure Data Factory Azure Synapse Analytics
Note
This feature is in public preview. Workflow Orchestration Manager is powered by Apache Airflow.
In this article, you learn how to synchronize your GitHub repository in Azure Data Factory Workflow Orchestration Manager in two different ways:
- By using Enable git sync in the Workflow Orchestration Manager UI.
- By using the REST API.
Prerequisites
- Azure subscription: If you don't have an Azure subscription, create a free Azure account before you begin. Create or select an existing Data Factory instance in a region where the Workflow Orchestration Manager preview is supported.
- GitHub repository: You need access to a GitHub repository.
Use the Workflow Orchestration Manager UI
To sync your GitHub repository by using the Workflow Orchestration Manager UI:
Ensure that your repository contains the necessary folders and files:
dags/: For Apache Airflow directed acyclic graphs (DAGs) (required).
Plugins/: For integrating external features to Airflow.
When you create a Workflow Orchestration Manager integration runtime, select Enable git sync in the Airflow environment setup dialog.
Select one of the following supported Git service types:
- GitHub
- ADO
- GitLab
- BitBucket
Select a credential type:
None (for a public repo): When you select this option, make sure that your repository's visibility is public. Then fill out the details:
- Git repo url (required): The clone URL for the GitHub repository you want.
- Git branch (required): The branch of the Git repository that you want to sync.
Git personal access token: After you select this option for a personal access token (PAT), fill out the remaining fields based on the selected Git service type:
- GitHub personal access token
- ADO personal access token
- GitLab personal access token
- BitBucket personal access token
SPN (service principal name): Only ADO supports this credential type. After you select this option, fill out the remaining fields:
- Git repo url (required): The clone URL to the Git repository to sync.
- Git branch (required): The branch in the repository to sync.
- Service principal app id (required): The service principal app ID with access to the ADO repo to sync.
- Service principal secret (required): A manually generated secret in the service principal whose value is used to authenticate and access the ADO repo.
- Service principal tenant id (required): The service principal tenant ID.
Fill in the rest of the fields with the required information.
Select Create.
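The folder requirements from the first step can be checked on a local clone before you enable Git sync. The following is a minimal sketch, not part of Workflow Orchestration Manager: the function name `check_repo_layout` is hypothetical, and comparing folder names case-insensitively is an assumption made for illustration.

```python
from pathlib import Path


def check_repo_layout(repo_root: str) -> dict:
    """Report whether a local clone contains the folders described above.

    Assumption: folder names are compared case-insensitively.
    """
    root = Path(repo_root)
    # Collect the lowercase names of the top-level directories in the clone.
    top_level = {p.name.lower() for p in root.iterdir() if p.is_dir()}
    return {
        "dags": "dags" in top_level,        # required folder
        "plugins": "plugins" in top_level,  # optional folder
    }
```

A result of `{"dags": False, ...}` suggests the repository is missing the required folder.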
Use the REST API
To sync your GitHub repository by using the REST API:
Method: PUT
URL:
https://management.azure.com/subscriptions/<subscriptionid>/resourcegroups/<resourceGroupName>/providers/Microsoft.DataFactory/factories/<datafactoryName>/integrationruntimes/<airflowEnvName>?api-version=2018-06-01
URI parameters:
| Name | In | Required | Type | Description |
| --- | --- | --- | --- | --- |
| Subscription Id | path | True | string | Subscription identifier |
| ResourceGroup Name | path | True | string | Resource group name (Regex pattern: ^[-\w\._\(\)]+$) |
| dataFactoryName | path | True | string | Name of the Azure Data Factory (Regex pattern: ^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$) |
| airflowEnvName | path | True | string | Name of the Workflow Orchestration Manager environment |
| Api-version | query | True | string | The API version |

Request body (Airflow configuration):

| Name | Type | Description |
| --- | --- | --- |
| name | string | Name of the Airflow environment |
| properties | propertyType | Configuration properties for the environment |

Properties type:

| Name | Type | Description |
| --- | --- | --- |
| type | string | The resource type (Airflow in this scenario) |
| typeProperties | typeProperty | Airflow |

Type property:

| Name | Type | Description |
| --- | --- | --- |
| computeProperties | computeProperty | Configuration of the compute type used for the environment |
| airflowProperties | airflowProperty | Configuration of the Airflow properties for the environment |

Compute property:

| Name | Type | Description |
| --- | --- | --- |
| location | string | The Airflow integration runtime location defaults to the data factory region. To create an integration runtime in a different region, create a new data factory in the required region. |
| computeSize | string | The size of the compute node you want your Airflow environment to run on. Examples are Large or Small. Three nodes are allocated initially. |
| extraNodes | integer | Each extra node adds three more workers. |

Airflow property:

| Name | Type | Description |
| --- | --- | --- |
| airflowVersion | string | Supported version of Apache Airflow. For example, 2.4.3. |
| airflowRequirements | Array<string> | Python libraries you want to use. For example, ["flask-bcrypt==0.7.1"]. Can be a comma-delimited list. |
| airflowEnvironmentVariables | Object (Key/Value pair) | Environment variables you want to use. For example, { "SAMPLE_ENV_NAME": "test" }. |
| gitSyncProperties | gitSyncProperty | Git configuration properties. |
| enableAADIntegration | boolean | Allows Microsoft Entra ID to log in to Workflow Orchestration Manager. |
| userName | string or null | Username for Basic Authentication. |
| password | string or null | Password for Basic Authentication. |

Git sync property:

| Name | Type | Description |
| --- | --- | --- |
| gitServiceType | string | The Git service where your desired repository is located. Values are GitHub, ADO, GitLab, or BitBucket. |
| gitCredentialType | string | Type of Git credential. Values are PAT (for personal access token), SPN (supported only by ADO), and None. |
| repo | string | Repository link. |
| branch | string | Branch to use in the repository. |
| username | string | GitHub username. |
| credential | string | Value of the PAT. |
| tenantId | string | The service principal tenant ID (supported only by ADO). |

Responses:

| Name | Status code | Type | Description |
| --- | --- | --- | --- |
| Accepted | 200 | Factory | OK |
| Unauthorized | 401 | Cloud Error | Array with more error details |
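The request URL and the two name patterns listed in the URI parameters can be combined into a small helper. This is a sketch under stated assumptions rather than an official client: the function name `build_runtime_url` and the default `api_version` argument are hypothetical conveniences, and the patterns are copied from the parameter descriptions.

```python
import re
from urllib.parse import quote

# Patterns copied from the URI parameter descriptions.
RESOURCE_GROUP_PATTERN = re.compile(r"^[-\w\._\(\)]+$")
FACTORY_NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*$")


def build_runtime_url(subscription_id, resource_group, factory_name,
                      env_name, api_version="2018-06-01"):
    """Build the PUT URL for an Airflow integration runtime (hypothetical helper)."""
    if not RESOURCE_GROUP_PATTERN.match(resource_group):
        raise ValueError(f"Invalid resource group name: {resource_group!r}")
    if not FACTORY_NAME_PATTERN.match(factory_name):
        raise ValueError(f"Invalid data factory name: {factory_name!r}")
    # Percent-encode each path segment so unusual characters stay inside it.
    return (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourcegroups/{quote(resource_group, safe='')}"
        f"/providers/Microsoft.DataFactory/factories/{quote(factory_name, safe='')}"
        f"/integrationruntimes/{quote(env_name, safe='')}"
        f"?api-version={quote(api_version, safe='')}"
    )
```

Validating the names before the call surfaces naming mistakes locally instead of as an error response from the service.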
Examples
Review the following examples.
Sample request:
PUT https://management.azure.com/subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourcegroups/abnarain-rg/providers/Microsoft.DataFactory/factories/ambika-df/integrationruntimes/sample-2?api-version=2018-06-01
Sample body:
{
    "name": "sample-2",
    "properties": {
        "type": "Airflow",
        "typeProperties": {
            "computeProperties": {
                "location": "East US",
                "computeSize": "Large",
                "extraNodes": 0
            },
            "airflowProperties": {
                "airflowVersion": "2.4.3",
                "airflowEnvironmentVariables": {
                    "AIRFLOW__TEST__TEST": "test"
                },
                "airflowRequirements": [
                    "apache-airflow-providers-microsoft-azure"
                ],
                "enableAADIntegration": true,
                "userName": null,
                "password": null,
                "airflowEntityReferences": []
            }
        }
    }
}
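The sample request and body can be assembled with the Python standard library. The sketch below only prepares the request and doesn't send it. It assumes you already have a Microsoft Entra access token for Azure Resource Manager, which this article doesn't cover, and the function name `build_put_request` is hypothetical.

```python
import json
import urllib.request


def build_put_request(url: str, body: dict, bearer_token: str) -> urllib.request.Request:
    """Prepare (but don't send) the PUT request for the integration runtime."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # JSON body as UTF-8 bytes
        headers={
            # Assumption: the token was acquired separately.
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

To send the prepared request, pass it to `urllib.request.urlopen` and read the response body.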
Sample response:
Status code: 200 OK
Response body:
{
    "id": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/your-rg/providers/Microsoft.DataFactory/factories/your-df/integrationruntimes/sample-2",
    "name": "sample-2",
    "type": "Microsoft.DataFactory/factories/integrationruntimes",
    "properties": {
        "type": "Airflow",
        "typeProperties": {
            "computeProperties": {
                "location": "East US",
                "computeSize": "Large",
                "extraNodes": 0
            },
            "airflowProperties": {
                "airflowVersion": "2.4.3",
                "pythonVersion": "3.8",
                "airflowEnvironmentVariables": {
                    "AIRFLOW__TEST__TEST": "test"
                },
                "airflowWebUrl": "https://e57f7409041692.eastus.airflow.svc.datafactory.azure.com/login/",
                "airflowRequirements": [
                    "apache-airflow-providers-microsoft-azure"
                ],
                "airflowEntityReferences": [],
                "packageProviderPath": "plugins",
                "enableAADIntegration": true,
                "enableTriggerers": false
            }
        },
        "state": "Initial"
    },
    "etag": "3402279e-0000-0100-0000-64ecb1cb0000"
}
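A response shaped like the sample can be reduced to the few fields you usually check after the call. This is an illustrative sketch: the function name `summarize_runtime_response` and the keys of the returned dictionary are hypothetical, and the field paths are taken only from the sample response.

```python
import json


def summarize_runtime_response(response_text: str) -> dict:
    """Pull a few fields out of a response shaped like the sample response."""
    payload = json.loads(response_text)
    properties = payload.get("properties", {})
    airflow = properties.get("typeProperties", {}).get("airflowProperties", {})
    return {
        "name": payload.get("name"),
        "state": properties.get("state"),
        "airflow_version": airflow.get("airflowVersion"),
        "airflow_web_url": airflow.get("airflowWebUrl"),
    }
```

Missing fields come back as `None` instead of raising an error, so the helper tolerates responses that omit optional properties.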
Here are some API payload examples:
Git sync properties for GitHub with PAT:
"gitSyncProperties": { "gitServiceType": "Github", "gitCredentialType": "PAT", "repo": <repo url>, "branch": <repo branch to sync>, "username": <username>, "credential": <personal access token> }
Git sync properties for ADO with PAT:
"gitSyncProperties": { "gitServiceType": "ADO", "gitCredentialType": "PAT", "repo": <repo url>, "branch": <repo branch to sync>, "username": <username>, "credential": <personal access token> }
Git sync properties for ADO with service principal:
"gitSyncProperties": { "gitServiceType": "ADO", "gitCredentialType": "SPN", "repo": <repo url>, "branch": <repo branch to sync>, "username": <service principal app id>, "credential": <service principal secret value>, "tenantId": <service principal tenant id> }
Git sync properties for a GitHub public repo:
"gitSyncProperties": { "gitServiceType": "Github", "gitCredentialType": "None", "repo": <repo url>, "branch": <repo branch to sync> }
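The four payload variants differ only in which credential fields they carry, so they can be generated by one helper. The sketch below is an assumption-labeled illustration: the function name `build_git_sync_properties` is hypothetical, and the validation rules (SPN only with ADO, `username` and `credential` required for PAT and SPN) are inferred from the examples and the property table rather than from a published schema.

```python
def build_git_sync_properties(service_type, credential_type, repo, branch,
                              username=None, credential=None, tenant_id=None):
    """Assemble a gitSyncProperties object like the payload examples."""
    if credential_type not in ("None", "PAT", "SPN"):
        raise ValueError(f"Unsupported credential type: {credential_type!r}")
    if credential_type == "SPN" and service_type != "ADO":
        raise ValueError("SPN credentials are supported only for ADO")

    props = {
        "gitServiceType": service_type,
        "gitCredentialType": credential_type,
        "repo": repo,
        "branch": branch,
    }
    if credential_type in ("PAT", "SPN"):
        # Assumption: both fields are needed whenever a credential is used.
        if not username or not credential:
            raise ValueError("username and credential are required")
        props["username"] = username
        props["credential"] = credential
    if credential_type == "SPN":
        if not tenant_id:
            raise ValueError("tenantId is required for SPN")
        props["tenantId"] = tenant_id
    return props
```

The returned dictionary is meant to be placed under `airflowProperties` as `gitSyncProperties` before the request body is serialized.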
Import a private package with Git sync
This optional process only applies when you use private packages.
This process assumes that your private package was autosynced via Git sync. You add the package as a requirement in the Workflow Orchestration Manager UI along with the path prefix /opt/airflow/git/<repoName>/ if you're connecting to an ADO repo. Use /opt/airflow/git/<repoName>.git/ for all other Git services.
For example, if your private package is in /dags/test/private.whl in a GitHub repo, you should add the requirement /opt/airflow/git/<repoName>.git/dags/test/private.whl in the Workflow Orchestration Manager environment.
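The prefix rule above can be written as a small function. This is a minimal sketch: the function name `private_package_requirement` is hypothetical, and the service type strings follow the Git service names used in this article.

```python
import posixpath


def private_package_requirement(repo_name, package_path, service_type):
    """Return the requirement path for a private package synced via Git sync."""
    # ADO repos use "<repoName>"; all other Git services use "<repoName>.git".
    folder = repo_name if service_type == "ADO" else f"{repo_name}.git"
    # Strip the leading slash so the package path is joined under the prefix.
    return posixpath.join("/opt/airflow/git", folder, package_path.lstrip("/"))
```

For a GitHub repo with the hypothetical name `myrepo`, `private_package_requirement("myrepo", "/dags/test/private.whl", "GitHub")` yields `/opt/airflow/git/myrepo.git/dags/test/private.whl`.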