How to Filter Source Data before Copy in Azure Data Factory

Charlotte Watson 21 Reputation points
2021-01-26T19:11:26.21+00:00

I have a copy pipeline that copies the tables based on a query in a lookup function. The code is below. I am trying to figure out how to filter the rows copied from the Azure SQL Database. Any suggestions are appreciated.
{
"name": "ACCopy",
"properties": {
"activities": [
{
"name": "For Each Table",
"type": "ForEach",
"dependsOn": [
{
"activity": "TableList",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@activity('TableList').output.value",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Export Data",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "AzureSqlSource",
"queryTimeout": "02:00:00",
"partitionOption": "None"
},
"sink": {
"type": "AzureSqlSink",
"preCopyScript": "WHILE(EXISTS(SELECT 1 FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS WHERE CONSTRAINT_TYPE = 'FOREIGN KEY' ))\n--AND TABLE_NAME IN ('Table1', 'Table2') AND CONSTRAINT_NAME LIKE '%FK__%__DL%'))\nBEGIN\n DECLARE @alenzi _alterTable_fk NVARCHAR(2000)\n\n SELECT TOP 1 @alenzi _alterTable_fk = ('ALTER TABLE ' + TABLE_SCHEMA + '.[' + TABLE_NAME + '] DROP CONSTRAINT IF EXISTS [' + CONSTRAINT_NAME + ']')\n FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS\n WHERE CONSTRAINT_TYPE = 'FOREIGN KEY'\n \n\n EXEC (@alenzi _alterTable_fk)\nEND;",
"disableMetricsCollection": false
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "AC",
"type": "DatasetReference",
"parameters": {
"TableName": {
"value": "@item().Table_Name",
"type": "Expression"
},
"TableSchema": {
"value": "@item().Table_Schema",
"type": "Expression"
}
}
}
],
"outputs": [
{
"referenceName": "destAC",
"type": "DatasetReference",
"parameters": {
"TableName": {
"value": "@item().Table_Name",
"type": "Expression"
},
"TableSchema": {
"value": "@item().Table_Schema",
"type": "Expression"
}
}
}
]
}
]
}
},
{
"name": "TableList",
"type": "Lookup",
"dependsOn": [
{
"activity": "EXEC spDropFKTruncate",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "AzureSqlSource",
"sqlReaderQuery": "select table_schema, table_name\nFROM INFORMATION_SCHEMA.TABLES \nWHERE table_type = 'base table'\nand table_schema = 'dbo'\nand table_name not in ( 'Con_Audit',\n'EmailLog',\n'ElectronicPayment_Audit',\n'EPT_Audit',\n'ReptFavRole',\n'__RefLog',\n'ReptLD',\n'OppServ_Audit',\n'ReptDef',\n'User_Audit',\n'ReptFav',\n'ClockTime_Audit',\n'ReptLayout',\n'ReptLayoutType',\n'SessionHeader',\n'ErrorLog',\n'PayLog',\n'DashboardElement',\n'SearchField',\n'DashboardElementUser',\n'SearchObject',\n'SearchView',\n'SearchViewFieldDisplay',\n'DropDownListMaster',\n'AdminNotification',\n'SearchViewFieldFilter',\n'SearchViewFieldGroup',\n'InventoryVariance_Audit',\n'SearchViewFieldSort',\n'SearchViewUser',\n'ReportFavoriteUser',\n'PageRequest',\n'sysdiagrams',\n'EmailReport',\n'WT_Audit',\n'ExecutionHistory',\n'SearchHistory',\n'SearchViewRole',\n'ASNV',\n'CustomerPortalInvitation',\n'ItemAllocationArchive',\n'ItemAllocationDuplicateExceptions',\n'contactTemp',\n'WTV_audit',\n'AMS',\n'WTT_Audit',\n'DashboardElementUserValue',\n'ActivitySync',\n'ActivitySyncAttachment',\n'ReportLayout_temp',\n'DashboardElementSharedUser',\n'Permission',\n'DeviceData',\n'DashboardElementSharedRole')",
"queryTimeout": "02:00:00",
"partitionOption": "None"
},
"dataset": {
"referenceName": "tablelistTestC",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "EXEC spDropFKTruncate",
"type": "SqlServerStoredProcedure",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"storedProcedureName": "[dbo].[spDropFKTruncate]"
},
"linkedServiceName": {
"referenceName": "linkedservicedestTestC",
"type": "LinkedServiceReference"
}
}
],
"annotations": [],
"lastPublishTime": "2021-01-22T16:26:28Z"
},
"type": "Microsoft.DataFactory/factories/pipelines"
}

Azure SQL Database
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,997 questions
{count} votes

2 answers

Sort by: Most helpful
  1. Martin Cairney 2,256 Reputation points
    2021-01-27T01:16:57.307+00:00

    You would need a query specific for each table that you are copying data from. However, you can't combine the dynamic source that you are using (all tables in the source database other than the specific exclusions) with a table specific query as each table would have a different schema.

    You would have to either:

    1. Create Views of the tables that you want to copy from and define your filters there. Then you could query the INFORMATION_SCHEMA views for those views in the database
    2. Create a source for each table you are interested in and use a query source for each of these
    0 comments No comments

  2. HarithaMaddi-MSFT 10,136 Reputation points
    2021-01-27T11:00:32.133+00:00

    Hi @Charlotte Watson ,

    Welcome to Microsoft Q&A Platform. Thanks for posting the query.

    As per my understanding, you would like to filter data that is read from source before copying into destination. Dataflow is suitable for such requirements and filter transformation can be used to limit the data according to certain criteria.

    Please let us know if this helps in achieving this requirement. Let us know for further queries and we will be glad to assist.

    --

    • Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer. Here is how.
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification.
    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.