Supported Sources and Sinks
Important
This topic and its sub-topics provide the JSON format that is supported by older versions of Azure PowerShell. If you are using the July 2015 release of Azure PowerShell or later, see Supported Sources and Sinks for the latest JSON format. You can convert JSON from the old format to the new format by using the JSON Upgrade Tool.
The Copy Activity supports the following combinations of source/sink at this time.
An X in the following table means that the combination is supported.

| Source/Sink | Azure Blob | Azure Table | Azure SQL Database | Azure DocumentDB | SQL Server on Azure VM | On-premises SQL Server Database |
| --- | --- | --- | --- | --- | --- | --- |
| Azure Blob | X | X | X | X | X | X |
| Azure Table | X | X | X | X | X | X |
| Azure SQL Database | X | X | X | X | X | X |
| Azure DocumentDB | X | X | X | | | |
| SQL Server on Azure VM | X | X | X | | X | X |
| On-premises SQL Server Database | X | X | X | | X | X |
| On-premises Oracle Database | X | X | X | | X | X |
| On-premises File System | X | X | X | | X | X |
| On-premises MySQL Database | X | X | X | | X | X |
| On-premises DB2 Database | X | X | X | | X | X |
| On-premises Teradata Database | X | X | X | | X | X |
| On-premises Sybase Database | X | X | X | | X | X |
| On-premises PostgreSQL Database | X | X | X | | X | X |
The following table provides the source type and sink type for various data sources that you can use in the JSON script.
| Data store | Source type | Sink type |
| --- | --- | --- |
| Azure Blob | BlobSource | BlobSink |
| Azure Table | AzureTableSource | AzureTableSink |
| Azure SQL Database | SqlSource | SqlSink |
| Azure DocumentDB | DocumentDbCollectionSource | DocumentDbCollectionSink |
| SQL Server on Azure VM | SqlSource | SqlSink |
| On-premises SQL Server | SqlSource | SqlSink |
| On-premises Oracle | OracleSource | |
| File System | FileSystemSource | |
| On-premises MySQL, On-premises DB2, On-premises Teradata, On-premises Sybase, On-premises PostgreSQL | RelationalSource | |
The following sections provide properties supported by these sources and sinks.
Note
The FileSystemSource does not support any properties at this time.
BlobSource properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| treatEmptyAsNull | Specifies whether to treat a null or empty string as a null value. | TRUE, FALSE | N |
| skipHeaderLineCount | Indicates how many lines to skip. Applicable only when the input dataset uses TextFormat. | Integer from 0 to Max. | N |
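As a sketch of how these properties fit into a copy activity, a BlobSource that skips one header line and treats empty strings as nulls might look like the following (the property values here are illustrative, not required defaults):

```json
"source": {
    "type": "BlobSource",
    "treatEmptyAsNull": true,
    "skipHeaderLineCount": 1
}
```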
AzureTableSource properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| azureTableSourceQuery | Use the custom query to read data. | Azure table query string. Sample: "ColumnA eq ValueA" | N |
| azureTableSourceIgnoreTableNotFound | Indicates whether to swallow the exception when the table does not exist. | TRUE, FALSE | N |
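For illustration, an AzureTableSource that filters rows with a custom query and tolerates a missing table might be written as follows (the query string is a hypothetical example; string literals in table queries are typically single-quoted):

```json
"source": {
    "type": "AzureTableSource",
    "azureTableSourceQuery": "ColumnA eq 'ValueA'",
    "azureTableSourceIgnoreTableNotFound": true
}
```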
DocumentDbCollectionSource properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| query | Specifies the query to read data. | Query string supported by DocumentDB. Example: SELECT c.BusinessEntityID, c.PersonType, c.NameStyle, c.Title, c.Name.First AS FirstName, c.Name.Last AS LastName, c.Suffix, c.EmailPromotion FROM c WHERE c.ModifiedDate > "2009-01-01T00:00:00". If not specified, the SQL statement that is executed: select &lt;columns defined in structure&gt; from mycollection. | N |
| nestingSeparator | Special character to indicate that the document is nested. | Any character. See the example below. | N |
Example
Sample JSON document in the Person collection in a DocumentDB database:
{
"PersonId": 2,
"Name": {
"First": "Jane",
"Middle": "",
"Last": "Doe"
}
}
DocumentDB supports querying documents using a SQL like syntax over hierarchical JSON documents. Example:
SELECT Person.PersonId, Person.Name.First AS FirstName, Person.Name.Middle as MiddleName, Person.Name.Last AS LastName FROM Person
This is the query we will use when defining a pipeline later. For now, let’s create an input table that represents a document in the Person collection. Note that the location type is set to DocumentDbCollectionLocation, collectionName is set to Person, and linkedServiceName is set to a linked service of type DocumentDbLinkedService.
{
"name": "PersonDocumentDbTable",
"properties": {
"location": {
"type": "DocumentDbCollectionLocation",
"collectionName": "Person",
"linkedServiceName": "DocumentDbLinkedService"
},
"availability": {
"frequency": "Day",
"interval": 1,
"waitOnExternal": {}
}
}
}
Now, let’s create an output table of type: AzureBlob.
{
"name": "PersonBlobTableOut",
"properties": {
"location": {
"type": "AzureBlobLocation",
"folderPath": "docdb",
"format": {
"type": "TextFormat",
"columnDelimiter": ",",
"nullValue": "NULL"
},
"linkedServiceName": "StorageLinkedService"
},
"availability": {
"frequency": "Day",
"interval": 1
}
}
}
The following pipeline copies data from the Person collection in the DocumentDB database to an Azure blob in the docdb container.
{
"name": "DocDbToBlobPipeline",
"properties": {
"activities": [
{
"type": "CopyActivity",
"transformation": {
"source": {
"type": "DocumentDbCollectionSource",
"query": "SELECT Person.PersonId, Person.Name.First AS FirstName, Person.Name.Middle as MiddleName, Person.Name.Last AS LastName FROM Person",
"nestingSeparator": "."
},
"sink": {
"type": "BlobSink",
"blobWriterAddHeader": true,
"writeBatchSize": 1000,
"writeBatchTimeout": "00:00:59"
}
},
"inputs": [
{
"name": "PersonDocumentDbTable"
}
],
"outputs": [
{
"name": "PersonBlobTableOut"
}
],
"policy": {
"concurrency": 1
},
"name": "CopyFromDocDbToBlob"
}
],
"start": "2015-04-01T00:00:00Z",
"end": "2015-04-02T00:00:00Z"
}
}
SqlSource properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| sqlReaderQuery | Use the custom query to read data. | SQL query string. For example: select * from MyTable. If not specified, the SQL statement that is executed: select &lt;columns defined in structure&gt; from MyTable. | N |
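A minimal SqlSource sketch with a custom reader query (the table name comes from the example above and is illustrative) might look like this:

```json
"source": {
    "type": "SqlSource",
    "sqlReaderQuery": "select * from MyTable"
}
```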
OracleSource properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| oracleReaderQuery | Use the custom query to read data. | SQL query string. For example: select * from LOG. If not specified, the SQL statement that is executed: select &lt;columns defined in the structure&gt; from MyTable. | N |
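An OracleSource follows the same pattern; a sketch using the sample query from the table above (the LOG table name is illustrative):

```json
"source": {
    "type": "OracleSource",
    "oracleReaderQuery": "select * from LOG"
}
```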
RelationalSource properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| query | Use the custom query to read data. | SQL query string. For example: select * from MyTable. | Y |
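Because query is required for a RelationalSource, every copy activity that reads from MySQL, DB2, Teradata, Sybase, or PostgreSQL must include it; a minimal sketch (table name illustrative):

```json
"source": {
    "type": "RelationalSource",
    "query": "select * from MyTable"
}
```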
AzureTableSink properties
Supported property |
Description |
Allowed values |
Required |
azureTableDefaultPartitionKeyValue |
Default partition key value that can be used by the sink. |
A string value. |
N |
azureTablePartitionKeyName |
User specified column name, whose column values are used as partition key. If not specified, AzureTableDefaultPartitionKeyValue is used as the partition key. |
A column name. |
N |
azureTableRowKeyName |
User specified column name, whose column values are used as row key. If not specified, use a GUID for each row. |
A column name. |
N |
azureTableInsertType |
The mode to insert data into Azure table. |
“merge” “replace” |
N |
writeBatchSize |
Inserts data into the Azure table when the writeBatchSize or writeBatchTimeout is hit. |
Integer from 1 to 100 (unit = Row Count) |
N(Default = 100) |
writeBatchTimeout |
Inserts data into the Azure table when the writeBatchSize or writeBatchTimeout is hit |
(Unit = timespan)Sample: “00:20:00” (20 minutes) |
N(Default to storage client default timeout value 90 sec) |
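Putting several of these properties together, an AzureTableSink sketch might look like the following (the partition key value is a hypothetical placeholder; the batch values mirror the defaults and sample timespan above):

```json
"sink": {
    "type": "AzureTableSink",
    "azureTableDefaultPartitionKeyValue": "DefaultPartition",
    "azureTableInsertType": "merge",
    "writeBatchSize": 100,
    "writeBatchTimeout": "00:20:00"
}
```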
SqlSink properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| sqlWriterStoredProcedureName | User-specified stored procedure name to upsert (update/insert) data into the target table. | Name of the stored procedure. | N |
| sqlWriterTableType | User-specified table type name to be used in the above stored procedure. | A table type name. | N |
| writeBatchTimeout | Wait time for the batch insert operation to complete before it times out. | Timespan. Example: "00:30:00" (30 minutes). | N |
| writeBatchSize | Inserts data into the SQL table when the buffer size reaches writeBatchSize. | Integer (unit: row count). | N (default = 10000) |
Example:
"sink":
{
"type": "SqlSink",
"sqlWriterTableType": "MarketingType",
"sqlWriterStoredProcedureName": "spOverwriteMarketing",
"storedProcedureParameters":
{
"stringData":
{
"value": "str1"
}
}
}
BlobSink properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| blobWriterAddHeader | Specifies whether to add a header of column definitions. | TRUE, FALSE (default) | N |
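A BlobSink that writes a header row of column names can be sketched as follows (this mirrors the sink used in the DocDbToBlobPipeline example earlier in this topic):

```json
"sink": {
    "type": "BlobSink",
    "blobWriterAddHeader": true
}
```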
DocumentDbCollectionSink properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| nestingSeparator | A special character in the source column name indicating that a nested document is needed. For example, Name.First in the output table produces the nested JSON structure "Name": { "First": "..." } in the DocumentDB document. | Character that is used to separate nesting levels. Default value is . (dot). | N |
| writeBatchSize | Number of parallel requests to the DocumentDB service to create documents. You can fine-tune performance when copying data to/from DocumentDB by using this property. Performance improves as you increase writeBatchSize because more parallel requests are sent to DocumentDB, but you must avoid the throttling that can throw the error message "Request rate is large". Throttling depends on a number of factors, including the size of documents, the number of terms in documents, and the indexing policy of the target collection. For copy operations, you can use a higher-tier (e.g. S3) collection to have the most throughput available (2,500 request units/second), or reduce writeBatchSize for a more robust although slower copy activity. | Integer value. | N |
Example
Sample blob input:
1,John,,Doe
2,Jane,,Doe
Input Azure blob table is defined as follows:
{
"name": "PersonBlobTableIn",
"properties": {
"structure": [
{
"name": "Id",
"type": "Int"
},
{
"name": "FirstName",
"type": "String"
},
{
"name": "MiddleName",
"type": "String"
},
{
"name": "LastName",
"type": "String"
}
],
"location": {
"type": "AzureBlobLocation",
"fileName": "input.csv",
"folderPath": "docdb",
"format": {
"type": "TextFormat",
"columnDelimiter": ",",
"nullValue": "NULL"
},
"linkedServiceName": "StorageLinkedService"
},
"availability": {
"frequency": "Day",
"interval": 1,
"waitOnExternal": {}
}
}
}
The output Azure DocumentDB table is defined as follows:
{
"name": "PersonDocumentDbTableOut",
"properties": {
"structure": [
{
"name": "Id",
"type": "Int"
},
{
"name": "Name.First",
"type": "String"
},
{
"name": "Name.Middle",
"type": "String"
},
{
"name": "Name.Last",
"type": "String"
}
],
"location": {
"type": "DocumentDbCollectionLocation",
"collectionName": "Person",
"linkedServiceName": "DocumentDbLinkedService"
},
"availability": {
"frequency": "Day",
"interval": 1
}
}
}
The pipeline is defined as follows:
{
"name": "BlobToDocDbPipeline",
"properties": {
"activities": [
{
"type": "CopyActivity",
"transformation": {
"source": {
"type": "BlobSource"
},
"sink": {
"type": "DocumentDbCollectionSink",
"nestingSeparator": ".",
"writeBatchSize": 2,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"ColumnMappings": "Id: Id, FirstName: Name.First, MiddleName: Name.Middle, LastName: Name.Last"
}
},
"inputs": [
{
"name": "PersonBlobTableIn"
}
],
"outputs": [
{
"name": "PersonDocumentDbTableOut"
}
],
"policy": {
"concurrency": 1
},
"name": "CopyFromBlobToDocDb"
}
],
"start": "2015-04-14T00:00:00Z",
"end": "2015-04-15T00:00:00Z"
}
}
The output JSON in DocumentDB will look like the following:
{
"Id": 1,
"Name": {
"First": "John",
"Middle": null,
"Last": "Doe"
},
"id": "a5e8595c-62ec-4554-a118-3940f4ff70b6"
}
DocumentDB is a NoSQL store for JSON documents, where nested structures are allowed. Azure Data Factory enables you to denote the hierarchy via nestingSeparator, which is "." in this example. With that separator, the copy activity generates the "Name" object with three child elements First, Middle, and Last, according to "Name.First", "Name.Middle", and "Name.Last" in the table definition.