Supported Sources and Sinks
Important
This topic and its sub-topics provide the JSON format that is supported by older versions of Azure PowerShell. If you are using the July 2015 release of Azure PowerShell or later, see Supported Sources and Sinks for the latest JSON format. You can convert JSON from the old format to the new format by using the JSON Upgrade Tool.
The Copy Activity supports the following combinations of source/sink at this time.
An X in the following table means that the combination is supported.

| Source/Sink | Azure Blob | Azure Table | Azure SQL Database | Azure DocumentDB | SQL Server on Azure VM | On-premises SQL Server Database |
| --- | --- | --- | --- | --- | --- | --- |
| Azure Blob | X | X | X | X | X | X |
| Azure Table | X | X | X | X | X | X |
| Azure SQL Database | X | X | X | X | X | X |
| Azure DocumentDB | X | X | X | | | |
| SQL Server on Azure VM | X | X | X | | X | X |
| On-premises SQL Server Database | X | X | X | | X | X |
| On-premises Oracle Database | X | X | X | | X | X |
| On-premises File System | X | X | X | | X | X |
| On-premises MySQL Database | X | X | X | | X | X |
| On-premises DB2 Database | X | X | X | | X | X |
| On-premises Teradata Database | X | X | X | | X | X |
| On-premises Sybase Database | X | X | X | | X | X |
| On-premises PostgreSQL Database | X | X | X | | X | X |
The following table provides the source type and sink type for various data sources that you can use in the JSON script.
| Data store | Source type | Sink type |
| --- | --- | --- |
| Azure Blob | BlobSource | BlobSink |
| Azure Table | AzureTableSource | AzureTableSink |
| Azure SQL Database | SqlSource | SqlSink |
| Azure DocumentDB | DocumentDbCollectionSource | DocumentDbCollectionSink |
| SQL Server on Azure VM | SqlSource | SqlSink |
| On-premises SQL Server | SqlSource | SqlSink |
| On-premises Oracle | OracleSource | |
| File System | FileSystemSource | |
| On-premises MySQL, On-premises DB2, On-premises Teradata, On-premises Sybase, On-premises PostgreSQL | RelationalSource | |
The following sections provide properties supported by these sources and sinks.
Note
The FileSystemSource does not support any properties at this time.
BlobSource properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| treatEmptyAsNull | Specifies whether to treat a null or empty string as a null value. | TRUE, FALSE | N |
| skipHeaderLineCount | Indicates how many lines to skip. Applicable only when the input dataset uses TextFormat. | Integer from 0 to Max. | N |
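As a sketch of how these properties fit into a copy activity, a BlobSource that skips one header line and treats empty strings as nulls might look like the following (the property values here are illustrative, not required defaults):

```json
"source": {
    "type": "BlobSource",
    "treatEmptyAsNull": true,
    "skipHeaderLineCount": 1
}
```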
AzureTableSource properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| azureTableSourceQuery | Use the custom query to read data. | Azure table query string. Sample: "ColumnA eq ValueA" | N |
| azureTableSourceIgnoreTableNotFound | Indicates whether to swallow the exception when the table does not exist. | TRUE, FALSE | N |
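For illustration, an AzureTableSource that filters rows with a custom query and tolerates a missing table might be written as follows (the query string is a hypothetical example; string literals in table queries are typically single-quoted):

```json
"source": {
    "type": "AzureTableSource",
    "azureTableSourceQuery": "ColumnA eq 'ValueA'",
    "azureTableSourceIgnoreTableNotFound": true
}
```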
DocumentDbCollectionSource properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| query | Specifies the query to read data. | Query string supported by DocumentDB. Example: SELECT c.BusinessEntityID, c.PersonType, c.NameStyle, c.Title, c.Name.First AS FirstName, c.Name.Last AS LastName, c.Suffix, c.EmailPromotion FROM c WHERE c.ModifiedDate > "2009-01-01T00:00:00". If not specified, the SQL statement that is executed: select &lt;columns defined in structure&gt; from mycollection. | N |
| nestingSeparator | Special character to indicate that the document is nested. | Any character. See the example below. | N |
Example
Sample JSON document in the Person collection in a DocumentDB database:
{
"PersonId": 2,
"Name": {
"First": "Jane",
"Middle": "",
"Last": "Doe"
}
}
DocumentDB supports querying documents using a SQL like syntax over hierarchical JSON documents. Example:
SELECT Person.PersonId, Person.Name.First AS FirstName, Person.Name.Middle as MiddleName, Person.Name.Last AS LastName FROM Person
This is the query we will use when defining a pipeline later. For now, let’s create an input table that represents a document in the Person collection. Note that the location type is set to DocumentDbCollectionLocation, collectionName is set to Person, and linkedServiceName is set to a linked service of type DocumentDbLinkedService.
{
"name": "PersonDocumentDbTable",
"properties": {
"location": {
"type": "DocumentDbCollectionLocation",
"collectionName": "Person",
"linkedServiceName": "DocumentDbLinkedService"
},
"availability": {
"frequency": "Day",
"interval": 1,
"waitOnExternal": {}
}
}
}
Now, let’s create an output table of type: AzureBlob.
{
"name": "PersonBlobTableOut",
"properties": {
"location": {
"type": "AzureBlobLocation",
"folderPath": "docdb",
"format": {
"type": "TextFormat",
"columnDelimiter": ",",
"nullValue": "NULL"
},
"linkedServiceName": "StorageLinkedService"
},
"availability": {
"frequency": "Day",
"interval": 1
}
}
}
The following pipeline copies data from the Person collection in the DocumentDB database to an Azure blob in the docdb container.
{
"name": "DocDbToBlobPipeline",
"properties": {
"activities": [
{
"type": "CopyActivity",
"transformation": {
"source": {
"type": "DocumentDbCollectionSource",
"query": "SELECT Person.PersonId, Person.Name.First AS FirstName, Person.Name.Middle as MiddleName, Person.Name.Last AS LastName FROM Person",
"nestingSeparator": "."
},
"sink": {
"type": "BlobSink",
"blobWriterAddHeader": true,
"writeBatchSize": 1000,
"writeBatchTimeout": "00:00:59"
}
},
"inputs": [
{
"name": "PersonDocumentDbTable"
}
],
"outputs": [
{
"name": "PersonBlobTableOut"
}
],
"policy": {
"concurrency": 1
},
"name": "CopyFromDocDbToBlob"
}
],
"start": "2015-04-01T00:00:00Z",
"end": "2015-04-02T00:00:00Z"
}
}
SqlSource properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| sqlReaderQuery | Use the custom query to read data. | SQL query string. For example: select * from MyTable. If not specified, the SQL statement that is executed: select &lt;columns defined in structure&gt; from MyTable. | N |
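A minimal SqlSource sketch with a custom reader query (the table name comes from the example above and is illustrative) might look like this:

```json
"source": {
    "type": "SqlSource",
    "sqlReaderQuery": "select * from MyTable"
}
```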
OracleSource properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| oracleReaderQuery | Use the custom query to read data. | SQL query string. For example: select * from LOG. If not specified, the SQL statement that is executed: select &lt;columns defined in the structure&gt; from MyTable. | N |
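An OracleSource follows the same pattern; a sketch using the sample query from the table above (the LOG table name is illustrative):

```json
"source": {
    "type": "OracleSource",
    "oracleReaderQuery": "select * from LOG"
}
```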
RelationalSource properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| query | Use the custom query to read data. | SQL query string. For example: select * from MyTable. | Y |
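Because query is required for a RelationalSource, every copy activity that reads from MySQL, DB2, Teradata, Sybase, or PostgreSQL must include it; a minimal sketch (table name illustrative):

```json
"source": {
    "type": "RelationalSource",
    "query": "select * from MyTable"
}
```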
AzureTableSink properties
Supported property |
Description |
Allowed values |
Required |
azureTableDefaultPartitionKeyValue |
Default partition key value that can be used by the sink. |
A string value. |
N |
azureTablePartitionKeyName |
User specified column name, whose column values are used as partition key. If not specified, AzureTableDefaultPartitionKeyValue is used as the partition key. |
A column name. |
N |
azureTableRowKeyName |
User specified column name, whose column values are used as row key. If not specified, use a GUID for each row. |
A column name. |
N |
azureTableInsertType |
The mode to insert data into Azure table. |
“merge” “replace” |
N |
writeBatchSize |
Inserts data into the Azure table when the writeBatchSize or writeBatchTimeout is hit. |
Integer from 1 to 100 (unit = Row Count) |
N(Default = 100) |
writeBatchTimeout |
Inserts data into the Azure table when the writeBatchSize or writeBatchTimeout is hit |
(Unit = timespan)Sample: “00:20:00” (20 minutes) |
N(Default to storage client default timeout value 90 sec) |
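Putting several of these properties together, an AzureTableSink sketch might look like the following (the partition key value is a hypothetical placeholder; the batch values mirror the defaults and sample timespan above):

```json
"sink": {
    "type": "AzureTableSink",
    "azureTableDefaultPartitionKeyValue": "DefaultPartition",
    "azureTableInsertType": "merge",
    "writeBatchSize": 100,
    "writeBatchTimeout": "00:20:00"
}
```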
SqlSink properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| sqlWriterStoredProcedureName | User-specified stored procedure name to upsert (update/insert) data into the target table. | Name of the stored procedure. | N |
| sqlWriterTableType | User-specified table type name to be used in the above stored procedure. | A table type name. | N |
| writeBatchTimeout | Wait time for the batch insert operation to complete before it times out. | Timespan. Example: "00:30:00" (30 minutes). | N |
| writeBatchSize | Inserts data into the SQL table when the buffer size reaches writeBatchSize. | Integer (unit: row count). | N (default = 10000) |
Example:
"sink":
{
"type": "SqlSink",
"sqlWriterTableType": "MarketingType",
"sqlWriterStoredProcedureName": "spOverwriteMarketing",
"storedProcedureParameters":
{
"stringData":
{
"value": "str1"
}
}
}
BlobSink properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| blobWriterAddHeader | Specifies whether to add a header of column definitions. | TRUE, FALSE (default) | N |
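A BlobSink that writes a header row of column names can be sketched as follows (this mirrors the sink used in the DocDbToBlobPipeline example earlier in this topic):

```json
"sink": {
    "type": "BlobSink",
    "blobWriterAddHeader": true
}
```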
DocumentDbCollectionSink properties
| Supported property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| nestingSeparator | A special character in the source column name indicating that a nested document is needed. For example, Name.First in the output table produces the nested JSON structure "Name": { "First": "..." } in the DocumentDB document. | Character that is used to separate nesting levels. Default value is . (dot). | N |
| writeBatchSize | Number of parallel requests to the DocumentDB service to create documents. You can fine-tune performance when copying data to/from DocumentDB by using this property. Performance improves as you increase writeBatchSize because more parallel requests are sent to DocumentDB, but you must avoid the throttling that can throw the error message "Request rate is large". Throttling depends on a number of factors, including the size of documents, the number of terms in documents, and the indexing policy of the target collection. For copy operations, you can use a higher-tier (e.g. S3) collection to have the most throughput available (2,500 request units/second), or reduce writeBatchSize for a more robust although slower copy activity. | Integer value. | N |
Example
Sample blob input:
1,John,,Doe
2,Jane,,Doe
Input Azure blob table is defined as follows:
{
"name": "PersonBlobTableIn",
"properties": {
"structure": [
{
"name": "Id",
"type": "Int"
},
{
"name": "FirstName",
"type": "String"
},
{
"name": "MiddleName",
"type": "String"
},
{
"name": "LastName",
"type": "String"
}
],
"location": {
"type": "AzureBlobLocation",
"fileName": "input.csv",
"folderPath": "docdb",
"format": {
"type": "TextFormat",
"columnDelimiter": ",",
"nullValue": "NULL"
},
"linkedServiceName": "StorageLinkedService"
},
"availability": {
"frequency": "Day",
"interval": 1,
"waitOnExternal": {}
}
}
}
The output Azure DocumentDB table is defined as follows:
{
"name": "PersonDocumentDbTableOut",
"properties": {
"structure": [
{
"name": "Id",
"type": "Int"
},
{
"name": "Name.First",
"type": "String"
},
{
"name": "Name.Middle",
"type": "String"
},
{
"name": "Name.Last",
"type": "String"
}
],
"location": {
"type": "DocumentDbCollectionLocation",
"collectionName": "Person",
"linkedServiceName": "DocumentDbLinkedService"
},
"availability": {
"frequency": "Day",
"interval": 1
}
}
}
The pipeline is defined as follows:
{
"name": "BlobToDocDbPipeline",
"properties": {
"activities": [
{
"type": "CopyActivity",
"transformation": {
"source": {
"type": "BlobSource"
},
"sink": {
"type": "DocumentDbCollectionSink",
"nestingSeparator": ".",
"writeBatchSize": 2,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"ColumnMappings": "Id: Id, FirstName: Name.First, MiddleName: Name.Middle, LastName: Name.Last"
}
},
"inputs": [
{
"name": "PersonBlobTableIn"
}
],
"outputs": [
{
"name": "PersonDocumentDbTableOut"
}
],
"policy": {
"concurrency": 1
},
"name": "CopyFromBlobToDocDb"
}
],
"start": "2015-04-14T00:00:00Z",
"end": "2015-04-15T00:00:00Z"
}
}
The output JSON in DocumentDB will look like the following:
{
"Id": 1,
"Name": {
"First": "John",
"Middle": null,
"Last": "Doe"
},
"id": "a5e8595c-62ec-4554-a118-3940f4ff70b6"
}
DocumentDB is a NoSQL store for JSON documents, where nested structures are allowed. Azure Data Factory enables you to denote the hierarchy via nestingSeparator, which is "." in this example. With that separator, the copy activity generates the "Name" object with three child elements First, Middle, and Last, according to "Name.First", "Name.Middle", and "Name.Last" in the table definition.