Avro format in Azure Data Factory and Azure Synapse Analytics
APPLIES TO: Azure Data Factory Azure Synapse Analytics
Tip
Try out Data Factory in Microsoft Fabric, an all-in-one analytics solution for enterprises. Microsoft Fabric covers everything from data movement to data science, real-time analytics, business intelligence, and reporting. Learn how to start a new trial for free!
Follow this article when you want to parse Avro files or write the data into Avro format.
Avro format is supported for the following connectors: Amazon S3, Amazon S3 Compatible Storage, Azure Blob, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure Files, File System, FTP, Google Cloud Storage, HDFS, HTTP, Oracle Cloud Storage and SFTP.
For a full list of sections and properties available for defining datasets, see the Datasets article. This section provides a list of properties supported by the Avro dataset.
Property | Description | Required |
---|---|---|
type | The type property of the dataset must be set to Avro. | Yes |
location | Location settings of the file(s). Each file-based connector has its own location type and supported properties under `location`. See details in the connector article -> Dataset properties section. | Yes |
avroCompressionCodec | The compression codec to use when writing to Avro files. When reading from Avro files, the service automatically determines the compression codec based on the file metadata. Supported types are "none" (default), "deflate", and "snappy". Note that the Copy activity currently doesn't support Snappy when reading or writing Avro files. | No |
Note
White space in column names is not supported for Avro files.
Below is an example of an Avro dataset on Azure Blob Storage:
{
    "name": "AvroDataset",
    "properties": {
        "type": "Avro",
        "linkedServiceName": {
            "referenceName": "<Azure Blob Storage linked service name>",
            "type": "LinkedServiceReference"
        },
        "schema": [ < physical schema, optional, retrievable during authoring > ],
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "containername",
                "folderPath": "folder/subfolder"
            },
            "avroCompressionCodec": "snappy"
        }
    }
}
For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by the Avro source and sink.
The following properties are supported in the copy activity *source* section.
Property | Description | Required |
---|---|---|
type | The type property of the copy activity source must be set to AvroSource. | Yes |
storeSettings | A group of properties on how to read data from a data store. Each file-based connector has its own supported read settings under `storeSettings`. See details in the connector article -> Copy activity properties section. | No |
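For illustration, the following is a minimal sketch of the *source* section of a copy activity that reads Avro files from Azure Blob Storage. The `AzureBlobStorageReadSettings` block and its values are one possible store settings shape; your own depend on the connector you use:

"source": {
    "type": "AvroSource",
    "storeSettings": {
        "type": "AzureBlobStorageReadSettings",
        "recursive": true,
        "wildcardFileName": "*.avro"
    }
}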
The following properties are supported in the copy activity *sink* section.
Property | Description | Required |
---|---|---|
type | The type property of the copy activity sink must be set to AvroSink. | Yes |
formatSettings | A group of properties. Refer to the Avro write settings table below. | No |
storeSettings | A group of properties on how to write data to a data store. Each file-based connector has its own supported write settings under `storeSettings`. See details in the connector article -> Copy activity properties section. | No |
Supported Avro write settings under `formatSettings`:
Property | Description | Required |
---|---|---|
type | The type of formatSettings must be set to AvroWriteSettings. | Yes |
maxRowsPerFile | When writing data into a folder, you can choose to write to multiple files and specify the maximum rows per file. | No |
fileNamePrefix | Applicable when maxRowsPerFile is configured. Specify the file name prefix when writing data to multiple files, resulting in this pattern: <fileNamePrefix>_00000.<fileExtension>. If not specified, the file name prefix is auto-generated. This property doesn't apply when the source is a file-based store or a partition-option-enabled data store. | No |
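Putting the sink and write settings together, a sketch of a copy activity *sink* that writes multiple Avro files might look like the following. The store settings type, row limit, and prefix value are illustrative assumptions, not required values:

"sink": {
    "type": "AvroSink",
    "storeSettings": {
        "type": "AzureBlobStorageWriteSettings"
    },
    "formatSettings": {
        "type": "AvroWriteSettings",
        "maxRowsPerFile": 1000000,
        "fileNamePrefix": "<output file prefix>"
    }
}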
In mapping data flows, you can read and write to Avro format in the following data stores: Azure Blob Storage, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, and SFTP, and you can read Avro format in Amazon S3.
The below table lists the properties supported by an Avro source. You can edit these properties in the Source options tab.
Name | Description | Required | Allowed values | Data flow script property |
---|---|---|---|---|
Wild card paths | All files matching the wildcard path will be processed. Overrides the folder and file path set in the dataset. | no | String[] | wildcardPaths |
Partition root path | For file data that is partitioned, you can enter a partition root path in order to read partitioned folders as columns. | no | String | partitionRootPath |
List of files | Whether your source is pointing to a text file that lists files to process. | no | true or false | fileList |
Column to store file name | Create a new column with the source file name and path. | no | String | rowUrlColumn |
After completion | Delete or move the files after processing. File path starts from the container root. | no | Delete: true or false, Move: ['<from>', '<to>'] | purgeFiles, moveFiles |
Filter by last modified | Choose to filter files based upon when they were last altered. | no | Timestamp | modifiedAfter, modifiedBefore |
Allow no files found | If true, an error is not thrown if no files are found. | no | true or false | ignoreNoFilesFound |
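As a sketch of how a few of these script properties fit together, a data flow script definition for an Avro source might look like the following; the transformation name and wildcard path are illustrative:

source(allowSchemaDrift: true,
    validateSchema: false,
    ignoreNoFilesFound: false,
    wildcardPaths:['data/*.avro']) ~> AvroSource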
The below table lists the properties supported by an Avro sink. You can edit these properties in the Settings tab.
Name | Description | Required | Allowed values | Data flow script property |
---|---|---|---|---|
Clear the folder | If the destination folder is cleared prior to write. | no | true or false | truncate |
File name option | The naming format of the data written. By default, one file per partition in format part-#####-tid-<guid>. | no | Pattern: String, Per partition: String[], As data in column: String, Output to single file: ['<fileName>'] | filePattern, partitionFileNames, rowUrlColumn, partitionFileNames |
Quote all | Enclose all values in quotes. | no | true or false | quoteAll |
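And a corresponding sketch of an Avro sink in data flow script, again with illustrative stream and file names:

AvroSource sink(allowSchemaDrift: true,
    validateSchema: false,
    truncate: true,
    filePattern:'output[n].avro') ~> AvroSink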
Avro complex data types (records, enums, arrays, maps, unions, and fixed) are not supported in the Copy activity.
When working with Avro files in data flows, you can read and write complex data types, but be sure to clear the physical schema from the dataset first. In data flows, you can set your logical projection and derive columns that are complex structures, then auto-map those fields to an Avro file.
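For instance, a source projection in data flow script could declare a nested record directly, which can then be mapped to an Avro file in the sink. The field names here are hypothetical:

source(output(
        id as string,
        address as (street as string, city as string)
    ),
    allowSchemaDrift: true,
    validateSchema: false) ~> ComplexAvroSource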