Dataset format needed for azure blob storage

Reema DSouza 20 Reputation points
2024-05-24T13:53:47.7833333+00:00

How do I select the right format when moving documents from SQL Server to Azure Blob Storage using an Azure Data Factory pipeline? The existing documents in Blob Storage show the type as Block Blob, but that doesn't tell me the format. I need my new documents to match the format of the existing ones.

Is there any way I can find out what format these documents are stored in?


Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 83,301 Reputation points Microsoft Employee
    2024-05-27T05:36:00.8866667+00:00

    @Reema DSouza - Thanks for the question and using MS Q&A platform.

To select the right format for your dataset when moving documents from SQL Server to Azure Blob Storage with an Azure Data Factory pipeline, you first need to know the format of the existing documents in Blob Storage.

    The documents in Azure Blob Storage are commonly stored in one of the following formats:

    • Delimited text (CSV or TSV), JSON
    • Avro, ORC, or Parquet (binary serialization formats)
    • Raw binary files copied as-is

    You can check the format of the existing documents in Azure Blob Storage by looking at the file extension. For example, if the file extension is .csv, then the format is CSV. If the file extension is .parquet, then the format is Parquet.

    If the existing documents in Blob Storage show the type as Block Blob without a format, that alone tells you nothing about the file format: Block Blob is the blob type (as opposed to Append Blob or Page Blob), and nearly every file uploaded to Blob Storage is a Block Blob regardless of its contents. If the file names have no extension, download a sample and inspect its first bytes; most binary formats begin with a distinctive magic number.
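    One way to determine a blob's format without relying on the file extension is to inspect its leading bytes. A minimal Python sketch (`sniff_format` is a hypothetical helper name, not part of any Azure SDK; the magic numbers come from each format's specification, and you would first fetch the first few bytes of the blob, e.g. with Azure Storage Explorer or the azure-storage-blob SDK):

    ```python
    def sniff_format(header: bytes) -> str:
        """Guess a file format from its leading 'magic' bytes."""
        if header.startswith(b"PAR1"):     # Parquet files start (and end) with PAR1
            return "Parquet"
        if header.startswith(b"Obj\x01"):  # Avro object container file, version 1
            return "Avro"
        if header.startswith(b"ORC"):      # ORC files start with the bytes 'ORC'
            return "ORC"
        if header.startswith(b"\x1f\x8b"): # gzip-compressed (e.g. .csv.gz)
            return "gzip"
        try:
            header.decode("utf-8")         # decodes cleanly -> likely a text format
            return "text (possibly CSV/TSV/JSON)"
        except UnicodeDecodeError:
            return "unknown binary"
    ```

    Reading only the first 8 bytes or so of a sampled blob is enough to distinguish these formats.
    
    
    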

    To match your new documents to the same format as the existing ones, specify that format in the dataset properties of your Azure Data Factory pipeline, under the dataset's format property.

    Here is an example of how to specify the format as CSV in the dataset properties (this uses the legacy-style AzureBlob dataset JSON):

    {
        "name": "MyBlobStorageDataset",
        "properties": {
            "structure": [
                {
                    "name": "Column1",
                    "type": "String"
                },
                {
                    "name": "Column2",
                    "type": "String"
                }
            ],
            "published": false,
            "type": "AzureBlob",
            "linkedServiceName": "MyBlobStorageLinkedService",
            "typeProperties": {
                "fileName": "*.csv",
                "folderPath": "myfolder",
                "format": {
                    "type": "TextFormat",
                    "columnDelimiter": ",",
                    "rowDelimiter": "\n",
                    "quoteChar": "\"",
                    "escapeChar": "\"",
                    "nullValue": "\\N",
                    "encodingName": "UTF-8"
                }
            },
            "availability": {
                "frequency": "Day",
                "interval": 1
            }
        }
    }
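
    The JSON above uses the legacy (v1-style) AzureBlob dataset model. In current Data Factory (v2), a CSV dataset is typically modeled as a DelimitedText dataset instead; a rough sketch, where MyBlobStorageLinkedService, mycontainer, and myfolder are placeholder names:

    ```json
    {
        "name": "MyDelimitedTextDataset",
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": "MyBlobStorageLinkedService",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": "mycontainer",
                    "folderPath": "myfolder"
                },
                "columnDelimiter": ",",
                "quoteChar": "\"",
                "escapeChar": "\"",
                "firstRowAsHeader": true,
                "encodingName": "UTF-8"
            },
            "schema": []
        }
    }
    ```
    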
    

    For more details, refer to Supported file formats and compression codecs by copy activity in Azure Data Factory and Azure Synapse pipelines and Copy and transform data in Azure Blob Storage by using Azure Data Factory or Azure Synapse Analytics.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful". And if you have any further queries, do let us know.


1 additional answer

  1. Gowtham CP 3,655 Reputation points
    2024-05-24T14:02:24.6366667+00:00

    Hello Reema DSouza ,

    Thanks for reaching out in the Microsoft Q&A!

    To identify the format of the existing documents in Blob Storage, download a sample for inspection or use Azure Storage Explorer. Once you've determined the format (e.g., CSV, JSON), configure the sink dataset in Azure Data Factory to match it, specifying the format in the Copy activity settings; this keeps new documents compatible with the ones already in Blob Storage. For detailed steps, refer to Microsoft's documentation on copying data from SQL Server to Blob Storage.

    If you find this helpful, please accept this answer to close the thread. Thanks!

    1 person found this answer helpful.