Dataset format needed for azure blob storage

Reema DSouza 20 Reputation points
2024-05-24T13:53:47.7833333+00:00

How do I select the right format when moving documents from SQL Server to Azure Blob Storage using an Azure Data Factory pipeline? The existing documents in Blob Storage show the type as Block Blob, but that doesn't tell me the format. I need my new documents to match the format of the existing ones.

Is there any way I can find out what format these documents are stored in?


Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 83,301 Reputation points Microsoft Employee
    2024-05-27T05:36:00.8866667+00:00

    @Reema DSouza - Thanks for the question and using MS Q&A platform.

To select the right format for your dataset when moving documents from SQL Server to Azure Blob Storage with an Azure Data Factory pipeline, you first need to know the format of the existing documents in Blob Storage.

    The documents in Azure Blob Storage are commonly stored in one of the following formats:

    • Delimited text (CSV or TSV), JSON
    • Avro, ORC, or Parquet (binary serialization formats)
    • Raw binary files copied as-is

    You can check the format of the existing documents in Azure Blob Storage by looking at the file extension. For example, if the file extension is .csv, then the format is CSV. If the file extension is .parquet, then the format is Parquet.

    If the existing documents in Blob Storage show the type as Block Blob without a format, that alone tells you nothing about the file format: Block Blob is the blob type (as opposed to Append Blob or Page Blob), and nearly every file uploaded to Blob Storage is a Block Blob regardless of its contents. If the file names have no extension, download a sample and inspect its first bytes; most binary formats begin with a distinctive magic number.
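    One way to determine a blob's format without relying on the file extension is to inspect its leading bytes. A minimal Python sketch (`sniff_format` is a hypothetical helper name, not part of any Azure SDK; the magic numbers come from each format's specification, and you would first fetch the first few bytes of the blob, e.g. with Azure Storage Explorer or the azure-storage-blob SDK):

    ```python
    def sniff_format(header: bytes) -> str:
        """Guess a file format from its leading 'magic' bytes."""
        if header.startswith(b"PAR1"):     # Parquet files start (and end) with PAR1
            return "Parquet"
        if header.startswith(b"Obj\x01"):  # Avro object container file, version 1
            return "Avro"
        if header.startswith(b"ORC"):      # ORC files start with the bytes 'ORC'
            return "ORC"
        if header.startswith(b"\x1f\x8b"): # gzip-compressed (e.g. .csv.gz)
            return "gzip"
        try:
            header.decode("utf-8")         # decodes cleanly -> likely a text format
            return "text (possibly CSV/TSV/JSON)"
        except UnicodeDecodeError:
            return "unknown binary"
    ```

    Reading only the first 8 bytes or so of a sampled blob is enough to distinguish these formats.
    
    
    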

    To match your new documents to the same format as the existing ones, specify that format in the dataset properties of your Azure Data Factory pipeline, under the dataset's format property.

    Here is an example of how to specify the format as CSV in the dataset properties (this uses the legacy-style AzureBlob dataset JSON):

    {
        "name": "MyBlobStorageDataset",
        "properties": {
            "structure": [
                {
                    "name": "Column1",
                    "type": "String"
                },
                {
                    "name": "Column2",
                    "type": "String"
                }
            ],
            "published": false,
            "type": "AzureBlob",
            "linkedServiceName": "MyBlobStorageLinkedService",
            "typeProperties": {
                "fileName": "*.csv",
                "folderPath": "myfolder",
                "format": {
                    "type": "TextFormat",
                    "columnDelimiter": ",",
                    "rowDelimiter": "\n",
                    "quoteChar": "\"",
                    "escapeChar": "\"",
                    "nullValue": "\\N",
                    "encodingName": "UTF-8"
                }
            },
            "availability": {
                "frequency": "Day",
                "interval": 1
            }
        }
    }
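
    The JSON above uses the legacy (v1-style) AzureBlob dataset model. In current Data Factory (v2), a CSV dataset is typically modeled as a DelimitedText dataset instead; a rough sketch, where MyBlobStorageLinkedService, mycontainer, and myfolder are placeholder names:

    ```json
    {
        "name": "MyDelimitedTextDataset",
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": {
                "referenceName": "MyBlobStorageLinkedService",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": "mycontainer",
                    "folderPath": "myfolder"
                },
                "columnDelimiter": ",",
                "quoteChar": "\"",
                "escapeChar": "\"",
                "firstRowAsHeader": true,
                "encodingName": "UTF-8"
            },
            "schema": []
        }
    }
    ```
    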
    

    For more details, refer to Supported file formats and compression codecs by copy activity in Azure Data Factory and Azure Synapse pipelines and Copy and transform data in Azure Blob Storage by using Azure Data Factory or Azure Synapse Analytics.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful". And if you have any further queries, do let us know.


1 additional answer

  1. Gowtham CP 3,655 Reputation points
    2024-05-24T14:02:24.6366667+00:00

    Hello Reema DSouza ,

    Thanks for reaching out in the Microsoft Q&A!

    To identify the format of the existing documents in Blob Storage, download a sample for inspection or use Azure Storage Explorer. Once you've determined the format (e.g., CSV, JSON), configure the sink dataset in Azure Data Factory to match it, specifying the format in the Copy activity settings; this keeps new documents compatible with the ones already in Blob Storage. For detailed steps, refer to Microsoft's documentation on copying data from SQL Server to Blob Storage.

    If you find this helpful, please accept this answer to close the thread. Thanks!

    1 person found this answer helpful.