Azure Data Factory mapping data flow does not compress in sink

Yoon, Sojin 70 Reputation points
2023-07-04T13:59:29.5333333+00:00

Hello,

I am trying to read data from a database and write it to ADLS Gen2 using a mapping data flow in Data Factory.

It is a fairly simple flow consisting of 2 steps. In the sink, 'Inline JSON' is selected as the inline dataset type, and compression is configured in the 'Settings' tab.

The compression type is set to gzip, and the filename pattern also ends with '.gz'.

concat($param_df_adls_dest,'', toString(toTimestamp($param_df_p_timestamp, 'yyyy-MM-dd\'T\'HH:mm:ss'), 'yyyyMMddHHmmss'),'.000000[n]','.json.gz')

Even with this configuration, the files written to blob storage are plain JSON files (named xxxx.json.gz) rather than gzip-compressed JSON files.

I tried several different approaches, but I still get the same result.

Does anyone know what I might be doing wrong? Thanks.


Accepted answer
  1. QuantumCache 20,346 Reputation points
    2023-07-05T17:57:51.1866667+00:00

    Hello @Yoon, Sojin

    What does the source data format look like? Do you have any sample data?
    To resolve this issue, you can try the following:

    Check compression settings: Double-check the compression settings in your Mapping data flow to make sure that they are set correctly. Make sure that the compression type is set to gzip and that the filename pattern includes the '.gz' extension.

    Check data format: Make sure that the data being written to ADLS Gen2 is in a format that can be compressed, such as JSON or CSV. If the data is not in the correct format, you may need to transform it before writing it to ADLS Gen2.

    Check pipeline configuration: Make sure that the pipeline and its input and output datasets are configured correctly, and that the compression settings are actually being applied to the sink.
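To confirm whether a written file is really compressed, it can help to inspect its raw bytes instead of relying on the '.gz' extension. Below is a minimal Python sketch, assuming the file has been downloaded locally without any tool decompressing it on the way. The helper names `is_gzip` and `file_is_gzip` are hypothetical and not part of any Azure SDK. It relies only on the fact that every gzip stream starts with the two bytes 0x1f 0x8b.

```python
import gzip

# Every gzip stream begins with these two magic bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(data: bytes) -> bool:
    """Return True if the given bytes start with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def file_is_gzip(path: str) -> bool:
    """Read the first two bytes of a local file and check for the gzip magic number."""
    with open(path, "rb") as f:
        return is_gzip(f.read(2))


if __name__ == "__main__":
    plain = b'{"id": 1}\n'            # plain JSON text
    compressed = gzip.compress(plain)  # the same JSON, gzip-compressed
    print(is_gzip(plain))
    print(is_gzip(compressed))
```

Running `file_is_gzip` against a downloaded output file (for example one of the xxxx.json.gz files) should show whether the sink wrote compressed bytes or plain JSON under a '.gz' name.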

