Strange behaviour using stringify in synapse pipeline dataflow

RickMsBi 86 Reputation points
2023-06-15T11:41:12.5733333+00:00

Hi all,

We ran into an issue with Azure Synapse pipelines. We use a data flow to convert JSON files to Parquet.

We've chosen to keep some complete JSON objects in the Parquet files, so that we can unroll them at a later stage when needed.

The data flow has three steps: a source, a Stringify transformation (for the array(s)), and a sink that writes to a Parquet file.

In the source, we defined the array with all of its fields, including the field that behaves strangely. To keep things simple, we defined every field as a string.

In the data flow script (JSON code), that looks like this:

jaarloon as string

As you can see, the input to the Stringify transformation also shows the field as a string.

[Screenshot: input of the Stringify transformation, showing the field as string]

We convert about 700k JSON files, and all of them work fine except one, in which the column contains this value:

                    "jaarloon": -14809425.49,

In the resulting Parquet file, the column that contains the stringified array suddenly has this value:

                "jaarloon": "-1.480942549E7"

This is strange, because I explicitly told the data flow to handle this field as a string. So the engine is apparently interpreting the value in some way.
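The output format matches what the JVM produces when a double is turned into text. A minimal sketch, assuming (not confirmed) that somewhere in the data flow the unquoted JSON number is held as a double and later rendered with Java's `Double.toString`: that method uses plain notation only for magnitudes from 10^-3 up to (but not including) 10^7, and scientific notation otherwise. That would also explain why only one file out of 700k is affected: its value is the only one at or above 10^7 in magnitude. The class and method names below are hypothetical.

```java
// Hypothetical demo: how -14809425.49 can become "-1.480942549E7" on the JVM.
// Assumption: the engine holds the unquoted JSON number as a double at some point.
public class DoubleToStringDemo {

    // Renders a double exactly as Double.toString does: plain decimal notation
    // for magnitudes in [1e-3, 1e7), scientific ("E") notation otherwise.
    static String render(double value) {
        return Double.toString(value);
    }

    public static void main(String[] args) {
        double problem = -14809425.49; // |value| >= 1e7, so scientific notation
        double ordinary = -9809425.49; // |value| <  1e7, so plain notation

        System.out.println(render(problem));  // prints -1.480942549E7
        System.out.println(render(ordinary)); // plain decimal, no exponent
    }
}
```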

Any clues? Or is this a bug?
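If the conversion cannot be avoided in the data flow itself, one possible workaround is to normalize such values during the later unroll step. A minimal sketch, assuming the unroll runs on the JVM; `PlainNumber.toPlain` is a hypothetical helper, not part of Synapse. It parses numeric text with `java.math.BigDecimal` and re-emits it with `toPlainString()`, leaving non-numeric text untouched.

```java
import java.math.BigDecimal;

// Hypothetical helper for the unroll step: converts numeric text written in
// scientific notation back into plain decimal text.
public class PlainNumber {

    // Returns plain decimal text for numeric input; non-numeric input is returned unchanged.
    static String toPlain(String text) {
        try {
            return new BigDecimal(text).toPlainString();
        } catch (NumberFormatException e) {
            return text; // not a number: leave it as-is
        }
    }

    public static void main(String[] args) {
        System.out.println(toPlain("-1.480942549E7")); // prints -14809425.49
        System.out.println(toPlain("12.5"));           // prints 12.5
        System.out.println(toPlain("abc"));            // prints abc
    }
}
```

Note that this only restores the notation: a trailing zero already lost in the double conversion (for example, an original `-14809425.50`) comes back as `-14809425.5`.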

Azure Synapse Analytics