Strange behaviour using stringify in synapse pipeline dataflow
Hi all,
We encountered an issue with Azure pipelines. We are using a dataflow task to convert some json files to parquet.
We've chosen to keep some complete json objects in the parquet files, so we can, at a later stage unroll them when needed.
The dataflow has 3 steps, source, stringify (for the array(s)) and a step to write to a parquet file.
In the source we defined the array with all the fields, including the field with strange behaviour. To keep things simple, we kept every field defined as a str.
So that looks like this in the dataflow (json code).
jaarloon as string
And, as you can see, the input for the stringify also shows string.
We convert about 700k json files, and all work well, except for one where the column contains this value.
"jaarloon": -14809425.49,
The column in the resulting parquet file that contains the stringified array suddenly has this value:
"jaarloon": "-1.480942549E7"
This is weird behaviour, as I specifically told to handle this field as a string. So the engine is doing interpreting of some kind.
Any clues? Or is this a bug?