Hi @Ichhwak Malhare
Why is the number changing when processed through ADF?
The number changes due to floating-point precision limitations
. ADF often interprets large numbers (e.g., 123456789012345000) as double (64-bit floating-point), which supports only ~15-17 significant digits. This leads to rounding or approximation errors (e.g., 123456789012344992). Automatic type inference in Data Flows may cast large numbers to double instead of preserving them as strings, causing precision loss.
How does ADF interpret numeric values from Excel or CSV sources?
CSV:
ADF infers data types by scanning the first 200 rows. Numbers without quotes (e.g., 123456789012345000) are treated as double or long, leading to precision loss for large numbers. Quoted numbers (e.g., "123456789012345000") can be treated as strings if configured correctly.
Excel:
Excel stores numbers as IEEE 754 double-precision (15-digit limit). If a number is stored as a number, it may already be rounded in Excel. If formatted as text (e.g., '123456789012345000), ADF can read it as a string if the schema is set to String. ADF maps source types to Spark types (e.g., DoubleType, StringType) based on schema inference or explicit definitions.
What is the correct way to configure the source so that large numbers like this are preserved exactly
CSV:
- Enclose numbers in quotes in the CSV (e.g., "123456789012345000").
- In the dataset, set the column type to String in the schema.
- In the Data Flow source, ensure the projection maps the column to String. Use a
Derived Column
transformation(toString(columnName))
if needed. - Disable schema drift and validate the schema in source options.
Excel:
- Format the column as
Text
in Excel or prefix numbers with a single quote (e.g., '123456789012345000). - In the dataset, set the column type to String in the schema.
- In the Data Flow source, confirm the projection maps the column to String. Add a
Derived Column
transformation(toString(columnName))
if necessary. - Specify the correct sheet and range and disable
schema drift.
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.
Thank you.