Issue with Large Numbers Changing in ADF Data Flow Source (CSV vs Excel)

Ichhwak Malhare 0 Reputation points
2025-04-23T17:26:13.67+00:00

Hi Microsoft Support,

I'm experiencing a data accuracy issue when working with large numbers in Azure Data Factory (ADF) Data Flows.

For example:

  • In my file, the number is: 123456789012345000
  • But when I preview the same value in the Data Flow, it becomes: 123456789012344992

And when I use this file as a source in ADF Data Flow, the number appears to have been altered or rounded.

Could you help me understand:

  1. Why is the number changing when processed through ADF?
  2. How does ADF interpret numeric values from Excel or CSV sources?
  3. What is the correct way to configure the source so that large numbers like this are preserved exactly (e.g., treating them as strings)?

I’d appreciate a detailed explanation or guidance to ensure numeric precision is maintained.

Thanks in advance!


1 answer

  1. J N S S Kasyap 2,125 Reputation points Microsoft External Staff Moderator
    2025-04-23T18:39:55.75+00:00

    Hi @Ichhwak Malhare

    Why is the number changing when processed through ADF?

    The number changes due to floating-point precision limitations. ADF often interprets large numbers (e.g., 123456789012345000) as double (64-bit IEEE 754 floating point), which can represent only about 15-17 significant digits exactly. Your 18-digit value falls between two representable doubles (which are 16 apart at this magnitude), so it is rounded to the nearest one: 123456789012344992. Automatic type inference in Data Flows may cast large numbers to double instead of preserving them as strings, causing this precision loss.
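
    For illustration, the same rounding can be reproduced outside ADF, since Python's float is also an IEEE 754 double. This is a minimal sketch of the effect, not ADF itself:

    ```python
    # Python floats are IEEE 754 doubles, the same representation ADF's
    # double type uses, so converting the value reproduces the error.
    original = 123456789012345000       # the 18-digit integer from the question

    as_double = float(original)         # stored as a 64-bit double
    print(f"{as_double:.0f}")           # 123456789012344992 -> rounded
    print(int(as_double) == original)   # False: the exact value is gone

    # Near 1.2e17, consecutive doubles are 16 apart (2**4), so any integer
    # between them is silently rounded to the nearest representable value.
    ```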

    How does ADF interpret numeric values from Excel or CSV sources?

    • CSV: ADF infers data types by scanning the first 200 rows. Numbers without quotes (e.g., 123456789012345000) are treated as double or long, leading to precision loss for large numbers. Quoted numbers (e.g., "123456789012345000") can be treated as strings if configured correctly.
    • Excel: Excel stores numbers as IEEE 754 double-precision values (15-significant-digit limit). If a number is stored as a number, it may already be rounded in Excel. If formatted as text (e.g., '123456789012345000), ADF can read it as a string if the schema is set to String. ADF maps source types to Spark types (e.g., DoubleType, StringType) based on schema inference or explicit definitions.
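
    To see the inference effect locally, pandas can stand in for ADF's Spark-based engine (the file contents and column names below are made up for illustration); compare inferred typing with forced string typing on the same CSV text:

    ```python
    import io
    import pandas as pd

    # Hypothetical CSV: the empty account_id in the second row forces pandas
    # to infer a float64 (double) column, mimicking ADF inferring double.
    csv_data = "account_id,region\n123456789012345000,emea\n,apac\n"

    inferred = pd.read_csv(io.StringIO(csv_data))             # types inferred
    as_text = pd.read_csv(io.StringIO(csv_data), dtype=str)   # forced to string

    print(f"{inferred['account_id'][0]:.0f}")  # 123456789012344992 -> precision lost
    print(as_text["account_id"][0])            # 123456789012345000 -> preserved
    ```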

    What is the correct way to configure the source so that large numbers like this are preserved exactly?

    CSV:

    1. Enclose numbers in quotes in the CSV (e.g., "123456789012345000"); see the sketch after this list.
    2. In the dataset, set the column type to String in the schema.
    3. In the Data Flow source, ensure the projection maps the column to String. Use a Derived Column transformation (toString(columnName)) if needed.
    4. Disable schema drift and validate the schema in source options.
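
    A minimal sketch of step 1, assuming a hypothetical accounts.csv with a single account_id column (Python's standard csv module is shown, but any writer that quotes fields works):

    ```python
    import csv

    # Write every field quoted so the value survives as text end to end;
    # the file and column names are hypothetical.
    rows = [["account_id"], ["123456789012345000"]]

    with open("accounts.csv", "w", newline="") as f:
        csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)

    # Resulting file:
    # "account_id"
    # "123456789012345000"
    ```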

    Excel:

    1. Format the column as Text in Excel or prefix numbers with a single quote (e.g., '123456789012345000); see the sketch after this list.
    2. In the dataset, set the column type to String in the schema.
    3. In the Data Flow source, confirm the projection maps the column to String. Add a Derived Column transformation (toString(columnName)) if necessary.
    4. Specify the correct sheet and range and disable schema drift.
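
    As a local illustration of reading the text-formatted column as a string (pandas again stands in for ADF; the file, sheet, and column names are hypothetical, and reading .xlsx files requires the openpyxl package):

    ```python
    import pandas as pd

    # Force a string dtype on read, mirroring a String column in the
    # dataset schema, so all 18 digits survive intact.
    df = pd.read_excel(
        "accounts.xlsx",            # hypothetical workbook
        sheet_name="Sheet1",        # specify the exact sheet, as in step 4
        dtype={"account_id": str},  # hypothetical column, read as text
    )
    print(df["account_id"][0])      # 123456789012345000
    ```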

    I hope this information helps. Please do let us know if you have any further queries.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

    Thank you.

