An Azure service for ingesting, preparing, and transforming data at scale.
Hello Dhruv Singla,
Thanks for raising this question in Q&A forum.
The Get Metadata activity in ADF returning string instead of integer for Excel columns is a known limitation, not a bug in your configuration.
Why This Happens
The Get Metadata activity uses the first few rows of the file to sample and infer data types. For Excel files, the connector reads cell values as generic strings by default because Excel internally stores cell content with formatting metadata that ADF doesn't fully evaluate during schema inference. This is particularly common when:
The column header row is text-based
Cells have mixed formatting (e.g., text-formatted number cells)
- The Excel connector defaults to treating values as strings to avoid data loss
Workarounds
1. Manually Define Schema in Dataset Instead of relying on inferred types, explicitly define the schema in your Excel dataset:
Open the dataset → Go to the Schema tab
Click Import schema or manually add columns with correct types
Set the column type to Int32 or Int64 for integer columns
Once schema is set in the dataset, Get Metadata will respect it.stackoverflow+1
2. Use a Lookup Activity Instead For more reliable type inference, replace Get Metadata with a Lookup activity using a query to fetch the data and then reference @activity('Lookup1').output.firstRow:
json
@activity('Lookup1').output.firstRow.YourColumnName
3. Use Data Flow with Schema Inference For more accurate schema detection, use Mapping Data Flows with the source configured to infer schema from the Excel file:
- Data Flows use Apache Spark for inference, which is significantly more accurate for data types compared to Get Metadata
4. Type Cast After Get Metadata If you must use Get Metadata, cast the string value to integer downstream using an expression:
json
@int(activity('GetMetadata1').output.structure[0].type)
Or handle the type conversion in a subsequent Copy or Data Flow activity.
Best Practice
For Excel specifically, the recommended approach is to define the schema explicitly in the dataset rather than relying on inference, as Excel is not a self-describing format like Parquet or Avro. This gives you full control over data types and avoids surprises in downstream pipeline activities.
If this helps accept the answer.
Best Regards,
Jerald Felix