If the source allows querying (for example, a SQL database or a similar interface), truncate the value to the first 4,000 characters in the source query before the Copy activity reads it:
```sql
SELECT SUBSTRING(your_column, 1, 4000) AS truncated_column, other_columns
FROM your_table
```
In the Mapping tab of the Copy Activity:
- Map the problematic column (`your_column`) to a new computed column or directly to the target column.
- The Copy activity's mapping doesn't natively support substring functions, so handle the truncation by:
  - writing an inline query against an intermediate staging table, or
  - pre-processing the Parquet file.
If your source is Parquet and no query can be applied directly, pre-process the file with a Script activity or an Azure Function:
```python
import pandas as pd

# Read the Parquet file
df = pd.read_parquet('path_to_file.parquet')

# Truncate the column to 4,000 characters
# (assumes the column holds string values)
df['your_column'] = df['your_column'].str[:4000]

# Save back to Parquet (or CSV) for the Copy activity to pick up
df.to_parquet('output_path.parquet', index=False)
```
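You can sanity-check the truncation logic locally before wiring it into a pipeline. A minimal sketch with made-up sample data (the column name mirrors the placeholder above):

```python
import pandas as pd

# Sample data only: one oversized value and one short value.
df = pd.DataFrame({'your_column': ['x' * 5000, 'short']})

# Same truncation as the script above: keep the first 4,000 characters.
df['your_column'] = df['your_column'].str[:4000]

print(df['your_column'].str.len().tolist())  # [4000, 5]
```

The short value passes through unchanged; only values beyond the limit are cut.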
If you prefer a SQL-based approach:

- Load the data into a staging table with `VARCHAR(MAX)` or a pre-defined column length.
- Use an `INSERT INTO ... SELECT` query to truncate the value during the transfer to the final table:
```sql
INSERT INTO final_table (column1, truncated_column, ...)
SELECT column1, LEFT(your_column, 4000), ...
FROM staging_table;
```
If your pipeline includes Synapse Analytics, execute a stored procedure that processes the data after copying it to a staging table:
```sql
CREATE PROCEDURE TruncateLargeColumn
AS
BEGIN
    INSERT INTO final_table
    SELECT column1, LEFT(large_column, 4000), ...
    FROM staging_table;
END;
```