Bug - Data Factory Copy Data activity remembers the schema when the source is a Cosmos DB dataset

Brent Leslie

This is a clearly reproducible bug.

  1. Create a Cosmos SQL API Linked Service whose settings are passed through as parameters.
  2. Create a Cosmos Dataset that uses the Linked Service and passes parameters through to it.
  3. Create a new pipeline with a Copy data activity, and set up parameters for the source and destination tables.
  4. In the sink, select a SQL database and configure the destination. Add a pre-copy script that drops the table first.
  5. Do not specify any mappings, so that the pipeline can be reused for any Cosmos extract. Click the "Clear" button to make sure there are no mappings.
  6. Run the pipeline, extracting data to a new table. The table is dropped before extraction occurs, and new rows are created based on whatever is in the source. Everything works as expected.
  7. Add a column to the new SQL table.
  8. Rerun the pipeline. You now get an error saying that the newly added column is not in the source. This indicates that a schema is being saved, even though we have deliberately not saved any schema and the copy should be driven solely by what is in the source.
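The error in step 8 reports a sink column that the source does not supply. As a rough illustration of that mismatch (not of how Data Factory performs its check internally), the hypothetical helper below compares the top-level properties of some Cosmos documents, represented here as Python dicts, against the destination table's column names:

```python
def sink_columns_missing_from_source(documents, sink_columns):
    """Return sink columns that no source document provides, in sink order.

    Hypothetical helper: `documents` stands in for Cosmos items (dicts) and
    `sink_columns` for the column names of the SQL destination table.
    """
    source_keys = set()
    for doc in documents:
        # Only top-level properties are collected; nested objects are not flattened.
        source_keys.update(doc.keys())
    return [col for col in sink_columns if col not in source_keys]


docs = [{"id": "1", "name": "a"}, {"id": "2", "name": "b"}]
# After step 7 the table has an extra column that Cosmos never sends.
print(sink_columns_missing_from_source(docs, ["id", "name", "added_col"]))  # ['added_col']
```

With no explicit mapping, one would expect the copy to ignore such a column (or recreate the table from the source), which is why the error in step 8 looks like a stale schema rather than a genuine mapping problem.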

Verify this is a bug by:

  1. Go to the Cosmos Dataset. Click into the Schema and hit "Clear".
  2. Go back to the Copy Data activity on your pipeline, go to the mapping, hit "Clear"
  3. Save All
  4. Run the pipeline again - pipeline run is now successful.
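To repeat steps 6 and 8 of the reproduction and the final verification run without clicking through the portal, a run can be triggered and polled from Python. This is a sketch that assumes the `azure-mgmt-datafactory` SDK, whose `pipelines.create_run` and `pipeline_runs.get` operations it calls on a client you pass in; the helper name, timeouts, and example parameter values are placeholders, and the function itself only needs the standard library.

```python
import time

# PipelineRun.status values that mean the run has not finished yet.
IN_PROGRESS = {"Queued", "InProgress", "Canceling"}


def run_pipeline_and_wait(client, resource_group, factory, pipeline, parameters,
                          poll_seconds=15.0, timeout_seconds=3600.0):
    """Start a pipeline run and poll until it reaches a terminal status.

    `client` is expected to behave like azure.mgmt.datafactory's
    DataFactoryManagementClient. Returns (status, message).
    """
    run = client.pipelines.create_run(
        resource_group, factory, pipeline, parameters=parameters)
    deadline = time.monotonic() + timeout_seconds
    while True:
        pipeline_run = client.pipeline_runs.get(resource_group, factory, run.run_id)
        if pipeline_run.status not in IN_PROGRESS:
            # Succeeded, Failed or Cancelled: report the outcome and any error text.
            return pipeline_run.status, pipeline_run.message
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run.run_id} still {pipeline_run.status}")
        time.sleep(poll_seconds)


# Example (requires azure-identity and azure-mgmt-datafactory; names are placeholders):
# from azure.identity import DefaultAzureCredential
# from azure.mgmt.datafactory import DataFactoryManagementClient
# client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
# status, message = run_pipeline_and_wait(
#     client, "<resource-group>", "<factory>", "<pipeline>",
#     {"dest_schema": "dbo", "dest_table": "my_table"})
```

Taking the client as an argument keeps the polling logic independent of authentication and makes it easy to exercise with a stand-in object.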

The bug is that the Copy Data activity or (most likely) the Cosmos Dataset is remembering the schema and refusing to map, even though no mappings are set. Can anyone think of a workaround?

Azure Data Factory

Accepted answer
  Brent Leslie

    OK, I actually figured out a workaround: drop the destination table before the Copy Data activity runs (which the pre-copy script should already do, but evidently doesn't). I added a Lookup activity, pointed its dynamic connection values at the destination server, and had it run a query built from the pipeline parameters. The Dynamic Query value looks like this:

    @{concat('DROP TABLE IF EXISTS [',pipeline().parameters.dest_schema,'].[',pipeline().parameters.dest_table,']; SELECT 1 as Col')}
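The expression above splices the two pipeline parameters between square brackets and appends `SELECT 1 as Col`, presumably because the Lookup activity expects its query to return a result set. A minimal Python sketch of the string it produces follows; the function names are hypothetical, and unlike the original `concat` it also doubles any `]` in a name (the escaping T-SQL's `QUOTENAME` uses), so an unusual table name cannot break out of the brackets.

```python
def quote_identifier(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def build_drop_query(dest_schema, dest_table):
    """Build the Lookup query from the workaround: drop the table, then
    return one dummy row (assumed to be needed so the Lookup gets a result)."""
    return (f"DROP TABLE IF EXISTS {quote_identifier(dest_schema)}."
            f"{quote_identifier(dest_table)}; SELECT 1 as Col")


print(build_drop_query("dbo", "orders"))
# DROP TABLE IF EXISTS [dbo].[orders]; SELECT 1 as Col
```

For ordinary names the output is identical to what the `concat` expression evaluates to; the escaping only matters if a parameter value contains `]`.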


    I would much rather this extra step weren't needed. Can anyone on the Data Factory team confirm that this is a bug and whether it will be fixed?
