Learn how to create a Copy job in Data Factory for Microsoft Fabric

The Copy job in Data Factory makes it easy to move data from your source to your destination without creating a pipeline. You can set up data transfers using built-in patterns for both batch and incremental copy, and copy once or on a schedule. Follow the steps in this article to start copying your data either from a database or from storage.

Create a Copy job to ingest data from a database

Follow these steps to set up a Copy job that moves data from a database:

  1. Create a new workspace or use an existing workspace.

  2. Select + New Item, choose the Copy job icon, name your Copy job, and select Create.

    Screenshot showing where to navigate to the Data Factory home page and create a new Copy job.

  3. Choose the database to copy data from. In this example, we're using Azure SQL DB.

    Screenshot showing where to choose a data source for the Copy job.

  4. For Azure SQL DB, enter your server path and credentials. To copy data securely within a virtual network environment, use an on-premises or virtual network data gateway. For other databases, the connection details will vary.

    Screenshot showing where to enter credentials.

  5. Select the tables and columns to copy. Use the search box to identify specific tables and columns you want to copy.

    Screenshot showing where to select tables and columns for the Copy job.

  6. Select your destination store. In this example, we're using another Azure SQL DB.

    Screenshot showing where to select the destination store for the Copy job.

  7. (Optional) Choose Update method to decide how data gets written to your destination. If you pick Merge, select the Key columns that identify each row.

    Screenshot showing where to choose the update method.

    Screenshot showing how to configure the update method.

  8. (Optional) Configure table or column mapping to rename tables or columns in the destination, or apply data type conversions. By default, data is copied with the same table name, column name, and data type as the source.

    Screenshot showing where to specify table mappings.

    Screenshot showing where to specify column mappings.

  9. Choose a copy mode: Full data copy or Incremental copy. In this example, we use Incremental copy. Choose an Incremental column for each table to track which rows have changed. You can use the preview button to find the right column. For more information, see Incremental column.

    Note

    When you choose incremental copy mode, the Copy job performs a full load on the first run and incremental copies on subsequent runs.

    Screenshot showing where to select the Copy job mode.

  10. Review the job summary, select your run option to run once or on a schedule, and select Save + Run.

    Screenshot showing where to review and save the newly created Copy job.

  11. Your Copy job starts immediately, and you can track its status from the inline monitoring panel, which shows information such as row counts and copy duration for each table. Learn more in How to monitor a Copy job.

    Screenshot showing the Copy job panel where you can monitor run history.

  12. You can run your Copy job whenever you want, even if it's set to run on a schedule. Just select the Run button at any time, and the Copy job copies only the data that's changed since the last run.

  13. You can also edit your Copy job at any time, including adding or removing tables and columns to be copied, configuring the schedule, or adjusting advanced settings. Some changes, such as updating the incremental column, will reset the incremental copy to start from an initial full load in the next run.

    Screenshot showing how to edit Copy job.
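
The incremental copy behavior described above can be sketched in a few lines. This is an illustrative model only, not the Fabric implementation: it assumes a single incremental (watermark) column named `last_modified`, and the function name `incremental_copy` is hypothetical. On the first run there is no stored watermark, so every row is copied (the initial full load); later runs copy only rows whose incremental column value is newer than the watermark.

```python
# Illustrative sketch (not the Fabric implementation) of incremental copy
# driven by a watermark on an incremental column.

def incremental_copy(source_rows, last_watermark, column="last_modified"):
    """Return (rows changed since last_watermark, new watermark).

    last_watermark of None means this is the first run, so all rows
    are copied -- the initial full load.
    """
    if last_watermark is None:
        copied = list(source_rows)  # first run: full load
    else:
        copied = [r for r in source_rows if r[column] > last_watermark]
    # Advance the watermark to the highest value seen so far.
    new_watermark = max((r[column] for r in source_rows), default=last_watermark)
    return copied, new_watermark

# First run: both rows are copied and the watermark advances to 20.
rows = [{"id": 1, "last_modified": 10}, {"id": 2, "last_modified": 20}]
copied, wm = incremental_copy(rows, None)
print(len(copied), wm)   # 2 20

# Second run: only the row modified after the watermark is copied.
rows.append({"id": 3, "last_modified": 30})
copied, wm = incremental_copy(rows, wm)
print(len(copied), wm)   # 1 30
```

This also shows why changing the incremental column resets the job: the stored watermark is meaningless for a different column, so the next run must fall back to a full load.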

Create a Copy job to ingest files from storage

Follow these steps to set up a Copy job that moves data from file storage:

  1. Create a new workspace or use an existing workspace.

  2. Select + New Item, choose the Copy job icon, name your Copy job, and select Create.

    Screenshot showing where to navigate to the Data Factory home page and create a new Copy job.

  3. Choose the data store to copy data from. In this example, we're using Azure Data Lake Storage Gen2.

    Screenshot showing where to choose a storage source for the Copy job.

  4. To connect to Azure Data Lake Storage Gen2, enter your storage URL and credentials. For other data stores, the connection details will vary. To copy data securely within a virtual network environment, use an on-premises or virtual network data gateway.

    Screenshot showing where to enter credentials for storage store.

  5. Select the folder or files to copy. You can choose to copy an entire folder with all its files, or a single file.

    Tip

    Schema agnostic (binary copy) copies files to another data store without parsing the schema. This can significantly improve copy performance.

    Screenshot showing where to select folder for the Copy job.

  6. Select your destination store. In this example, we chose Lakehouse.

    Screenshot showing where to select the storage destination store for the Copy job.

  7. Select the Folder path in your destination storage. Choose Preserve Hierarchy to maintain the same folder structure as the source, or Flatten Hierarchy to place all files in a single folder.

    Screenshot showing how to select destination folder.

  8. Choose a copy mode: Full data copy or Incremental copy. In this example, we use Incremental copy so that the Copy job will copy all files on the first run, and then copy only new or updated files in the next runs.

    Screenshot showing where to select the Copy job mode for storage.

  9. Review the job summary, select your run option to run once or on a schedule, and select Save + Run.

    Screenshot showing where to review and save the newly created Copy job for storage.

  10. Your Copy job starts immediately, and you can track its status from the inline monitoring panel, which shows information such as row counts and copy duration for each table. Learn more in How to monitor a Copy job.

    Screenshot showing the Copy job panel where you can monitor run history for moving data between storage.

  11. You can re-run your Copy job whenever you want, even if it's set to run on a schedule. Just select the Run button at any time, and the Copy job copies only the files that have changed since the last run.

  12. You can also edit your Copy job at any time, including configuring the schedule, or adjusting advanced settings.

    Screenshot showing how to edit Copy job for storage store.
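
For file-based sources, incremental copy behaves like a filter on each file's last-modified time. The sketch below is a conceptual model only, not how Fabric implements it; the function name `copy_changed_files` and the convention that a watermark of `0.0` means "first run" are assumptions for illustration. On the first run all files are copied; on later runs, only files modified after the stored watermark.

```python
# Illustrative sketch (not the Fabric implementation) of incremental file
# copy: all files on the first run, then only new or updated files.
import os
import shutil
import tempfile

def copy_changed_files(src_dir, dst_dir, last_run_time):
    """Copy files modified after last_run_time; 0.0 means first run."""
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src) and os.path.getmtime(src) > last_run_time:
            shutil.copy2(src, os.path.join(dst_dir, name))  # preserves mtime
            copied.append(name)
    return copied

src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
for name in ("a.csv", "b.csv"):
    with open(os.path.join(src, name), "w") as f:
        f.write("data")

first = copy_changed_files(src, dst, 0.0)        # first run: copies both files
watermark = max(os.path.getmtime(os.path.join(src, n)) for n in first)
second = copy_changed_files(src, dst, watermark) # second run: nothing changed
print(first, second)   # ['a.csv', 'b.csv'] []
```

Note that this model only detects new or updated files, which matches the limitation below: deleting a file at the source leaves the destination copy in place.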

Known limitations

  • Currently, incremental copy mode only works with some sources. For details, see supported connectors for Copy job.
  • Row deletion can't be captured from a source store.
  • When copying files to storage locations, empty files are created at the destination if no data is loaded from the source.