Configure MongoDB in a copy activity
This article outlines how to use the copy activity in data pipelines to copy data from and to MongoDB.
Supported configuration
For the configuration of each tab under copy activity, go to the following sections respectively.
General
Refer to the General settings guidance to configure the General settings tab.
Source
Go to the Source tab to configure your copy activity source. See the following content for the detailed configuration.
The following properties are required:
- Data store type: Select External.
- Connection: Select a MongoDB connection from the connection list. If no connection exists, then create a new MongoDB connection by selecting New.
- Database: Select your database from the drop-down list.
- Collection name: Specify the name of the collection in the MongoDB database. You can select the collection from the drop-down list or select Edit to enter it manually.
Under Advanced, you can specify the following fields:
- Filter: Specifies the selection filter using query operators. To return all documents in a collection, omit this parameter or pass an empty document ({}).
- Cursor methods: Select + New to specify how the underlying query is executed. The available methods are:
  - project: Specifies the fields to return in the documents for projection. To return all fields in the matching documents, omit this parameter.
  - sort: Specifies the order in which the query returns matching documents. Go to cursor.sort() for more information.
  - limit: Specifies the maximum number of documents the server returns. Go to cursor.limit() for more information.
  - skip: Specifies the number of documents to skip before MongoDB begins to return results. Go to cursor.skip() for more information.
- Batch size: Specifies the number of documents to return in each batch of the response from the MongoDB instance. In most cases, modifying the batch size will not affect the user or the application.
- Additional columns: Add additional data columns to store source files' relative path or static value. Expression is supported for the latter.
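To make the effect of Filter and the cursor methods concrete, the following minimal Python sketch emulates them on an in-memory list of documents. It is not the connector's implementation: the sample documents and the `matches` and `run_query` helpers are hypothetical, the filter understands only equality and `$gte`, and projection is simplified to a list of field names (real MongoDB projections are documents and include `_id` unless it is excluded). The order of application (sort, then skip, then limit) follows MongoDB's cursor behavior.

```python
from typing import Any, Optional

Document = dict[str, Any]

# Hypothetical sample collection, used only for illustration.
ORDERS: list[Document] = [
    {"_id": 1, "customer": "ana", "total": 40},
    {"_id": 2, "customer": "bo", "total": 15},
    {"_id": 3, "customer": "cy", "total": 90},
    {"_id": 4, "customer": "di", "total": 65},
]


def matches(doc: Document, flt: Document) -> bool:
    """Evaluate a tiny subset of MongoDB filters: equality and $gte."""
    for field, cond in flt.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            if "$gte" in cond and not (value is not None and value >= cond["$gte"]):
                return False
        elif value != cond:
            return False
    return True


def run_query(
    docs: list[Document],
    flt: Optional[Document] = None,
    project: Optional[list[str]] = None,
    sort: Optional[dict[str, int]] = None,
    skip: int = 0,
    limit: Optional[int] = None,
) -> list[Document]:
    # An omitted filter or an empty document ({}) matches every document.
    result = [d for d in docs if matches(d, flt or {})]
    # sort: {"field": 1} is ascending, {"field": -1} is descending (one key only here).
    if sort:
        field, direction = next(iter(sort.items()))
        result.sort(key=lambda d: d[field], reverse=direction < 0)
    # skip is applied before limit.
    result = result[skip:]
    if limit is not None:
        result = result[:limit]
    # project: keep only the listed fields; omit it to keep all fields.
    if project:
        result = [{k: d[k] for k in project if k in d} for d in result]
    return result


page = run_query(
    ORDERS,
    flt={"total": {"$gte": 40}},
    project=["customer", "total"],
    sort={"total": -1},
    skip=1,
    limit=2,
)
print(page)
```

Here the filter keeps three orders, the descending sort puts the largest total first, skip drops that first match, and limit caps the remainder at two documents.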
Destination
Go to the Destination tab to configure your copy activity destination. See the following content for the detailed configuration.
The following properties are required:
- Data store type: Select External.
- Connection: Select a MongoDB connection from the connection list. If no connection exists, then create a new MongoDB connection by selecting New.
- Database: Select your database from the drop-down list.
- Collection name: Specify the name of the collection in the MongoDB database. You can select the collection from the drop-down list or select Edit to enter it manually.
Under Advanced, you can specify the following fields:
- Write behavior: Describes how to write data to MongoDB. Allowed values: Insert and Upsert.
  The behavior of Upsert is to replace the document if a document with the same _id already exists; otherwise, insert the document.
  Note: The service automatically generates an _id for a document if an _id isn't specified either in the original document or by column mapping. This means that you must ensure that, for Upsert to work as expected, your document has an ID.
- Write batch timeout: Specify the wait time for the batch insert operation to finish before it times out. The allowed value is timespan.
- Write batch size: This property controls the size of documents to write in each batch. You can try increasing the value to improve performance and decreasing the value if your documents are large.
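The difference between Insert and Upsert, and the reason a missing _id matters, can be illustrated with a small Python emulation. This is a sketch under stated assumptions, not the service's code: the `write` helper and the in-memory `target` dictionary are hypothetical, a random hex string stands in for the automatically generated _id, and rejecting a duplicate _id on Insert mirrors MongoDB's unique _id constraint rather than any documented copy activity behavior.

```python
import uuid
from typing import Any

Document = dict[str, Any]


def write(collection: dict[Any, Document], doc: Document, behavior: str = "insert") -> Any:
    """Write one document into an in-memory 'collection' keyed by _id."""
    doc = dict(doc)  # copy so the caller's document is not mutated
    # Stand-in for the automatically generated _id (assumption: any unique value).
    doc.setdefault("_id", uuid.uuid4().hex)
    key = doc["_id"]
    if behavior == "insert":
        if key in collection:
            raise KeyError(f"duplicate _id: {key!r}")
        collection[key] = doc
    elif behavior == "upsert":
        # Replace the whole document when the _id exists; otherwise insert it.
        collection[key] = doc
    else:
        raise ValueError(f"unknown write behavior: {behavior!r}")
    return key


target: dict[Any, Document] = {}
write(target, {"_id": "u1", "name": "Ana", "tier": "gold"}, "upsert")
write(target, {"_id": "u1", "name": "Ana B."}, "upsert")  # replaces u1; "tier" is gone
write(target, {"name": "Bo"}, "upsert")  # no _id: stored as a new document
write(target, {"name": "Bo"}, "upsert")  # no _id again: a second new document
print(len(target), target["u1"])
```

The two writes without an _id each receive a fresh identifier, so they never match an existing document and Upsert behaves like Insert for them. That is why the source documents, or the column mapping, need to supply the _id.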
Mapping
For Mapping tab configuration, see Configure your mappings under mapping tab. Mapping is not supported when both source and destination are hierarchical data.
Settings
For Settings tab configuration, go to Configure your other settings under settings tab.
Table summary
The following table contains more information about the copy activity in MongoDB.
Source information
Name | Description | Value | Required | JSON script property |
---|---|---|---|---|
Data store type | Your data store type. | External | Yes | / |
Connection | Your connection to the source data store. | < your MongoDB connection > | Yes | connection |
Database | Your database that you use as source. | < your database > | Yes | database |
Collection name | Name of the collection in MongoDB database. | < your collection > | Yes | collection |
Filter | The selection filter using query operators. To return all documents in a collection, omit this parameter or pass an empty document ({}). | < your selection filter > | No | filter |
Cursor methods | The way that the underlying query is executed. | • project • sort • limit • skip | No | cursorMethods: • project • sort • limit • skip |
Batch size | The number of documents to return in each batch of the response from the MongoDB instance. | < your batch size > (the default is 100) | No | batchSize |
Additional columns | Add additional data columns to store source files' relative path or static value. Expression is supported for the latter. | • Name • Value | No | additionalColumns: • name • value |
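As a rough illustration of how the JSON script properties in the source table fit together, the following Python sketch assembles them into a dictionary and prints it as JSON. Only the property names come from the table. The flat layout, the choice to pass filter, project, and sort as JSON strings, and all sample values are assumptions, so compare the output with the JSON that your own pipeline generates before relying on it.

```python
import json

# Property names come from the source table; every value is a placeholder or an assumption.
source_settings = {
    "connection": "<your MongoDB connection>",
    "database": "<your database>",
    "collection": "<your collection>",
    # Assumption: the filter is passed as a string holding a MongoDB query document.
    "filter": json.dumps({"status": "active"}),
    "cursorMethods": {
        "project": json.dumps({"_id": 1, "name": 1}),
        "sort": json.dumps({"name": 1}),
        "limit": 100,
        "skip": 0,
    },
    "batchSize": 100,  # the default listed in the table
    "additionalColumns": [{"name": "load_source", "value": "mongodb"}],
}

print(json.dumps(source_settings, indent=2))
```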
Destination information
Name | Description | Value | Required | JSON script property |
---|---|---|---|---|
Data store type | Your data store type. | External | Yes | / |
Connection | Your connection to the destination data store. | < your MongoDB connection > | Yes | connection |
Database | Your database that you use as destination. | < your database > | Yes | database |
Collection name | Name of the collection in MongoDB database. | < your collection > | Yes | collection |
Write behavior | Describes how to write data to MongoDB. Allowed values: Insert and Upsert. The behavior of Upsert is to replace the document if a document with the same _id already exists; otherwise, insert the document. Note: The service automatically generates an _id for a document if an _id isn't specified either in the original document or by column mapping. This means that you must ensure that, for Upsert to work as expected, your document has an ID. | • Insert (default) • Upsert | No | writeBehavior: • insert • upsert |
Write batch timeout | The wait time for the batch insert operation to finish before it times out. | timespan (the default is 00:30:00, which is 30 minutes) | No | writeBatchTimeout |
Write batch size | Controls the size of documents to write in each batch. You can try increasing this value to improve performance and decreasing the value if your documents are large. | < your write batch size > | No | writeBatchSize |
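The destination properties can be sketched the same way. In the Python example below, the property names and the allowed writeBehavior values come from the destination table, while the flat layout, the `parse_timespan` helper, and the sample writeBatchSize of 1000 are hypothetical. The helper handles only the hh:mm:ss form shown for the default timeout.

```python
from datetime import timedelta

ALLOWED_WRITE_BEHAVIORS = {"insert", "upsert"}  # values listed in the table


def parse_timespan(value: str) -> timedelta:
    """Parse an hh:mm:ss timespan such as the default "00:30:00"."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Property names come from the destination table; values are placeholders or assumptions.
destination_settings = {
    "connection": "<your MongoDB connection>",
    "database": "<your database>",
    "collection": "<your collection>",
    "writeBehavior": "upsert",
    "writeBatchTimeout": "00:30:00",  # the default listed in the table
    "writeBatchSize": 1000,  # arbitrary sample value, not a documented default
}

if destination_settings["writeBehavior"] not in ALLOWED_WRITE_BEHAVIORS:
    raise ValueError("writeBehavior must be 'insert' or 'upsert'")

timeout = parse_timespan(destination_settings["writeBatchTimeout"])
print(timeout.total_seconds())
```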