Can I load data from MongoDB incrementally with ADF?

WeirdMan 340 Reputation points
2024-10-19T20:58:13.5233333+00:00

I want to load data from MongoDB incrementally to Azure storage with ADF.

The link in the documentation doesn't provide this information.

It covers the connector, but not how to load data incrementally.

https://learn.microsoft.com/en-us/azure/data-factory/connector-mongodb?tabs=data-factory


Accepted answer
  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2024-10-19T20:59:47.1933333+00:00

    Yes, you can load data from MongoDB incrementally using ADF, but you'll need to manage the incremental load process yourself, based on a field such as a timestamp or an incrementing value in your MongoDB collection.

    1. Identify Incremental Field:
      • MongoDB doesn't inherently track changes, so you need a field in the documents that indicates when a document was created or updated. This is usually a timestamp field (e.g., createdAt or updatedAt) or a numeric field (e.g., an auto-incrementing ID).
    2. Create a Source Dataset:
      • Create a dataset in ADF for MongoDB that points to your collection.
    3. Define Query for Incremental Data:
      • In the Source of your ADF pipeline, define a filter query that loads only the documents created or modified since the last load. For example:

         { "updatedAt": { "$gt": "last_loaded_timestamp" } }

      • Replace "last_loaded_timestamp" with the timestamp of the last loaded record, which you'll need to persist in a metadata store (such as an Azure SQL Database table, a blob, or another store). A parameterized sketch of this query follows the list.
    4. Set Up Control Flow (Lookup/Stored Last Processed Timestamp):
      • Use the Lookup activity in ADF to retrieve the last loaded timestamp from your metadata store (e.g., Azure SQL or blob storage). This timestamp will be used in the MongoDB filter query to fetch only new or updated records.
    5. Copy Activity:
      • In the Copy Activity, use the query with the updatedAt condition so that only the incremental data is copied from MongoDB to Azure storage (Azure Blob Storage, Azure Data Lake Storage, etc.); see the first sketch after this list.
    6. Update the Last Loaded Timestamp:
      • After the load succeeds, update the stored timestamp to the maximum updatedAt value from the latest batch (or to the pipeline run time). This can be done with another Lookup or a Stored Procedure activity that writes the new value to the metadata store for the next incremental run; see the second sketch after this list.
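
    As a rough illustration of steps 3 to 5, here is a minimal sketch of the Lookup and Copy activities. It assumes the MongoDB v2 connector, a one-row watermark table in Azure SQL, and hypothetical names (LookupOldWatermark, WatermarkDataset, MongoDbCollectionDataset, BlobSinkDataset, the WatermarkValue column) that you would replace with your own:

        [
          {
            "name": "LookupOldWatermark",
            "description": "Reads the last loaded updatedAt value from a one-row watermark table.",
            "type": "Lookup",
            "typeProperties": {
              "source": {
                "type": "AzureSqlSource",
                "sqlReaderQuery": "SELECT WatermarkValue FROM dbo.WatermarkTable WHERE TableName = 'mongo_collection'"
              },
              "dataset": { "referenceName": "WatermarkDataset", "type": "DatasetReference" },
              "firstRowOnly": true
            }
          },
          {
            "name": "CopyIncrementalFromMongoDb",
            "description": "Copies only documents whose updatedAt is greater than the stored watermark.",
            "type": "Copy",
            "dependsOn": [
              { "activity": "LookupOldWatermark", "dependencyConditions": [ "Succeeded" ] }
            ],
            "typeProperties": {
              "source": {
                "type": "MongoDbV2Source",
                "filter": {
                  "value": "{ \"updatedAt\": { \"$gt\": { \"$date\": \"@{activity('LookupOldWatermark').output.firstRow.WatermarkValue}\" } } }",
                  "type": "Expression"
                }
              },
              "sink": { "type": "JsonSink" }
            },
            "inputs": [ { "referenceName": "MongoDbCollectionDataset", "type": "DatasetReference" } ],
            "outputs": [ { "referenceName": "BlobSinkDataset", "type": "DatasetReference" } ]
          }
        ]

    If updatedAt is stored as an ISO-8601 string rather than a BSON date, drop the $date wrapper and compare against the plain string; if your documents only have a createdAt field, an insert-only incremental load works the same way.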
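
    And for step 6, one option (borrowed from the generic ADF watermark pattern) is a Stored Procedure activity that writes the new watermark back to the metadata store after the copy succeeds. The procedure name usp_update_watermark, the parameter name, and the linked service AzureSqlMetadataStore below are placeholders; here the pipeline's trigger time is used as the new watermark, but you could instead look up the maximum updatedAt of the copied batch:

        {
          "name": "UpdateWatermark",
          "description": "Persists the new watermark so the next run only picks up later changes.",
          "type": "SqlServerStoredProcedure",
          "dependsOn": [
            { "activity": "CopyIncrementalFromMongoDb", "dependencyConditions": [ "Succeeded" ] }
          ],
          "linkedServiceName": { "referenceName": "AzureSqlMetadataStore", "type": "LinkedServiceReference" },
          "typeProperties": {
            "storedProcedureName": "dbo.usp_update_watermark",
            "storedProcedureParameters": {
              "NewWatermarkValue": {
                "value": { "value": "@pipeline().TriggerTime", "type": "Expression" },
                "type": "DateTime"
              }
            }
          }
        }

    Using the trigger time keeps the pipeline simple and errs on the side of re-copying documents updated while the run was in progress, which is usually preferable to missing them; if you need an exact boundary, derive the watermark from the copied data instead.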

    Here is a link with more details that may help you:

    https://stackoverflow.com/questions/76654844/load-mongodb-data-incrementally-through-azure-data-factory

    1 person found this answer helpful.
