Best way to read from MongoDB change streams from within Azure

Jaime Becker 20 Reputation points
2024-11-21T18:06:15.1766667+00:00

Hello! We have an application that needs to read events from our MongoDB change stream and insert each change event into Data Lake storage. We cannot use MongoDB Triggers to call an Azure Function because we have no way of authenticating the Trigger (we have been denied using any sort of passwords, etc., in MongoDB). Thus, we need a way, from inside Azure, to listen to the MongoDB change streams and put the events into JSON files in our data lake.

Ideally, we'd like to use event driven architecture for this.

I know Data Factory has a MongoDB connector, but this is only for copying data from the DB, not for getting the change events.

Any suggestions on this? Thanks!


Accepted answer
  1. Pinaki Ghatak 5,600 Reputation points Microsoft Employee Volunteer Moderator
    2024-11-25T13:09:40.91+00:00

    Hello @Jaime Becker

    The MongoDB connector in Azure Data Factory is not designed to capture change events from MongoDB. However, you can use Azure Event Grid to create an event-driven architecture for this scenario. You can create an Azure Function that listens to the MongoDB change stream and publishes the change events to an Azure Event Grid topic.

    Then, you can create an Azure Event Grid subscription that routes the events to an Azure Data Lake Storage Gen1 account.

    Here are the high-level steps to implement this solution:

    1. Create an Azure Function that listens to the MongoDB change stream and publishes the change events to an Azure Event Grid topic. You can use the MongoDB driver for Node.js to listen to the change stream. Here's an example of how to use the driver to listen to the change stream:
    const { MongoClient } = require('mongodb');

    const client = new MongoClient(''); // connection string

    async function listenToChangeStream() {
    	await client.connect();
    	const db = client.db(''); // database name
    	const collection = db.collection(''); // collection name
    	const changeStream = collection.watch();
    	changeStream.on('change', (change) => {
    		// Publish the change event to Azure Event Grid
    	});
    }

    listenToChangeStream();
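Before publishing, the raw change event needs to be shaped into the Event Grid event schema. As a minimal sketch (the `eventType` and `subject` values below are assumptions, not fixed names required by Event Grid), the mapping might look like this:

```javascript
// Sketch: convert a MongoDB change event into the Event Grid event schema.
// The eventType and subject naming here is an assumption for illustration.
function toEventGridEvent(change) {
    return {
        id: JSON.stringify(change._id),
        eventType: `MongoDB.${change.operationType}`,
        subject: `${change.ns.db}/${change.ns.coll}`,
        eventTime: new Date().toISOString(),
        dataVersion: '1.0',
        // Prefer the full document when present; fall back to the key.
        data: change.fullDocument ?? change.documentKey,
    };
}

// Example change event, as emitted by collection.watch() for an insert:
const change = {
    _id: { _data: '8264...' },
    operationType: 'insert',
    ns: { db: 'mydb', coll: 'orders' },
    documentKey: { _id: 42 },
    fullDocument: { _id: 42, total: 9.99 },
};

console.log(toEventGridEvent(change).eventType); // "MongoDB.insert"
console.log(toEventGridEvent(change).subject);   // "mydb/orders"
```

An object of this shape can then be passed to the Event Grid publisher inside the `changeStream.on('change', ...)` handler above.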
    
    2. Create an Azure Event Grid topic and subscribe to it using an Azure Function that writes the events to Data Lake Storage Gen1. You can use the Azure Event Grid trigger for Azure Functions to listen to the events. Here's an example of how to use the trigger to write the events to Data Lake Storage Gen1:
    module.exports = async function (context, eventGridEvent) {
    	const eventData = eventGridEvent.data;
    	// Write the event data to Data Lake Storage Gen1
    	context.log(eventData);
    };
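Inside that trigger, each event needs a target file path and a JSON body. As a sketch of that step (the date-partitioned `changes/year/month/day` layout is an assumption, not a requirement of Data Lake Storage), a small pure helper could build both:

```javascript
// Sketch: derive a date-partitioned file path and JSON body for the data
// lake write. The path layout (changes/yyyy/mm/dd/<id>.json) is an
// assumption chosen for illustration.
function toLakeFile(eventGridEvent, now = new Date()) {
    const y = now.getUTCFullYear();
    const m = String(now.getUTCMonth() + 1).padStart(2, '0');
    const d = String(now.getUTCDate()).padStart(2, '0');
    return {
        path: `changes/${y}/${m}/${d}/${eventGridEvent.id}.json`,
        body: JSON.stringify(eventGridEvent.data),
    };
}

const file = toLakeFile(
    { id: 'abc123', data: { _id: 42, total: 9.99 } },
    new Date(Date.UTC(2024, 10, 25)) // fixed date for a repeatable example
);
console.log(file.path); // "changes/2024/11/25/abc123.json"
```

The returned `path` and `body` would then be handed to whichever storage SDK call performs the actual upload.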
    
    3. Configure the Azure Event Grid subscription to route the events to the Azure Function that writes the events to Data Lake Storage Gen1. You can use the Azure portal or Azure CLI to create the subscription. Here's an example of how to create the subscription using Azure CLI:
    az eventgrid event-subscription create \
    	--name <subscription-name> \
    	--source-resource-id <topic-resource-id> \
    	--endpoint-type webhook \
    	--endpoint <function-endpoint-url>
    

    I hope that this response has addressed your query and helped you overcome your challenges. If so, please mark this response as Answered. This will not only acknowledge our efforts, but also assist other community members who may be looking for similar solutions.

    1 person found this answer helpful.

1 additional answer

  1. hossein jalilian 11,055 Reputation points Volunteer Moderator
    2024-11-21T18:59:32.81+00:00

    Hello Jaime Becker,

    Thanks for posting your question in the Microsoft Q&A forum.

    The best approach to read from MongoDB change streams within Azure and store the events in Data Lake storage would be to use Azure Functions with a custom implementation.

    Create an Azure Function App and implement the MongoDB change stream listener in C# or Python.

    Use the official MongoDB driver for .NET or Python, and connect to your MongoDB instance securely using connection strings stored in Azure Key Vault.

    As the function receives change events, process them, convert them to JSON format, and use the Azure Data Lake Storage Gen2 SDK to write the JSON files to your Data Lake storage.
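    For the JSON conversion step, one common approach is to batch several change events into a single newline-delimited JSON (NDJSON) payload before writing it to Data Lake Storage Gen2, rather than one file per event. The answer above suggests C# or Python; this sketch uses Node.js purely for brevity, and the batching choice itself is an assumption:

```javascript
// Sketch: serialize a batch of change events as newline-delimited JSON
// (one JSON object per line), a format Data Lake analytics tools read well.
function toNdjson(events) {
    return events.map((e) => JSON.stringify(e)).join('\n') + '\n';
}

const payload = toNdjson([
    { op: 'insert', _id: 1 },
    { op: 'update', _id: 2 },
]);
console.log(payload.split('\n').length - 1); // 2 lines of JSON
```

    Writing fewer, larger files this way also tends to be cheaper and faster to query than many tiny per-event files.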


    Please don't forget to close the thread here by upvoting and accepting this as an answer if it is helpful.

