Logic for consolidation of incremental Dynamics F&O records in Azure Storage

Haris Aqeel 0 Reputation points
2024-07-10T07:43:36.5166667+00:00

We are using Power Apps with Azure Synapse Jobs to extract data from Dynamics 365 and incrementally push it into Azure Storage. The data syncing works well, but we are encountering issues with data consolidation.

We have an Azure Function App set on a time trigger, which runs hourly. Instead of using a blob trigger, we list all timestamp folders containing a model.json file and maintain their status in an Azure Table, marked as "Processed" or "Unprocessed". Each hour, the function consolidates data from the unprocessed folders into a target container in Azure Storage, appending new data to avoid recalculating all timestamp records.

However, we are experiencing issues with missed records. Below is the detailed logic we are using:

**Identify Root-Level Folders with model.json**:

- Traverse Azure Blob Storage to identify folders at the root level that contain a model.json file.
- Confirm that model.json exists and is larger than 0 bytes; an empty file suggests the data in that folder is not yet ready for processing.
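The folder-discovery step above can be sketched as a pure function over a flat blob listing, so the selection logic is testable without a storage account. The folder/file layout assumed here (one folder segment, then `model.json`) is taken from the description above; with the `azure-storage-blob` SDK, the `(name, size)` pairs would come from `ContainerClient.list_blobs()`, e.g. `[(b.name, b.size) for b in container_client.list_blobs()]`.

```python
def find_ready_folders(blobs):
    """blobs: iterable of (name, size) pairs from a flat blob listing.

    Returns the set of root-level folder names that contain a model.json
    larger than 0 bytes (an empty model.json means the export is incomplete).
    """
    ready = set()
    for name, size in blobs:
        parts = name.split("/")
        # Root-level means exactly one folder segment before the file name.
        if len(parts) == 2 and parts[1] == "model.json" and size > 0:
            ready.add(parts[0])
    return ready
```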
**Verify Timestamp and Processing Status**:

- Extract the timestamp from the folder's name using a regex pattern.
- Consult the Azure Table (ProcessingStatus) to determine whether the timestamp has already been processed.
- To confirm the data is ready for consolidation, check that the `model.json` file's modified time is within one hour of the folder's creation time. This aligns with the hourly cycle of our Azure Function and guards against folders being flagged for processing prematurely.
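The timestamp extraction and readiness check described above might look like the sketch below. The regex pattern is an assumption about the folder-name format (adjust it to the actual export naming); the already-processed lookup against the ProcessingStatus table is left out, since it is a straightforward `get_entity` call against the Azure Table.

```python
import re
from datetime import timedelta

# Assumed folder-name format, e.g. "2024-07-10T07.00.00Z"; adjust as needed.
TIMESTAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}\.\d{2}\.\d{2}Z)")

def extract_timestamp(folder_name):
    """Return the timestamp string embedded in the folder name, or None."""
    m = TIMESTAMP_RE.search(folder_name)
    return m.group(1) if m else None

def is_ready(model_json_modified, folder_created, window=timedelta(hours=1)):
    """Data is considered ready when model.json was last modified within
    one hour of the folder's creation, matching the hourly function cycle."""
    return timedelta(0) <= model_json_modified - folder_created <= window
```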
    
**Data Consolidation for Unprocessed Folders**:

- For each folder marked as unprocessed:
  - Aggregate the CSV files by scanning the folder and collecting their data.
  - Load these files into pandas DataFrames and concatenate them by matching schema or table name.
  - Append the new consolidated data to the existing datasets in the target container while ensuring no duplicates (assuming uniqueness in the first column).
  - After consolidation, upload the updated DataFrame back to the target Azure Blob container.
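The append-and-deduplicate step above can be sketched with pandas as below, assuming (as stated) that the first column is the unique key. Downloading the existing dataset and uploading the result are elided; in the function they would wrap this with blob reads/writes of the CSVs.

```python
import pandas as pd

def consolidate(existing, new_frames):
    """existing: current consolidated DataFrame for one table (may be empty).
    new_frames: DataFrames parsed from the unprocessed folders' CSV files.
    """
    combined = pd.concat([existing, *new_frames], ignore_index=True)
    key = combined.columns[0]  # first column assumed to be the unique key
    # keep="last" retains the newest row when an incremental batch
    # re-delivers a key that already exists in the target dataset.
    return (combined.drop_duplicates(subset=[key], keep="last")
                    .reset_index(drop=True))
```

Note that `keep="last"` versus `keep="first"` matters here: if an incremental batch contains an updated version of an existing record, `keep="last"` lets the update win.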
    
**Update Processing Status**:

- After consolidation, update the folder's entry in the Azure Table to "Processed" with the current timestamp, so subsequent runs skip this batch.
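The status update above could be sketched as follows. The entity shape (PartitionKey/RowKey names) is an assumption about how the ProcessingStatus table is keyed; with the `azure-data-tables` SDK the entity would be written via `TableClient.upsert_entity(make_status_entity(ts))`, and using upsert rather than insert keeps reruns idempotent.

```python
from datetime import datetime, timezone

def make_status_entity(folder_timestamp):
    """Build the entity that marks one timestamp folder as processed.
    Key names are hypothetical; match them to the actual table schema."""
    return {
        "PartitionKey": "ProcessingStatus",
        "RowKey": folder_timestamp,
        "Status": "Processed",
        "ProcessedAt": datetime.now(timezone.utc).isoformat(),
    }
```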
    

We are wondering whether to implement additional checks for the model.json file size and modification time to improve data readiness verification. Specifically, should we:

  • Only read timestamp folders if the model.json file is larger than 0 bytes?
  • Consider a timestamp folder for processing only if the model.json file's modified time is equal to or greater than one hour from the folder's creation time, matching the hourly trigger of our Azure Function?

1 answer

  1. Pinaki Ghatak 2,875 Reputation points Microsoft Employee
    2024-07-10T09:29:00.18+00:00

    Hello @Haris Aqeel

    Based on the detailed logic you provided, it seems like you are already performing several checks to ensure data readiness verification.

    Regarding your first question: yes, it is good practice to check that the model.json file is larger than 0 bytes. An empty file indicates the folder's data is not yet ready for processing, so this check helps ensure data completeness and prevents missed records.

    Regarding your second question: it is also good practice to consider a timestamp folder for processing only when the model.json file's modified time is at least one hour after the folder's creation time, matching the one-hour interval configured for the Azure Synapse jobs in Power Apps.

    This helps ensure the data is actually ready for consolidation and prevents folders from being flagged for processing prematurely.

    Overall, your detailed logic seems to be well thought out and covers most of the necessary checks for data readiness verification.

    However, it is always a good practice to continuously review and improve your logic to ensure data completeness and prevent any issues with missing records.


    I hope that this response has addressed your query and helped you overcome your challenges. If so, please mark this response as Answered. This will not only acknowledge our efforts, but also assist other community members who may be looking for similar solutions.
