This article answers frequently asked questions about exporting Microsoft Dataverse table data to Azure Synapse Analytics and Azure Data Lake.
Can I manually perform tasks such as creating, updating, deleting, or setting retention policies for data files in the connected Azure storage?
No. Data files shouldn't be modified by customers, and no customer files should be placed in the data folders.
Note
To drop stale and stagnant data in the data lake without breaking the Synapse Link, consider using the Query and analyze the incremental updates feature.
Can I link a Synapse workspace to an existing Synapse Link profile with data lake only?
Yes. In your web browser's address bar, append ?athena.updateLake=true to the web address that ends with exporttodatalake. Then, when you select an existing profile from the Synapse Link homepage, you'll see a new action in the extended options: Link to Azure Synapse Analytics Workspace.
How can I access my table relationships?
Many-to-many relationships are available as tables that you can select from the Add tables page when creating a new link, or from Manage tables for an existing link.
What happens when I add a column?
When you add a new column to a table in the source, it is also added at the end of the row in the destination file in the corresponding partition. While rows that existed before the column was added won't show the new column, new or updated rows will include it.
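As an illustration, here's a minimal Python sketch of reading such a partition file, assuming headerless CSV output and a column list taken from Model.json; the file name and column names here are hypothetical. Because older rows have fewer fields, pandas pads the missing trailing columns with NaN.

```python
import pandas as pd

# Hypothetical column list, as it would be read from Model.json after a new
# column ("newcolumn") was added to the table.
cols = ["Id", "SinkCreatedOn", "SinkModifiedOn", "name", "newcolumn"]

# Rows written before the column existed have one field fewer; pandas fills
# the missing trailing column with NaN.
df = pd.read_csv("account/2021-06.csv", header=None, names=cols)
```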
What happens when I delete a column?
When you delete a column from a table in the source, the column isn't dropped from the destination. Instead, the column is no longer updated: new and updated rows write it as null, while previously written rows are preserved.
What happens if I change the data type of a column?
Changing the data type of a column is a breaking change; you'll need to unlink and relink the tables.
What happens when I delete a row?
Deleting a row is handled differently based on which data write options you choose:
- In-place update: This is the default mode. When you delete a table row in this mode, the row is also deleted from the corresponding data partition in Azure Data Lake. In other words, data is hard deleted from the destination.
- Append-only: In this mode, when a Dataverse table row is deleted, it isn't hard deleted from the destination. Instead, a row with isDeleted=True is appended to the file in the corresponding data partition in Azure Data Lake.
Why does the Model.json file show a larger or different length for data types than what is defined in Dataverse?
Model.json keeps the database length as the column's size. Dataverse has a concept of database length for each column: if you create a column with a size of 200 and later reduce it to 100, Dataverse still allows your existing data to be present. It does that by keeping DBLength at 200 and MaxLength at 100. What you see in Model.json is DBLength, and if you use it for downstream processes, you'll never provision less space than your Dataverse columns can hold.
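If you want to inspect these sizes yourself, a minimal Python sketch like the following can list the column definitions, assuming the standard CDM model.json layout of entities with attribute lists; the maxLength property name is an assumption and may not appear on every column type.

```python
import json

# Load the Model.json that Azure Synapse Link writes alongside the data files.
with open("model.json") as f:
    model = json.load(f)

# Print each column's declared type and (assumed) maxLength property; per this
# article, the length shown reflects DBLength rather than Dataverse's MaxLength.
for entity in model.get("entities", []):
    for attr in entity.get("attributes", []):
        print(entity["name"], attr["name"], attr.get("dataType"), attr.get("maxLength"))
```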
What date and time formats can be expected in exported Dataverse tables?
There are three date and time formats that can be expected in the exported Dataverse tables.
| Column Name | Format | Data Type | Example |
|---|---|---|---|
| SinkCreatedOn and SinkModifiedOn | M/d/yyyy H:mm:ss tt | datetime | 6/28/2021 4:34:35 PM |
| CreatedOn | yyyy-MM-dd'T'HH:mm:ss.sssssssXXX | datetimeOffset | 2018-05-25T16:21:09.0000000+00:00 |
| All other columns | yyyy-MM-dd'T'HH:mm:ss'Z' | datetime | 2021-06-25T16:21:12Z |
Note
CreatedOn data type changed from datetime to datetimeOffset on 07/29/2022. To edit the data type format for a table created before the change, drop and re-add the table.
You can choose different column behaviors for a Date and Time column in Dataverse, which updates the data type format. More information: Behavior and format of the Date and Time column
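As an illustration, the following Python sketch parses one sample value from each row of the table above; the regular expression trims CreatedOn's seven fractional-second digits to the six that strptime's %f accepts.

```python
import re
from datetime import datetime, timezone

# SinkCreatedOn / SinkModifiedOn: M/d/yyyy H:mm:ss tt
sink = datetime.strptime("6/28/2021 4:34:35 PM", "%m/%d/%Y %I:%M:%S %p")

# CreatedOn: seven fractional digits, so trim to six before parsing with %f.
raw = "2018-05-25T16:21:09.0000000+00:00"
trimmed = re.sub(r"(\.\d{6})\d+", r"\1", raw)
created = datetime.strptime(trimmed, "%Y-%m-%dT%H:%M:%S.%f%z")

# All other columns: UTC timestamps with a literal 'Z' suffix.
other = datetime.strptime("2021-06-25T16:21:12Z", "%Y-%m-%dT%H:%M:%SZ").replace(
    tzinfo=timezone.utc
)

print(sink, created, other)
```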
When should I use a yearly or monthly partition strategy?
For Dataverse tables with high data volume within a year, we recommend monthly partitions. Doing so results in smaller files and better performance. Additionally, if rows in Dataverse tables are updated frequently, splitting into multiple smaller files helps improve performance in in-place update scenarios.
What is append only mode and what is the difference between append only and in-place update mode?
In append only mode, incremental data from Dataverse tables is appended to the corresponding file partition in the lake; deletes append a row with isDeleted=True rather than removing data. In in-place update mode (the default), changes are applied directly to the destination file and deleted rows are removed. For more information: Advanced Configuration Options in Azure Synapse Link
When do I use append only mode for a historical view of changes?
Append only mode is the recommended option for writing Dataverse table data to the lake, especially when data volumes are high within a partition and the data changes frequently; it's a commonly used option for enterprise customers. It's also a good fit when the intent is to incrementally review changes from Dataverse and process them for ETL, AI, and ML scenarios. Append only mode provides a history of changes instead of only the latest change, and enables time series AI scenarios, such as prediction or forecasting analytics based on historical values.
How do I retrieve the most up-to-date row of each record and exclude deleted rows when I export data in append only mode?
In append only mode, identify the latest version of each record (rows sharing the same ID) using VersionNumber and SinkModifiedOn, and then filter for isDeleted=0 on that latest version.
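Here's a minimal PySpark sketch of that approach, using toy rows in place of exported data; column names such as Id (the record's primary key), versionnumber, SinkModifiedOn, and isDeleted follow this article but may be cased or named differently in your export.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Toy rows standing in for an append only export: two versions of record "a",
# plus a delete marker appended for record "b".
df = spark.createDataFrame(
    [
        ("a", 1, "2021-06-25T16:21:12Z", False),
        ("a", 2, "2021-06-26T09:00:00Z", False),
        ("b", 1, "2021-06-25T16:21:12Z", False),
        ("b", 2, "2021-06-27T11:30:00Z", True),
    ],
    ["Id", "versionnumber", "SinkModifiedOn", "isDeleted"],
)

# Rank versions of each record, newest first, then keep the top row per Id
# and drop records whose latest version is a delete marker.
w = Window.partitionBy("Id").orderBy(
    F.col("versionnumber").desc(), F.col("SinkModifiedOn").desc()
)
latest = (
    df.withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .filter(~F.col("isDeleted"))
    .drop("rn")
)
latest.show()
```

Because the window also orders by SinkModifiedOn, the same pattern tolerates duplicated version numbers, as described in the next question.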
Why do I see duplicated version numbers when I export data using append only mode?
In append only mode, if Azure Synapse Link for Dataverse doesn't receive an acknowledgement from Azure Data Lake that the data was committed (for example, because of network delays), it retries and commits the data again, which can produce duplicate rows with the same version number. Make downstream consumption resilient to this scenario by filtering data using SinkModifiedOn.
Which Dataverse tables are not supported for export?
Any table that doesn't have change tracking enabled isn't supported, in addition to the following system tables:
- Attachment
- Calendar
- Calendarrule
Does Azure Synapse Link support calculated columns?
Calculated columns are only supported when the lookup field is located within the same table.
Which Dataverse tables use append only mode by default?
All tables that do not have a createdOn field will be synced using append only mode by default. This includes relationship tables as well as the ActivityParty table.
Why does Azure Synapse Link for Dataverse require all resources to be in the same region and what can I do about it?
To ensure high performance and low latency, and to prevent egress charges, Synapse Link requires all resources to be located in the same region. If you have a cross-region scenario, you can:
- Move the Azure resources to the same region as the environment.
- Move the environment to the same region as the Azure resources by contacting Microsoft customer support.
- Enable Read Access - Geo Redundant Storage (RA-GRS) on the Azure Data Lake to replicate Azure data to a nearby region. More information: Read access to data in the secondary region
- Use Azure Synapse pipelines or Azure Data Factory to copy data from Azure resources in one region to Azure resources in another.
Why do I see the error message "Cannot bulk load because the file is incomplete or could not be read"?
Dataverse data can change continuously through create, update, and delete transactions. This error occurs when the underlying file is changed while you're reading data from it. So, for tables with continuous changes, change your consumption pipeline to read snapshot data (partitioned tables) instead. More information: Create an Azure Synapse Link for Dataverse with your Azure Synapse Workspace
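As a sketch of that switch, the following Python snippet queries a snapshot table through the Synapse serverless SQL endpoint; the server name, database name, and the account_partitioned table name are placeholder assumptions, and your snapshot table names may differ.

```python
import pyodbc

# Connect to the Synapse serverless SQL endpoint (placeholder names; requires
# the Microsoft ODBC Driver 18 for SQL Server).
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<workspace>-ondemand.sql.azuresynapse.net;"
    "Database=<dataverse_database>;"
    "Authentication=ActiveDirectoryInteractive;"
)

# Read from the snapshot (partitioned) table instead of the near-real-time
# table, so reads don't race against files that are mid-update.
for row in conn.execute("SELECT TOP 10 * FROM dbo.account_partitioned"):
    print(row)
```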