Azure Synapse Link for Dataverse FAQ

This article answers frequently asked questions about exporting Microsoft Dataverse table data to Azure Synapse Analytics and Azure Data Lake.

Can I manually perform tasks such as creating, updating, deleting, or setting autodelete policies for data files in the connected Azure storage?

No. Data files shouldn't be modified by customers, and no customer files should be placed in the data folders.

Note

To drop stale and stagnant data in the data lake without breaking the Azure Synapse Link, consider using the Query and analyze the incremental updates feature.

How can I access my table relationships?

Many-to-many relationships are available as tables that you can select from the Add tables page when you create a new link, or from the Manage tables page for an existing link.

Note

All relationships data is in Append-only mode by default when written in CSV format.

Is Azure Synapse Link for Dataverse a free feature?

Azure Synapse Link is a free feature with Dataverse. Using Azure Synapse Link for Dataverse doesn't incur additional charges under Dataverse. However, consider potential costs for the Azure services involved, such as storage and compute consumed in Azure Data Lake Storage and Azure Synapse Analytics.

What happens when I add a column?

When you add a new column to a table in the source, it's also added at the end of the file in the destination in the corresponding file partition. While the rows that existed prior to the addition of the column won't show the new column, new or updated rows will show the newly added column.
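As an illustration, here's a minimal pandas sketch with made-up data and column names: when you read a partition with explicit column names (for example, taken from model.json), rows written before the column existed simply come back empty for it.

```python
import io
import pandas as pd

# Rows written before the "creditlimit" column was added have one fewer field;
# pandas pads the missing trailing field with NaN when explicit names are given.
# (Illustrative data and column names; real files come from the lake.)
csv_partition = io.StringIO(
    '"id-1","Contoso"\n'           # row written before the column existed
    '"id-2","Fabrikam","50000"\n'  # row written after the column was added
)
df = pd.read_csv(csv_partition, names=["accountid", "name", "creditlimit"])
print(df)
#   accountid      name  creditlimit
# 0      id-1   Contoso          NaN
# 1      id-2  Fabrikam      50000.0
```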

What happens when I delete a column?

When you delete a column from a table in the source, the column isn't dropped from the destination. Instead, the column stops being updated: new and updated rows carry null for it, while previously written rows are preserved.

What happens if I change the data type of a column?

Changing the data type of a column is a breaking change and you'll need to unlink and relink.

What happens when I delete a row?

Deleting a row is handled differently based on which data write options you choose:

  • In-place update with CSV format: This is the default mode. When you delete a table row in this mode, the row is also deleted from the corresponding data partition in the Azure Data Lake. In other words, data is hard deleted from the destination.
  • Append-only with CSV format and incremental folder update: In this mode, when a Dataverse table row is deleted, it isn't hard deleted from the destination. Instead, a row is added and set as isDeleted=True to the file in the corresponding data partition in Azure Data Lake.
  • Export to Delta lake format: Azure Synapse Link performs a soft delete on data during the next delta synchronization cycle, followed by a hard delete after 30 days. See the sketch after this list for excluding soft-deleted rows when you read the Delta tables.
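For the Delta Lake option, consumers typically need to exclude rows that are soft deleted but not yet hard deleted. The following is a minimal PySpark sketch; the flag column name (isDeleted here) and the lake path are assumptions and may differ in your profile:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes a Spark session configured with Delta Lake support.
spark = SparkSession.builder.getOrCreate()

# Hypothetical container path; substitute your storage account and table folder.
accounts = spark.read.format("delta").load(
    "abfss://dataverse@yourlake.dfs.core.windows.net/deltalake/account"
)

# Keep only rows that haven't been soft deleted yet.
live_rows = accounts.filter(
    F.col("isDeleted").isNull() | (F.col("isDeleted") == False)
)
```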

Why am I not seeing a column header in the exported file?

Azure Synapse Link follows the Common Data Model to make it possible for data and its meaning to be shared across applications and business processes such as Microsoft Power Apps, Power BI, Dynamics 365, and Azure. In each CDM folder, metadata like a column header is stored in the model.json file. More information: Common Data Model and Azure Data Lake Storage Gen2 | Microsoft Learn
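Because the exported CSV files are headerless, the column names come from the model.json in the CDM folder. Here's a minimal sketch, assuming the standard CDM model.json shape (entities[].attributes[].name) and a hypothetical entity name and local file:

```python
import json

# Read the CDM metadata file that sits alongside the exported data folders.
# (A local copy is assumed here; in practice, download it from the container.)
with open("model.json", encoding="utf-8") as f:
    model = json.load(f)

# Pull the attribute (column) names for one entity, e.g. "account".
entity = next(e for e in model["entities"] if e["name"] == "account")
column_names = [attr["name"] for attr in entity["attributes"]]
print(column_names)  # use these as header names when reading the CSV partitions
```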

Why does the Model.json file show a different length for a column's data type than what is defined in Dataverse?

Model.json keeps the database length for the column's size. Dataverse has a concept of database length for each column. If you create a column with a size of 200 and later reduce it to 100, Dataverse still allows your existing data to be present in Dataverse. It does that by keeping DBLength at 200 and MaxLength at 100. What you see in Model.json is DBLength, so if you use that for downstream processes, you'll never provision less space than your Dataverse columns require.

Note

Memo fields are defined as varchar(max) with a default maximum length of 9999.

What date and time formats can be expected in exported Dataverse tables?

There are three date and time formats that can be expected in the exported Dataverse tables.

| Column Name | Format | Data Type | Example |
| --- | --- | --- | --- |
| SinkCreatedOn and SinkModifiedOn | M/d/yyyy H:mm:ss tt | datetime | 6/28/2021 4:34:35 PM |
| CreatedOn | yyyy-MM-dd'T'HH:mm:ss.sssssssXXX | datetimeOffset | 2018-05-25T16:21:09.0000000+00:00 |
| All Other Columns | yyyy-MM-dd'T'HH:mm:ss'Z' | datetime | 2021-06-25T16:21:12Z |
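If you parse these values downstream, the three formats need different handling. Here's a minimal Python sketch using the example values from the table (strptime's %f accepts at most six fractional digits, so the seven-digit CreatedOn fraction is trimmed first):

```python
import re
from datetime import datetime

# SinkCreatedOn / SinkModifiedOn: M/d/yyyy H:mm:ss tt
sink = datetime.strptime("6/28/2021 4:34:35 PM", "%m/%d/%Y %I:%M:%S %p")

# CreatedOn: seven fractional digits, which %f can't parse; trim to six.
raw = "2018-05-25T16:21:09.0000000+00:00"
created = datetime.strptime(
    re.sub(r"(\.\d{6})\d+", r"\1", raw), "%Y-%m-%dT%H:%M:%S.%f%z"
)

# All other date/time columns: UTC with a literal Z suffix.
other = datetime.strptime("2021-06-25T16:21:12Z", "%Y-%m-%dT%H:%M:%SZ")
```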

Note

The CreatedOn data type changed from datetime to datetimeOffset on 07/29/2022. To edit the data type format for a table created before the change, drop and re-add the table.

You can choose different column behaviors for a Date and Time column in Dataverse, which updates the data type format. More information: Behavior and format of the Date and Time column

Why am I seeing 1.csv or 1_001.csv file names instead of regular date time partitioned file names for some Dataverse tables?

This behavior is expected when you choose append-only export mode and have tables without a valid CreatedOn column. Blobs are organized into files such as 1.csv, 2.csv (employing custom partitioning due to the absence of a valid creation date). When any partition approaches 95% of the MaxBlockPerBlobLimit, the system automatically generates a new file, shown here as 1_001.csv.

When should I use a yearly or monthly partition strategy?

For Dataverse tables where data volume is high within a year, we recommend you use monthly partitions. Doing so results in smaller files and better performance. Additionally, if the rows in Dataverse tables are updated frequently, splitting into multiple smaller files helps improve performance in the case of in-place update scenarios. Delta Lake is only available with yearly partition due to its superior performance compared to CSV format.

What is append only mode and what is the difference between append only and in-place update mode?

In append only mode, incremental data from Dataverse tables is appended to the corresponding file partition in the lake. In in-place update mode (the default for CSV), incremental data is created, updated, or deleted directly in the corresponding file partition in the destination. For more information: Advanced Configuration Options in Azure Synapse Link

When do I use append only mode for a historical view of changes?

Append only mode is the recommended option for writing Dataverse table data to the lake, especially when data volumes are high within a partition and the data changes frequently. It's a commonly used and highly recommended option for enterprise customers. You can also choose this mode for scenarios where the intent is to incrementally review changes from Dataverse and process them for ETL, AI, and ML scenarios. Append only mode provides a history of changes, instead of only the latest change or in-place update, and enables time series AI scenarios, such as prediction or forecasting analytics based on historical values.

How do I retrieve the most up-to-date row of each record and exclude deleted rows when I export data in append only mode?

In append only mode, identify the latest version of each record with the same ID using VersionNumber and SinkModifiedOn, then apply isDeleted=0 to exclude records whose latest version is a delete.
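Here's a minimal pandas sketch of that logic. The column names (Id, versionnumber, SinkModifiedOn, isDeleted) are illustrative and vary by table; sorting on SinkModifiedOn as a tiebreaker also makes the result resilient to the duplicated version numbers discussed in the next question.

```python
import pandas as pd

def latest_live_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return the newest version of each record, excluding deleted records.

    Assumes illustrative column names: Id, versionnumber, SinkModifiedOn,
    isDeleted. Check your table's actual schema in model.json.
    """
    # Sort so the newest version of each record comes last; SinkModifiedOn
    # breaks ties when a retry committed the same version number twice.
    ordered = df.sort_values(["versionnumber", "SinkModifiedOn"])
    latest = ordered.groupby("Id", as_index=False).tail(1)
    # Drop records whose newest row is the isDeleted=True tombstone.
    # (Depending on how you read the CSV, the flag may be bool or string.)
    return latest[latest["isDeleted"] != True]
```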

Why do I see duplicated version numbers when I export data using append only mode?

In append only mode, if Azure Synapse Link for Dataverse doesn't get an acknowledgment from the Azure data lake that the data has been committed, for any reason such as network delays, Azure Synapse Link retries and commits the data again. Downstream consumption should be made resilient to this scenario by filtering data using SinkModifiedOn.

Why am I seeing differences between the SinkModifiedOn and ModifiedOn columns?

This is expected. ModifiedOn is the date and time the record was changed in Dataverse; SinkModifiedOn is the date and time the record was modified in the data lake.

Which Dataverse tables aren't supported for export?

Any table that doesn't have change tracking enabled isn't supported, in addition to the following system tables:

  • Attachment
  • Calendar
  • Calendarrule

Note

You can add the audit table for export using Azure Synapse Link for Dataverse. However, the export of the audit table is only supported with Delta Lake profiles.

I'm using the export to Delta Lake feature. Can I stop the Apache Spark job or change the execution time?

The Delta Lake conversion job is triggered when there's a data change within the configured time interval. There's no option to stop or pause the Apache Spark pool. However, you can modify the time interval after link creation under Manage tables > Advanced Time interval.

How are calculated columns and lookup fields handled?

Calculated columns are only supported when the lookup field is located within the same table. Data updates occur only when change tracking is triggered: lookup values change on the root tables only when the root table records change. To better reflect the value of a lookup field, join with the original table to get the latest value.

Which Dataverse tables use append only mode by default?

All tables that don't have a createdOn field will be synced using append only mode by default. This includes relationship tables as well as the ActivityParty table.

Why do I see the error message - Content of directory on path can't be listed?

  • Dataverse data is stored in the connected storage container. You need the "Storage Blob Data Contributor" role in the linked storage account to perform read and query operations through Synapse Workspace.
  • If you choose to export data in Delta Lake format, the CSV files are cleaned up after the Delta Lake conversion. You need to query the data through the non_partitioned tables in your Synapse workspace.

Why do I see the error message - can't bulk load because the file is incomplete or couldn't be read (CSV file only)?

Dataverse data can continuously change through create, update, and delete transactions. This error is caused by the underlying file changing while you read data from it. For tables with continuous changes, update your consumption pipeline to consume snapshot data (partitioned tables) instead. More information: Troubleshoot serverless SQL pool

Can I use Azure Synapse Link for Dataverse to archive data?

Azure Synapse Link for Dataverse is designed for analytics purposes. We recommend that customers use Dataverse long term data retention for archival purposes. More information: Dataverse long term data retention overview