Azure Synapse Link for Dataverse FAQ

This article answers frequently asked questions about exporting Microsoft Dataverse table data to Azure Synapse Analytics and Azure Data Lake.

Can I manually perform tasks such as creating, updating, deleting, or setting autodelete policies for data files in the connected Azure storage?

No. Data files shouldn't be modified by customers, and no customer files should be placed in the data folders.

Note

To drop stale and stagnant data in the data lake without breaking the Azure Synapse Link, consider using the Query and analyze the incremental updates feature.

How can I access my table relationships?

Many-to-many relationships are available as tables that you can select from the Add tables page when you create a new link, or from the Manage tables page for an existing link.

Note

All relationships data is in Append-only mode by default when written in CSV format.

Is Azure Synapse Link for Dataverse a free feature?

Azure Synapse Link is a free feature with Dataverse. Using Azure Synapse Link for Dataverse doesn't incur additional charges under Dataverse. However, consider potential costs for the Azure services involved, such as storage and compute consumed in Azure Data Lake Storage and Azure Synapse Analytics.

What happens when I add a column?

When you add a new column to a table in the source, it's also added at the end of the file in the destination in the corresponding file partition. While the rows that existed prior to the addition of the column won't show the new column, new or updated rows will show the newly added column.
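As an illustration, here's a minimal pandas sketch with made-up data and column names: when you read a partition with explicit column names (for example, taken from model.json), rows written before the column existed simply come back empty for it.

```python
import io
import pandas as pd

# Rows written before the "creditlimit" column was added have one fewer field;
# pandas pads the missing trailing field with NaN when explicit names are given.
# (Illustrative data and column names; real files come from the lake.)
csv_partition = io.StringIO(
    '"id-1","Contoso"\n'           # row written before the column existed
    '"id-2","Fabrikam","50000"\n'  # row written after the column was added
)
df = pd.read_csv(csv_partition, names=["accountid", "name", "creditlimit"])
print(df)
#   accountid      name  creditlimit
# 0      id-1   Contoso          NaN
# 1      id-2  Fabrikam      50000.0
```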

What happens when I delete a column?

When you delete a column from a table in the source, the column isn't dropped from the destination. Instead, the column stops being updated: new and updated rows carry null for it, while previously written rows are preserved.

What happens if I change the data type of a column?

Changing the data type of a column is a breaking change and you'll need to unlink and relink.

What happens when I delete a row?

Deleting a row is handled differently based on which data write options you choose:

  • In-place update with CSV format: This is the default mode. When you delete a table row in this mode, the row is also deleted from the corresponding data partition in the Azure Data Lake. In other words, data is hard deleted from the destination.
  • Append-only with CSV format and incremental folder update: In this mode, when a Dataverse table row is deleted, it isn't hard deleted from the destination. Instead, a row is added and set as isDeleted=True to the file in the corresponding data partition in Azure Data Lake.
  • Export to Delta lake format: Azure Synapse Link performs a soft delete on data during the next delta synchronization cycle, followed by a hard delete after 30 days. See the sketch after this list for excluding soft-deleted rows when you read the Delta tables.
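For the Delta Lake option, consumers typically need to exclude rows that are soft deleted but not yet hard deleted. The following is a minimal PySpark sketch; the flag column name (isDeleted here) and the lake path are assumptions and may differ in your profile:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes a Spark session configured with Delta Lake support.
spark = SparkSession.builder.getOrCreate()

# Hypothetical container path; substitute your storage account and table folder.
accounts = spark.read.format("delta").load(
    "abfss://dataverse@yourlake.dfs.core.windows.net/deltalake/account"
)

# Keep only rows that haven't been soft deleted yet.
live_rows = accounts.filter(
    F.col("isDeleted").isNull() | (F.col("isDeleted") == False)
)
```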

Why am I not seeing a column header in the exported file?

Azure Synapse Link follows the Common Data Model to make it possible for data and its meaning to be shared across applications and business processes such as Microsoft Power Apps, Power BI, Dynamics 365, and Azure. In each CDM folder, metadata like a column header is stored in the model.json file. More information: Common Data Model and Azure Data Lake Storage Gen2 | Microsoft Learn
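Because the exported CSV files are headerless, the column names come from the model.json in the CDM folder. Here's a minimal sketch, assuming the standard CDM model.json shape (entities[].attributes[].name) and a hypothetical entity name and local file:

```python
import json

# Read the CDM metadata file that sits alongside the exported data folders.
# (A local copy is assumed here; in practice, download it from the container.)
with open("model.json", encoding="utf-8") as f:
    model = json.load(f)

# Pull the attribute (column) names for one entity, e.g. "account".
entity = next(e for e in model["entities"] if e["name"] == "account")
column_names = [attr["name"] for attr in entity["attributes"]]
print(column_names)  # use these as header names when reading the CSV partitions
```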

Why does the Model.json file show a different length for a column's data type than what is defined in Dataverse?

Model.json keeps the database length for the column's size. Dataverse has a concept of database length for each column. If you create a column with a size of 200 and later reduce it to 100, Dataverse still allows your existing data to be present in Dataverse. It does that by keeping DBLength at 200 and MaxLength at 100. What you see in Model.json is DBLength, so if you use that for downstream processes, you'll never provision less space than your Dataverse columns require.

Note

Memo fields are defined as varchar(max) with a default maximum length of 9999.

What date and time formats can be expected in exported Dataverse tables?

There are three date and time formats that can be expected in the exported Dataverse tables.

| Column Name | Format | Data Type | Example |
| --- | --- | --- | --- |
| SinkCreatedOn and SinkModifiedOn | M/d/yyyy H:mm:ss tt | datetime | 6/28/2021 4:34:35 PM |
| CreatedOn | yyyy-MM-dd'T'HH:mm:ss.sssssssXXX | datetimeOffset | 2018-05-25T16:21:09.0000000+00:00 |
| All Other Columns | yyyy-MM-dd'T'HH:mm:ss'Z' | datetime | 2021-06-25T16:21:12Z |
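If you parse these values downstream, the three formats need different handling. Here's a minimal Python sketch using the example values from the table (strptime's %f accepts at most six fractional digits, so the seven-digit CreatedOn fraction is trimmed first):

```python
import re
from datetime import datetime

# SinkCreatedOn / SinkModifiedOn: M/d/yyyy H:mm:ss tt
sink = datetime.strptime("6/28/2021 4:34:35 PM", "%m/%d/%Y %I:%M:%S %p")

# CreatedOn: seven fractional digits, which %f can't parse; trim to six.
raw = "2018-05-25T16:21:09.0000000+00:00"
created = datetime.strptime(
    re.sub(r"(\.\d{6})\d+", r"\1", raw), "%Y-%m-%dT%H:%M:%S.%f%z"
)

# All other date/time columns: UTC with a literal Z suffix.
other = datetime.strptime("2021-06-25T16:21:12Z", "%Y-%m-%dT%H:%M:%SZ")
```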

Note

The CreatedOn data type changed from datetime to datetimeOffset on 07/29/2022. To edit the data type format for a table created before the change, drop and re-add the table.

You can choose different column behaviors for a Date and Time column in Dataverse, which updates the data type format. More information: Behavior and format of the Date and Time column

Why am I seeing 1.csv or 1_001.csv file names instead of regular date time partitioned file names for some Dataverse tables?

This behavior is expected when you choose append-only export mode and have tables without a valid CreatedOn column. Blobs are organized into files such as 1.csv, 2.csv (employing custom partitioning due to the absence of a valid creation date). When any partition approaches 95% of the MaxBlockPerBlobLimit, the system automatically generates a new file, shown here as 1_001.csv.

When should I use a yearly or monthly partition strategy?

For Dataverse tables where data volume is high within a year, we recommend you use monthly partitions. Doing so results in smaller files and better performance. Additionally, if the rows in Dataverse tables are updated frequently, splitting into multiple smaller files helps improve performance in the case of in-place update scenarios. Delta Lake is only available with yearly partition due to its superior performance compared to CSV format.

What is append only mode and what is the difference between append only and in-place update mode?

In append only mode, incremental data from Dataverse tables is appended to the corresponding file partition in the lake. In in-place update mode (the default for CSV), incremental data is created, updated, or deleted directly in the corresponding file partition in the destination. For more information: Advanced Configuration Options in Azure Synapse Link

When do I use append only mode for a historical view of changes?

Append only mode is the recommended option for writing Dataverse table data to the lake, especially when data volumes are high within a partition and the data changes frequently. It's a commonly used and highly recommended option for enterprise customers. You can also choose this mode for scenarios where the intent is to incrementally review changes from Dataverse and process them for ETL, AI, and ML scenarios. Append only mode provides a history of changes, instead of only the latest change or in-place update, and enables time series AI scenarios, such as prediction or forecasting analytics based on historical values.

How do I retrieve the most up-to-date row of each record and exclude deleted rows when I export data in append only mode?

In append only mode, identify the latest version of each record with the same ID using VersionNumber and SinkModifiedOn, then apply isDeleted=0 to exclude records whose latest version is a delete.
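Here's a minimal pandas sketch of that logic. The column names (Id, versionnumber, SinkModifiedOn, isDeleted) are illustrative and vary by table; sorting on SinkModifiedOn as a tiebreaker also makes the result resilient to the duplicated version numbers discussed in the next question.

```python
import pandas as pd

def latest_live_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return the newest version of each record, excluding deleted records.

    Assumes illustrative column names: Id, versionnumber, SinkModifiedOn,
    isDeleted. Check your table's actual schema in model.json.
    """
    # Sort so the newest version of each record comes last; SinkModifiedOn
    # breaks ties when a retry committed the same version number twice.
    ordered = df.sort_values(["versionnumber", "SinkModifiedOn"])
    latest = ordered.groupby("Id", as_index=False).tail(1)
    # Drop records whose newest row is the isDeleted=True tombstone.
    # (Depending on how you read the CSV, the flag may be bool or string.)
    return latest[latest["isDeleted"] != True]
```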

Why do I see duplicated version numbers when I export data using append only mode?

In append only mode, if Azure Synapse Link for Dataverse doesn't get an acknowledgment from the Azure data lake that the data has been committed, for any reason such as network delays, Azure Synapse Link retries and commits the data again. Downstream consumption should be made resilient to this scenario by filtering data using SinkModifiedOn.

Why am I seeing differences between the SinkModifiedOn and ModifiedOn columns?

This is expected. ModifiedOn is the date and time the record was changed in Dataverse; SinkModifiedOn is the date and time the record was modified in the data lake.

Which Dataverse tables aren't supported for export?

Any table that doesn't have change tracking enabled isn't supported, in addition to the following system tables:

  • Attachment
  • Calendar
  • Calendarrule

Note

You can add the audit table for export using Azure Synapse Link for Dataverse. However, the export of the audit table is only supported with Delta Lake profiles.

I'm using the export to Delta Lake feature. Can I stop the Apache Spark job or change the execution time?

The Delta Lake conversion job is triggered when there's a data change within the configured time interval. There's no option to stop or pause the Apache Spark pool. However, you can modify the time interval after link creation under Manage tables > Advanced Time interval.

How are calculated columns and lookup fields handled?

Calculated columns are only supported when the lookup field is located within the same table. Data updates occur only when change tracking is triggered: lookup values change on the root tables only when the root table records change. To better reflect the value of a lookup field, join with the original table to get the latest value.

Which Dataverse tables use append only mode by default?

All tables that don't have a createdOn field will be synced using append only mode by default. This includes relationship tables as well as the ActivityParty table.

Why do I see the error message - Content of directory on path can't be listed?

  • Dataverse data is stored in the connected storage container. You need the "Storage Blob Data Contributor" role in the linked storage account to perform read and query operations through Synapse Workspace.
  • If you choose to export data in Delta Lake format, the CSV files are cleaned up after the Delta Lake conversion. You need to query the data through the non_partitioned tables in your Synapse workspace.

Why do I see the error message - can't bulk load because the file is incomplete or couldn't be read (CSV file only)?

Dataverse data can continuously change through create, update, and delete transactions. This error is caused by the underlying file changing while you read data from it. For tables with continuous changes, update your consumption pipeline to consume snapshot data (partitioned tables) instead. More information: Troubleshoot serverless SQL pool

Can I use Azure Synapse Link for Dataverse to archive data?

Azure Synapse Link for Dataverse is designed for analytics purposes. We recommend that customers use Dataverse long term data retention for archival purposes. More information: Dataverse long term data retention overview