Extract data from Business Central
A common problem that organizations face is how to gather data from multiple sources, in multiple formats, and then move it to one or more data stores. The destination might not be the same type of data store as the source. Often the format is different, or the data needs to be shaped or cleaned before it's loaded into its final destination.
Various tools, services, and processes have been developed over the years to help address these challenges. No matter the process used, there's a common need to coordinate the work and apply some level of data transformation within the data pipeline. The following sections highlight the common methods used to perform these tasks.
Extract, transform, and load (ETL) process
Extract, transform, and load (ETL) is a data pipeline used to collect data from various sources. It then transforms the data according to business rules, and it loads the data into a destination data store. The transformation work in ETL takes place in a specialized engine, and it often involves using staging tables to temporarily hold data as it is being transformed and ultimately loaded to its destination.
When establishing a data lake or a data warehouse, you typically need to do two types of data extraction:
- A historical load (all data as of a given point in time)
- Delta loads (what's changed since the historical load)
Getting a historical load
The fastest (and least disruptive) way to get a historical load from Business Central online is to get a database export as a BACPAC file (using the Business Central admin center) and restore it in Azure SQL Database or on a SQL Server. For on-premises installations, you can just take a backup of the tenant database.
Getting delta loads
The fastest (and least disruptive) way to get delta loads from Business Central online is to set up API queries configured for read scale-out and use the data audit field LastModifiedOn (introduced in version 17.0) to filter for only the data that was changed or added since the last time you read data.
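The following sketch shows what such a filtered read can look like, assuming a standard v2.0 API entity (generalLedgerEntries is used here only as an example), its lastModifiedDateTime field, and an OAuth access token obtained separately; the environment and company values are placeholders.

```python
# Minimal sketch: read only records changed since the last watermark, using
# read scale-out so the request is served from a read-only database replica.
import requests

BASE_URL = "https://api.businesscentral.dynamics.com/v2.0/{environment}/api/v2.0"
COMPANY_ID = "{company-id}"              # placeholder
ACCESS_TOKEN = "{access-token}"          # obtained via OAuth; not shown here
LAST_WATERMARK = "2024-01-01T00:00:00Z"  # stored from the previous run

url = f"{BASE_URL}/companies({COMPANY_ID})/generalLedgerEntries"
headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Data-Access-Intent": "ReadOnly",  # serve the request from a read-only replica
}
params = {"$filter": f"lastModifiedDateTime gt {LAST_WATERMARK}"}

changed_rows = []
while url:
    response = requests.get(url, headers=headers, params=params, timeout=120)
    response.raise_for_status()
    payload = response.json()
    changed_rows.extend(payload.get("value", []))
    url = payload.get("@odata.nextLink")  # follow server-driven paging, if any
    params = None  # the nextLink already carries the filter and skip token
```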
Also consider this pattern: For each company, start by determining which of the tables you load data from have new data. You can do this check in an AL codeunit that you expose as a web service endpoint. Based on the result of this initial call, call only the APIs that you know will return data. This practice reduces the number of API calls you need to issue for your incremental loads. The impact of this optimization depends on how new data is distributed since the last load.
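The "check first, then fetch" flow could look like the following sketch. The changeSummary endpoint and the fields in its response are hypothetical stand-ins for whatever your custom AL codeunit actually returns; only the overall flow reflects the pattern described above.

```python
import requests

def fetch_changed_entities(base_url, company_id, headers, last_watermark):
    # 1) One cheap call that reports which tables have new or changed rows.
    #    The changeSummary endpoint and its fields are hypothetical; they stand
    #    in for a custom AL codeunit exposed as a web service.
    summary = requests.get(
        f"{base_url}/companies({company_id})/changeSummary",
        headers=headers,
        params={"$filter": f"lastModifiedDateTime gt {last_watermark}"},
        timeout=60,
    )
    summary.raise_for_status()

    # 2) Call only the APIs that the summary says will return data.
    results = {}
    for entry in summary.json().get("value", []):
        if entry.get("hasChanges"):             # hypothetical field
            entity = entry["apiEntityName"]     # hypothetical field
            response = requests.get(
                f"{base_url}/companies({company_id})/{entity}",
                headers=headers,
                params={"$filter": f"lastModifiedDateTime gt {last_watermark}"},
                timeout=120,
            )
            response.raise_for_status()
            results[entity] = response.json().get("value", [])
    return results
```

You'd call the function once per company and adjust the response handling to whatever shape your codeunit actually returns; paging is omitted here for brevity.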
Note
Delta links were deprecated in Business Central 2023 release wave 2 (v23) and removed in 2024 release wave 1 (v24). We recommend using webhooks as a replacement. For more information, see Deprecated Features in the Platform.
Tools for reading data
For Business Central on-premises, you can just read directly from the environment database. This option isn't available for Business Central online, so establishing this type of integration could block you from migrating from on-premises to the online version.
For Business Central online, the only supported option for reading data is using APIs. The APIs can be either the standard APIs that ship with Business Central or custom APIs that you build in Visual Studio Code and ship as an extension.
To be more productive with APIs, you can use the API query generator code sample. For more information, see API query generator.
Throughput of data reads with APIs
Measurements have shown that it isn't unusual to be able to read 2 MB/sec per API call. This rate means that it's possible to transfer up to 120 MB/min or 7200 MB/hour for pipelines running sequentially.
In Business Central online, the current parallelism for API calls is 5. This value means that you can read up to 35 GB/hour if no other processes call web services on Business Central. For an ETL setup that updates the staging area nightly with yesterday's changes, this rate should fit most maintenance windows (unless the environment has several GB of new/updated data per day).
Even if the ETL setup reads the historical dataset before switching to incremental loads, a throughput of 2 MB/sec means it takes up to one day to load a 100-GB database. Moreover, this operation is one that you only do once.
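For reference, the arithmetic behind these estimates (treating 100 GB as roughly 100,000 MB, and assuming a single sequential pipeline for the historical load):

```latex
2\ \text{MB/s} \times 3600\ \text{s/h} = 7200\ \text{MB/h} \quad \text{(one sequential pipeline)}
5 \times 7200\ \text{MB/h} = 36000\ \text{MB/h} \approx 35\ \text{GB/h} \quad \text{(five parallel calls)}
100000\ \text{MB} \div 2\ \text{MB/s} = 50000\ \text{s} \approx 14\ \text{h} \quad \text{(100-GB historical load, single pipeline)}
```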
Stability of ETL pipelines
No matter which tool you choose, you must make your data pipelines robust against timeouts and design them so that they're rerunnable. All Business Central tables have the system fields SystemRowversion and SystemLastModifiedOn. For more information about system fields, go to About system fields. If your ETL setup tracks which watermark (either a date or a row version) was read last time, your data pipelines can use these fields to read only the changes since that watermark.
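A minimal sketch of such a watermark-driven, rerunnable pipeline follows. It assumes the watermark is kept in a local JSON file and that a caller-supplied read_changes function (similar to the earlier sketch) performs the filtered API reads; retry handling is reduced to catching TimeoutError, whereas in practice you'd catch whatever exceptions your HTTP client raises.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

WATERMARK_FILE = Path("watermark.json")

def load_watermark() -> str:
    # On the very first run there's no stored watermark, so read everything.
    if WATERMARK_FILE.exists():
        return json.loads(WATERMARK_FILE.read_text())["lastModified"]
    return "1900-01-01T00:00:00Z"

def save_watermark(value: str) -> None:
    WATERMARK_FILE.write_text(json.dumps({"lastModified": value}))

def run_pipeline(read_changes, load_to_staging, max_attempts: int = 3) -> None:
    watermark = load_watermark()
    # Capture the new watermark *before* reading, so rows modified while the
    # pipeline runs are picked up again on the next run.
    new_watermark = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    for attempt in range(1, max_attempts + 1):
        try:
            rows = read_changes(since=watermark)  # filtered API reads, as in the earlier sketch
            load_to_staging(rows)
            save_watermark(new_watermark)  # advance only after a successful load
            return
        except TimeoutError:
            if attempt == max_attempts:
                # Give up; the stored watermark is untouched, so rerunning the
                # pipeline later simply retries the same window.
                raise
```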
ETL tools
You can use your ETL tool of choice to read and transform data. Read about some popular options in the sections that follow.
SQL Server Integration Services (SSIS)
For Business Central on-premises, one popular tool of choice is SQL Server Integration Services (SSIS) that is shipped with SQL Server. It's possible to autogenerate SSIS packages from a metadata store using script tasks or a tool such as BIML. For more information about these tools, see About BIML and About SSIS.
Azure Data Factory
For Business Central online, you can use a tool such as Azure Data Factory to read and transform data. Azure Data Factory is a managed cloud service that is built for complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects. For more information about Azure Data Factory, see About ADF.
You can use the Azure Data Factory OData connector with service principal authentication to extract data from Business Central. For more information, see Azure Data Factory OData connector and How-to connect Business Central to Azure.
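Before you wire the connector into a Data Factory pipeline, it can be useful to confirm that the service principal can reach the Business Central endpoint at all. The following sketch does that with the msal and requests libraries; the tenant, environment, client ID, and secret values are placeholders, and the exact endpoint URL depends on your environment.

```python
import msal
import requests

TENANT_ID = "{aad-tenant-id}"            # placeholder
CLIENT_ID = "{app-registration-id}"      # placeholder
CLIENT_SECRET = "{client-secret}"        # placeholder
ENVIRONMENT = "{environment-name}"       # placeholder

# Acquire a token for the service principal (client credentials flow).
app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    client_credential=CLIENT_SECRET,
)
token = app.acquire_token_for_client(
    scopes=["https://api.businesscentral.dynamics.com/.default"]
)
if "access_token" not in token:
    raise RuntimeError(token.get("error_description", "Token request failed"))

# Smoke test: list the companies the service principal can see.
response = requests.get(
    f"https://api.businesscentral.dynamics.com/v2.0/{ENVIRONMENT}/api/v2.0/companies",
    headers={"Authorization": f"Bearer {token['access_token']}"},
    timeout=60,
)
response.raise_for_status()
print([company["name"] for company in response.json()["value"]])
```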
Power BI dataflows
It's also possible to use Power BI dataflows for your extract pipelines. With Power BI dataflows, you can connect to Business Central APIs and use incremental refresh to load only the data that was changed since the last refresh. For more information, see About incremental refresh and About Power BI dataflows.
bc2adls code sample (unsupported)
Another (unsupported) option is to use the bc2adls code sample to transfer data from the Business Central Server directly to an Azure Data Lake Storage (ADLS) data lake. For more information about bc2adls, see About bc2adls on GitHub.com.