Ingest data into your Warehouse using Transact-SQL
Applies to: Warehouse in Microsoft Fabric
The Transact-SQL language offers options you can use to load data at scale from existing tables in your lakehouse and warehouse into new tables in your warehouse. These options are convenient if you need to create new versions of a table with aggregated data, versions of tables with a subset of the rows, or to create a table as a result of a complex query. Let's explore some examples.
Creating a new table with the result of a query by using CREATE TABLE AS SELECT (CTAS)
The CREATE TABLE AS SELECT (CTAS) statement allows you to create a new table in your warehouse from the output of a
SELECT statement. It runs the ingestion operation into the new table in parallel, making it highly efficient for data transformation and creation of new tables in your workspace.
The examples in this article use the Bing COVID-19 sample dataset. To load the sample dataset, follow the steps in Ingest data into your Warehouse using the COPY statement to create the sample data into your warehouse.
The first example illustrates how to create a new table that is a copy of the existing
dbo.[bing_covid-19_data_2023] table, but filtered to data from the year 2023 only:
CREATE TABLE [dbo].[bing_covid-19_data_2023] AS SELECT * FROM [dbo].[bing_covid-19_data] WHERE DATEPART(YEAR,[updated]) = '2023';
You can also create a new table with new
dayofmonth columns, with values obtained from
updated column in the source table. This can be useful if you're trying to visualize infection data by year, or to see months when the most COVID-19 cases are observed:
CREATE TABLE [dbo].[bing_covid-19_data_with_year_month_day] AS SELECT DATEPART(YEAR,[updated]) [year], DATEPART(MONTH,[updated]) [month], DATEPART(DAY,[updated]) [dayofmonth], * FROM [dbo].[bing_covid-19_data];
As another example, you can create a new table that summarizes the number of cases observed in each month, regardless of the year, to evaluate how seasonality affects spread in a given country/region. It uses the table created in the previous example with the new
month column as a source:
CREATE TABLE [dbo].[infections_by_month] AS SELECT [country_region],[month], SUM(CAST(confirmed as bigint)) [confirmed_sum] FROM [dbo].[bing_covid-19_data_with_year_month_day] GROUP BY [country_region],[month];
Based on this new table, we can see that the United States observed more confirmed cases across all years in the month of
January, followed by
April is the month with the lowest number of cases overall:
SELECT * FROM [dbo].[infections_by_month] WHERE [country_region] = 'United States' ORDER BY [confirmed_sum] DESC;
For more examples and syntax reference, see CREATE TABLE AS SELECT (Transact-SQL).
Ingesting data into existing tables with T-SQL queries
The previous examples create new tables based on the result of a query. To replicate the examples but on existing tables, the INSERT...SELECT pattern can be used. For example, the following code ingests new data into an existing table:
INSERT INTO [dbo].[bing_covid-19_data_2023] SELECT * FROM [dbo].[bing_covid-19_data] WHERE [updated] > '2023-02-28';
The query criteria for the
SELECT statement can be any valid query, as long as the resulting query column types align with the columns on the destination table. If column names are specified and include only a subset of the columns from the destination table, all other columns are loaded as
NULL. For more information, see Using INSERT INTO...SELECT to Bulk Import data with minimal logging and parallelism.
Ingesting data from tables on different warehouses and lakehouses
For both CREATE TABLE AS SELECT and INSERT...SELECT, the
SELECT statement can also reference tables on warehouses that are different from the warehouse where your destination table is stored, by using cross-warehouse queries. This can be achieved by using the three-part naming convention
[warehouse_or_lakehouse_name.][schema_name.]table_name. For example, suppose you have the following workspace assets:
- A lakehouse named
cases_lakehousewith the latest case data.
- A warehouse named
reference_warehousewith tables used for reference data.
- A warehouse named
research_warehousewhere the destination table is created.
A new table can be created that uses three-part naming to combine data from tables on these workspace assets:
CREATE TABLE [research_warehouse].[dbo].[cases_by_continent] AS SELECT FROM [cases_lakehouse].[dbo].[bing_covid-19_data] cases INNER JOIN [reference_warehouse].[dbo].[bing_covid-19_data] reference ON cases.[iso3] = reference.[countrycode];
To learn more about cross-warehouse queries, see Write a cross-database SQL Query.