Events
Mar 31, 11 PM - Apr 2, 11 PM
The biggest SQL, Fabric and Power BI learning event. March 31 – April 2. Use code FABINSIDER to save $400.
Register todayThis browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Applies to:
Azure Synapse Analytics
Azure Synapse Pathway's goal is to preserve the functional intent of the original code while optimizing for Synapse SQL. Synapse Pathway uses a three-stage process for translating SQL code from a source system to Azure Synapse SQL.
Each of the stages preserves and augments the knowledge of the source including source-specific metadata to ensure the highest quality in translation.
SQL Language parsing is a problem that has been solved many times over. There are many commercial and open-source parsers that help with the underlying process of taking a source statement, breaking it down into logical tokens and then executing against a set or parser rules to ensure language consistency.
Synapse Pathway defines source grammars that allow the tool to identify and process the SQL input into an augmented Abstract Syntax Tree (AST) that is used in further processing.
Synapse Pathway defines a common representation of all objects in an augmented Abstract Syntax Tree (AST). The Pathway AST includes metadata from other statements or fragments to assist in the proper conversion of a statement.
By not just tracking that a token is a function but rather the source system type requirement, the script generation components can make smarter decisions about translating to Synapse SQL.
For example, the source function for the absolute function is defined as:
ABS( float_expression )
Azure Synapse SQL defines the absolute function as:
ABS ( numeric_expression )
In this simple case, Synapse Pathway understands that the conversion in Synapse SQL from float to numeric is an implicit conversion and requires no further type casting. Simple, clean, and effective code translation.
Keeping this meta-information about the source statements and fragments helps the structural differences between platforms – conversions in opt-out logic for search condition predicates in a WHERE clause for example.
The last stage of the process is to generate syntax for Synapse SQL. Using the AST structure generated from the source files, Synapse Pathway writes each DDL object to an individual file. The syntax generators use in-depth knowledge of the target platform to optimize statements.
For example, a common pattern that is seen in data loading scenarios is to first delete all of the contents in a staging table and then load the data from another staging table in an INSERT/SELECT fashion.
DELETE staging.table1 ALL;
INSERT INTO staging.table1…
FROM staging.table2;
Synapse SQL has an optimized path for this scenario – a CREATE TABLE AS SELECT. The CTAS statement is a batch-based operation and minimally logged driving high performance by using all the compute infrastructure available. Without this insight about Synapse SQL, tools often produce a truncate and INSERT/SELECT statement.
TRUNCATE TABLE staging.table1;
INSERT INTO staging.table1…
FROM staging.table2;
While not bad, this code can be optimized to a DROP TABLE and CTAS to have higher performance.
DROP TABLE staging.table1;
CREATE TABLE staging.table1
WITH
(
-- Derived from the original table definition
DISTRIBUTION = HASH(column1),
-- Derived from the original table definition
CLUSTERED COLUMNSTORE INDEX
)
AS SELECT * FROM staging.table2;
Events
Mar 31, 11 PM - Apr 2, 11 PM
The biggest SQL, Fabric and Power BI learning event. March 31 – April 2. Use code FABINSIDER to save $400.
Register todayTraining
Learning path
Transfer and transform data with Azure Synapse Analytics pipelines - Training
Learn how to use Azure Synapse Analytics pipelines to automate and orchestrate data transfer and transformation activities.
Certification
Microsoft Certified: Azure Data Engineer Associate - Certifications
Demonstrate understanding of common data engineering tasks to implement and manage data engineering workloads on Microsoft Azure, using a number of Azure services.