Azure Synapse SQL Serverless Virtual Data Warehouse

Kieran Wood 111 Reputation points
2021-12-23T20:32:58.51+00:00

Hi,
I have many GBs of data in a data lake in partitioned parque files. I am using Azure Synapse SQL Serverless to expose this data lake to Power BI authors via virtual data warehouse defined in T-SQL for the data exploration phase of a data analytics project.

What is the best means of exposing this data lake data within SQL Serverless is it External Tables or Views?

Initially I thought of using External Tables. However basically I have to use varchar for the vast majority of the columns since data types such as Date are not available.

So now I am creating views and Casting the relevant data types when exposing these views to Power BI.

Does anyone have any thoughts on this?

Kieran Wood , Microsoft Certified Data Engineer and Solutions Expert in Data Analytics

http://uk.linkedin.com/in/kieranpatrickwood

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,373 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Kieran Wood 111 Reputation points
    2021-12-28T13:59:13.827+00:00

    After further investigation if when creating the parquet file using Synapse Pipelines I use the mapping type column for example as follows ...

    160951-image.png

    I can use most of the data types when defining and External table using parquet files as a data source ....

    DROP EXTERNAL TABLE [dbo].[NYT_Fact]  
    GO  
    CREATE EXTERNAL TABLE [dbo].[NYT_Fact] (  
     VendorID tinyint,  
     tpep_pickup_datetime DATE,  
     tpep_dropoff_datetime DATE,  
     RatecodeID  tinyint,  
     PULocationID  smallint,  
     DOLocationID  smallint,  
     payment_type  tinyint,  
     passenger_count tinyint,  
     trip_distance  DECIMAL (8,2),  
     fare_amount  DECIMAL (8,2),  
     extra DECIMAL (8,2),  
     mta_tax  DECIMAL (8,2),  
     tip_amount  DECIMAL (8,2),  
     tolls_amount  DECIMAL (8,2),  
     improvement_surcharge  DECIMAL (8,2),  
     total_amount  DECIMAL (10,2),  
     congestion_surcharge  DECIMAL (8,2)  
     )  
     WITH (  
     LOCATION = 'Parquet/NYT/**',  
     DATA_SOURCE = [synfilesys_woodk_dfs_core_windows_net],  
     FILE_FORMAT = [SynapseParquetFormat]  
     )  
    GO  
    SELECT * FROM  [dbo].[NYT_Fact]  
    GO  
      
      
      
    

    The main exception I have found so far is Boolean.

    Kieran Wood , Microsoft Certified Data Engineer and Solutions Expert in Data Analytics

    http://uk.linkedin.com/in/kieranpatrickwood
    www.dataplatformservices.com


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.