Synapse dedicated sqlpool partition switch: error trying to get the partition number based on a value or boundary

Miguel Buenestado 41 Reputation points
2022-09-12T13:39:40.753+00:00

I am trying partition switching in a dedicated SQL pool in Synapse.

Both tables, staging (temp) and destiny (final), have in common the same partition distribution, based on monthly cuts (e.g., 20190101, 20190201, 20190301 and so on).
I have already got the data in the staging table, ready for the partition to be swapped.
The staging table will vary through each data load: today you could have a month worth of data, tomorrow you may get two months of data...

Which means that the switch criteria will be dynamic.

Thus, I must identify which partitions need to be switched, that is, the partition numbers based on the boundary values given in the staging table
To do so, I was trying the following query,

SELECT DISTINCT s.[name]                        AS      [schema_name]  
,           t.[name]                        AS      [table_name]  
,           i.[name]                        AS      [index_name]  
,           p.[partition_number]            AS      [partition_number]  
--,           SUM(a.[used_pages]*8.0)         AS      [partition_size_kb]  
--,           SUM(a.[used_pages]*8.0)/1024    AS      [partition_size_mb]  
--,           SUM(a.[used_pages]*8.0)/1048576 AS      [partition_size_gb]  
,           p.[rows]                        AS      [partition_row_count]  
,           rv.[value]                      AS      [partition_boundary_value]  
,           p.[data_compression_desc]       AS      [partition_compression_desc]  
FROM sys.schemas s  
INNER JOIN  sys.tables t                    ON      t.[schema_id]         = s.[schema_id]  
INNER JOIN  sys.partitions p                ON      p.[object_id]         = t.[object_id]  
--JOIN        sys.allocation_units a          ON      a.[container_id]      = p.[partition_id]  
INNER JOIN  sys.indexes i                   ON      i.[object_id]         = p.[object_id]  
AND     i.[index_id]          = p.[index_id]  
INNER JOIN  sys.data_spaces ds              ON      ds.[data_space_id]    = i.[data_space_id]  
INNER JOIN  sys.partition_schemes ps        ON      ps.[data_space_id]    = ds.[data_space_id]  
INNER JOIN  sys.partition_functions pf      ON      pf.[function_id]      = ps.[function_id]  
INNER JOIN  sys.partition_range_values rv   ON      rv.[function_id]      = pf.[function_id]  
AND     rv.[boundary_id]      = p.[partition_number]  
  
          
WHERE       p.[index_id] <=1 AND  
t.[name] = 'STAGING_XXXX' AND  
cast(rv.[value] as nvarchar(128)) IN (  
SELECT  DISTINCT d_boundary_key AS partition_key  
FROM DIM_DATE dat  
INNER JOIN STAGING_XXXX stg  
ON dat.c_date_id = stg.c_date_id  
)  
GROUP BY    s.[name]  
,           t.[name]  
,           i.[name]  
,           p.[partition_number]  
,           p.[rows]  
,           rv.[value]  
,           p.[data_compression_desc]  
ORDER BY 1  

but it appears I cannot join the SYS information with my DATA information, as I am getting the following error:

Unsupported data type error. Statement references a data type that is unsupported in Parallel Data Warehouse, or there is an expression that yields an unsupported data type. Modify the statement and re-execute it.

I tried several join methods (INNER JOIN, IN clauses) and different CAST and CONVERT functions, none of which resulted.

Can you provide a proper method to achieve this dynamic partition switching this in Synapse please?

Thank you in advance.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
0 comments No comments
{count} votes

Answer accepted by question author
  1. HimanshuSinha 19,547 Reputation points Microsoft Employee Moderator
    2022-09-13T20:08:29.963+00:00

    Hello @Miguel Buenestado ,
    Thanks for the question and using MS Q&A platform.
    As we understand the ask here is "Unsupported data type error. Statement references a data type that is unsupported in Parallel Data Warehouse, or there is an expression that yields an unsupported data type" . Modify the statement and re-execute it., please do let us know if its not accurate.

    I am assuming that the error is on the line 26 . When I checked rv.[value] the datatype is SQL_Variant .

    240709-image.png

    and from the doc here i see that we do can convert SQL_variant to varchar : https://learn.microsoft.com/en-us/sql/t-sql/data-types/sql-variant-transact-sql?view=sql-server-ver16

    Can you please try to create a table ( for testing ) and perfomr a INSERT with the below logic .

    SELECT DISTINCT d_boundary_key AS partition_key
    FROM DIM_DATE dat
    INNER JOIN STAGING_XXXX stg
    ON dat.c_date_id = stg.c_date_id

    and then try to perform the JOIN .

    Also can you please share the schema details for DIM_DATE & STAGING_XXXX tables ?

    Please do let me if you have any queries.
    Thanks
    Himanshu


    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
      • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    1 person found this answer helpful.
    0 comments No comments

1 additional answer

Sort by: Most helpful
  1. Miguel Buenestado 41 Reputation points
    2022-09-15T07:40:36.38+00:00

    Hi @HimanshuSinha ,

    Yes indeed, the problem lies with the SQLVariant rv.value field.

    Your workaround seems satisfactory: I tried the INSERT you suggested (that is, a temp table just holding the partition details) and then I was able to perform the JOIN statement with this temporary table successfully.

    Thanks,
    Miguel


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.