Share via


Concatenating Columns over Rows in APS/PDW AU4 using T-SQL

CREATE TABLE CONCATEX(ORDID INT, ITEMFAMILY VARCHAR(3), ITEMSOLD VARCHAR(25))   WITH(DISTRIBUTION = HASH(ORDID))

INSERT CONCATEX VALUES(1234,’QWE’,’KEYBOARD’)

INSERT CONCATEX VALUES(1234,’QGH’,’LAPTOP’)

INSERT CONCATEX VALUES(1234,’QDS’,’BLUETOOTH’)

INSERT CONCATEX VALUES(1234,’QLK’,’ROUTER’)

INSERT CONCATEX VALUES(1234,’QPO’,’MOUSE’)

………………………….

https://saketbi.files.wordpress.com/2016/04/pic11.png?w=648

The expected output here is something like below:

https://saketbi.files.wordpress.com/2016/04/pic21.png?w=648

  • The order of strings in the concatenated string didn’t matter in my scenario

Typical ways to achieve this in SQL Server would be to either employ Recursive CTEs or use COALESCE or FOR XML PATH(). However, these constructs don’t work in APS AU4 except COALESCE which behaves differently in APS when it comes to its usage for concatenating strings in table rows. This is because APS generally only supports a subset of T-SQL constructs and query constructs supported by SQL Server (the same version that APS is based on). Note: there are no fancy functions yet in SQL server to do this unlike MySQL, Oracle, PostgreSQL etc.

So, in order to achieve these results easily in APS, we can use the following method as one of the solutions.

SELECT ORDID,[1]+’;’+ [2]+’;’+[3]+’;’+[4]+’;’+[5] AS CONCATITEMLIST

FROM (SELECT ROW_NUMBER() OVER (ORDER BY ITEMSOLD) AS ID,ORDID,ITEMSOLD FROM CONCATEX) AS src

PIVOT (MAX(ITEMSOLD) FOR ID IN ([1],[2],[3],[4],[5])) AS pv

https://saketbi.files.wordpress.com/2016/04/pic3.png?w=648

In case we have multiple ORDID values in the table, we can use PARTITION BY.

SELECT ORDID,[1]+’;’+ [2]+’;’+[3]+’;’+[4]+’;’+[5] AS CONCATITEMLIST

FROM (SELECT ROW_NUMBER() OVER (PARTITION BY ORDID ORDER BY ITEMSOLD) AS ID,ORDID,ITEMSOLD FROM CONCATEX) AS src

PIVOT (MAX(ITEMSOLD) FOR ID IN ([1],[2],[3],[4],[5])) AS pv

We may in some cases need to make this query dynamic because at run-time we do not know the actual number of rows a certain ORDID has and it may be different for different ORDID values.

DECLARE @norows int =<No of rows for the given ORDID>;

WHILE @count <= @norows

BEGIN

IF @COLList = ”

SET @COLList = @COLList + ‘[‘+ CAST(@count as varchar(10)) +’]’;

ELSE

SET @COLList = @COLList + ‘,’ + ‘[‘+ CAST(@count as varchar(10)) +’]’;

IF @CONCATCols = ”

SET @CONCATCols = @CONCATCols + ‘[‘+ CAST(@count as varchar(10)) +’]’;

ELSE

SET @CONCATCols = @CONCATCols + ‘+’ + ”” + ‘;’ + ”” + ‘+’ + ‘[‘+ CAST(@count as varchar(10)) +’]’;

SET @count = @count +1;

END

DECLARE @QUERYOS varchar(8000) = ‘SELECT ORDID, ‘+ @CONCATCols +’ AS CONCATITEMLIST

FROM (SELECT ROW_NUMBER() OVER (PARTITION BY ORDID ORDER BY ITEMSOLD) AS ID,ORDID,ITEMSOLD FROM CONCATEX) AS src

PIVOT (MAX(ITEMSOLD) FOR ID IN (‘+ @COLList +’)) AS pv;’

EXEC(@QUERYOS);

Please update if you find a better way to do it with the future versions of APS.

Note: Any other way to do this ends in very bad performance.