T-SQL : Performing Rowwise String Operations in SQL Server
This article covers some of the common scenarios of string manipulation across rows of a given table.
Many a time, we come across a situation where we would want to perform string operations across the rows of a table. Some examples would be: concatenation of multiple rows satisfying a particular condition of a table or splitting a row to multiple ones based on specific conditions or generation of extra rows to match the definition of a table. We also see a lot of questions on these topics on the forums. Hence, I have discussed below a few such scenarios with examples.
I came across this particular thread in the forums where the OP had asked for a mechanism to split (convert) multiple rows of a given id to a set of rows each of them having a length 'n'. This can be achieved by first concatenating the rows of a given id (that qualify the condition) as a single string and then splitting this string into multiple rows each of a length 'n'.
For example, lets assume that we would want the below mentioned first table to be converted as the second one (on the right). Here data2 is alone placed much below as the data2 carries a later date timestamp. Similarly dataX falls before dataA in the result because of the lesser date value.
So for this to happen, we would need an input of the respective table data and the length upto which the rows are to be trimmed.
declare @length int
set @length=10
declare @temp table
(
file_number int,
data varchar(max),
insertion_date datetime
)
insert @temp select 19,'data1','2014-01-30 12:41:14.733'
insert @temp select 19,'data2','2014-01-31 12:41:14.733'
insert @temp select 19,'data3','2014-01-30 15:41:14.733'
insert @temp select 19,'data4','2014-01-30 15:41:14.733'
insert @temp select 19,'data5','2014-01-30 15:41:14.733'
insert @temp select 19,'data6','2014-01-30 15:41:14.733'
insert @temp select 27,'dataA','2014-09-01 12:41:14.733'
insert @temp select 27,'dataB','2014-08-01 12:41:14.733'
insert @temp select 27,'dataX','2014-08-30 12:41:14.733'
--select * from @temp
--collation of all comments of a particular id as a single row
;with tab
as
(
select file_number,
(select ''+data from @temp where file_number=t.file_number order by insertion_date for
xml path('')) as data
from @temp t
group by file_number
)
--splitting each data(single row) to multiple rows based on the user-provided length
,cte
as
(
select file_number,substring(data,1,@length) as text,len(substring(data,1,@length)) as pos
from tab
UNION ALL
select t.file_number,substring(t.data,c.pos+1,@length) as text,c.pos+len(substring(t.data,c.pos
+1,@length)) as pos
from cte c inner join tab t on c.file_number=t.file_number
where len(t.data)-c.pos>0
)
--Obtaining the final result
select file_number,text as data from cte
order by file_number,pos
Here, we have the table @temp consisting of the input data columns - file_number, data (which is to be concatenated and thereafter splitted) and insertion_Date (according to which the data are to be concatenated and shown)
The approach followed is:
-tab is prepared to give the concatenated string (with the help of xml path) for every file_number in the order of ascending insertion_date
-cte makes use of the tab generated above and the substring function in a looping manner, to generate rows of the length as dictated by @length. We have the 'pos' column to keep track of the length of text that has been covered on each line.
-select the file_number and data as the output using the 'pos' column to give it in the right order of ascending timestamp.
This gives an output as follows:
In this example, we are going to see the usage of dynamic pivot to obtain a holiday tracker table.
We have as input the start date and end date of holidays taken by various employees. We would now need to show this data in a calendar format with the respective holidays marked as 'X'. For example the conversion would be as illustrated below:
We are going to first expand the available data across the required period of time for example, for a full month. Then we will pivot the data and display the holiday tracker.](resources/5226.pic1.JPG)
----Sample #emp table
--
--create table #emp
--(
-- employ_ref int,
-- employ_name varchar(255),
-- fromdate datetime,
-- untildate datetime
--)
--
--insert #emp select 88123,'jk','2013-11-23','2013-11-25'
--insert #emp select 88123,'jk','2013-11-28','2013-11-28'
--insert #emp select 88123,'jk','2013-12-28','2014-01-13'
--insert #emp select 88215,'jay','2013-11-01','2013-11-09'
--insert #emp select 88215,'jay','2013-11-15','2013-11-15'
--insert #emp select 88999,'kumaur',NULL,NULL
--the input table
--select * from #emp
--declaration of variables
declare @datetab TABLE(ldate datetime)
declare @n int
declare @startdate datetime,@enddate datetime
declare @datestring nvarchar(max),@query nvarchar(max)
--assigning of user-defined variables
set @startdate='2013-11-01'--starting date of the forecast
set @n=2 --number of months from @startdate for which absence data is to be shown
set @enddate=dateadd(m,@n,@startdate)--the enddate of the absence chart calculated using @startdate and @n
--preparing the datetab table with the list of dates for which absence is to be shown
;with datetab_popl
as
(
select @startdate as ldate
UNION ALL
select dateadd(dd,1,ldate)
from datetab_popl
where ldate<@enddate
)
insert @datetab
select ldate from datetab_popl option (maxrecursion 0)
--Conversion of range of dates (single row) to a list of dates (multiple rows)
;with cte
as
(
select employ_Ref,fromdate,untildate,fromdate as ldate,'X' as flag,1 as level
from #emp
--where fromdate is not null
UNION ALL
select employ_ref,fromdate,untildate,dateadd(dd,1,ldate) as ldate,'X' as flag,level+1 as level
from cte
where ldate<untildate
)
--Joining of employee absence dates with the full date range of @datetab
,cte1
as
(
select *
from
(
select distinct employ_Ref
from #emp
) e
cross join @datetab
)
--select * from cte1
--order by employ_ref
--preparation of data for pivot
select a.employ_ref,convert(varchar,a.ldate,112) as ldate,ISNULL(b.flag ,'') as flag
into #temp
from cte1 a
left join cte b on a.employ_Ref=b.employ_Ref and a.ldate=b.ldate
order by employ_ref,ldate option (maxrecursion 0)
--preparation of the datestring for pivot query
select @datestring=ISNULL(@datestring,'')+',['+ldate+']'
from
(
select distinct ldate from #temp
)tt
--print @datestring
--Formation of the dynamic pivot query
set @query='
select *
from
(
select employ_Ref,ldate,flag
from #temp
) tt
PIVOT
(
max(flag) for ldate in('+stuff(@datestring,1,1,'')+')
)pvt'
--execution of the pivot query and viewing the results
exec sp_executesql @query
--temp table cleanup
drop table #temp
-Preparation of the total set of dates for which tracker to be shown (2 months in the example)
-Preparation of the dates for which employees are availing a holiday - typical conversion of columns to rows with expansion.
-Cross joining of the above tables to obtain the master list of data
-Pivoting them to obtain the holiday tracker and marking the holidays as 'X"
the output for the above query would be as follows:
There would be situations wherein we would have multiple rows of data with changes in a particular column alone. In such a case, we would want to combine all rows together with the changing column data set in a csv format. For example changing from the below given table1 to table2. Here, we assume that the input column_list is subject,book which means that these are the columns that are to be grouped and displayed in CSV format.
This transformation can be achieved with the help of subquery and xml path. However, doing this transformation for a dynamic column list takes it to the next level, wherein we are required to frame the above mentioned subquery also in a dynamic manner. So assuming that we have the list of columns (for which CSV representation is to be made) and the table data as input, we can achieve the functionality as below:
/* Creation of a sample table to mimic the duplicating output*/
CREATE TABLE #temp
(
id INT,
name NVARCHAR(100),
s_name NVARCHAR(100),
B_name NVARCHAR(100),
timestamp DATETIME
)
/* Inserting Records into the Sample Table */
INSERT INTO #temp SELECT 1,'ram','s1','B1','2013-06-04 13:56:47.017'
INSERT INTO #temp SELECT 1,'ram','s1','B1','2013-06-05 13:56:47.017'
INSERT INTO #temp SELECT 1,'ram','s3','B2','2013-06-04 13:56:47.017'
--SELECT * FROM #temp
--truncate table #temp
/*Obtaining the columns of the table*/
DECLARE @Column_Tab TABLE(id int identity,Column_Name nvarchar(100))
INSERT INTO @column_tab
SELECT c.name
FROM tempdb.sys.columns c
--inner join tempdb.sys.tables t ON c.object_id = t.object_id
WHERE object_id=object_id('tempdb..#temp')
/* Setting up the Constant Variables*/
DECLARE @i int
DECLARE @distinct_list nvarchar(max)
DECLARE @CSV_list nvarchar(max)
DECLARE @subquery1 VARCHAR(200)
DECLARE @subquery2 VARCHAR(200)
SET @i=1
SET @distinct_list=''
SET @CSV_list=''
SET @subquery1='
,substring(
(Select '',''+A.'
SET @subquery2=' From #temp A
Where A.timestamp = B.timestamp
ORDER BY A.timestamp For XML PATH (''''))
,2, 1000) as '
/*Obtaining the distinct and CSV column names list*/
WHILE(@i<=(SELECT COUNT(*) FROM @column_tab))
BEGIN
SELECT @distinct_list= ISNULL(@distinct_list,'')+@subquery1+column_name+@subquery2+column_name FROM @column_tab WHERE id=@i AND Column_name in ('s_name','B_name')
SELECT @CSV_list= ISNULL(@CSV_list,'')+',B.'+column_name FROM @column_tab WHERE id=@i AND Column_name not in ('s_name','B_name')
SET @i=@i+1
END
--PRINT @distinct_list
--print '---------------------------'
--PRINT STUFF(@CSV_list,1,1,'')
--PRINT '---------------------------'
/*Framing the Dynamic Query*/
DECLARE @sql nvarchar(max)
SET @sql='SELECT DISTINCT '+STUFF(@CSV_list,1,1,'')+@distinct_list+'
FROM #temp B'
--PRINT @sql
EXEC sp_executesql @sql
--Cleaning up
drop table #temp
/*
Sample Query
---------------
SELECT DISTINCT B.id,B.name,B.timestamp
,substring(
(Select ','+A.s_name From #temp A
Where A.timestamp = B.timestamp
ORDER BY A.timestamp For XML PATH (''))
,2, 1000) as s_name
,substring(
(Select ','+A.B_name From #temp A
Where A.timestamp = B.timestamp
ORDER BY A.timestamp For XML PATH (''))
,2, 1000) as B_name
FROM #temp B
*/
-Initial setup of the table
-Obtaining the column list of the table with which we'll be working upon
-initialization of the variables
-framing of the dynamic subquery components - the csv format is to be achieved for the column names mentioned in the in clause of line 50. This can obtained via a variable and this whole thing can also be in turn achieved using dynamic query as well.
-framing of the main query and execution of the same.
-a sample query output to demonstrate how the finally formed query would look like.
The output would be:
We have seen the ways of doing different set operations using strings that can be performed in SQL Server. The methods have been explained with the help of common examples that are seen commonly in forums.
See Also