Hello Everyone
We are converting several ETL processes from a Windows 2012, SQL2016 environment to a Windows 2019, SQL2019 environment. All is going well except for one process, which runs longer on SQL2019 than it did on SQL2016.
The process performs an INSERT INTO A ... SELECT x FROM B, where x is a series of SUBSTRING and CASE expressions.
The source table contains essentially one data field, 6,000 bytes long. The target table contains over 950 data columns, the widest table in any of our ETL processes.
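To make the shape of the statement concrete, here is a stripped-down, hypothetical sketch of the pattern (dbo.SourceTable, dbo.TargetTable and the column names are made up; the real statement carries 950+ such expressions):

-- Hypothetical sketch: one wide fixed-width field carved up with SUBSTRING/CASE.
INSERT INTO dbo.TargetTable (Col001, Col002, Col003 /* ...and roughly 950 more... */)
SELECT
    SUBSTRING(s.RawRecord, 1, 10)                            AS Col001,
    CASE SUBSTRING(s.RawRecord, 11, 1)
         WHEN 'Y' THEN 1 ELSE 0 END                          AS Col002,
    CASE WHEN SUBSTRING(s.RawRecord, 12, 8) = '00000000'
         THEN NULL
         ELSE SUBSTRING(s.RawRecord, 12, 8) END              AS Col003
    -- ...repeated for the remaining target columns...
FROM dbo.SourceTable AS s;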
On the SQL2019 server, the source table's 4.7 million rows load in 47 minutes, about 100,000 rows per minute.
I split out 1 million rows for testing purposes; this runs in 10 minutes, again about 100,000 rows per minute.
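(For context, splitting out the sample is nothing fancier than a SELECT ... INTO along these lines; the _Sample name is a placeholder:

-- Hypothetical: split a 1-million-row test sample out of the source table.
SELECT TOP (1000000) *
INTO dbo.SourceTable_Sample
FROM dbo.SourceTable;)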
On my SQL2016, Windows 2012 server, a full load runs in 12-14 minutes, while the 1-million-row sample runs in 2 minutes.
For grins, I installed SQL2016 on my new Windows 2019 ETL server and ran the same tests: the full load took 7 minutes 14 seconds, and the 1-million-row sample took 1 minute 32 seconds.
All three instances have a MAXDOP of 8 and a Cost Threshold For Parallelism (CTFP) of 50.
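(Those values can be double-checked per instance straight from sys.configurations:

-- Confirm parallelism settings on each instance.
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN ('max degree of parallelism', 'cost threshold for parallelism');)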
While the actual ETL process is procedure-based, I pulled the insert code out and I'm running it in SSMS as plain T-SQL.
I understand there are a lot of new "performance" improvements in SQL2019, and I've tried a lot of different combinations; so far, nothing has reduced the run times.
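To give a flavor of what I mean by "combinations": mostly toggling the newer SQL2019 optimizer behaviors on and off, along these lines (the database name is a placeholder, and this is only a sample of what I tried):

USE [EtlDb];   -- hypothetical database name

-- Run the statement under SQL2016 optimizer behavior:
ALTER DATABASE [EtlDb] SET COMPATIBILITY_LEVEL = 130;

-- Or stay at compatibility level 150 but turn off batch mode on rowstore (new in SQL2019):
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = OFF;

-- Or hint just the one statement:
-- INSERT INTO dbo.TargetTable (...) SELECT ... FROM dbo.SourceTable
-- OPTION (USE HINT('DISALLOW_BATCH_MODE'));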
Some observations: A) With the target table reduced to only 100 columns, 1 million rows loaded in 1 minute and 4.7 million loaded in 5 minutes. B) With the column count increased to 250, loading 1 million rows took 6 minutes.
I cannot put my finger on it, but something about the SUBSTRING/CASE expressions and how they interact with SQL2019 is causing me grief.
So I'm reaching out to the SQL2019 gurus for advice on where to look and what to tune. Converting to SSIS at this time is not an option.
Your thoughts?