Hi,
Without information to reproduce the specific scenario, I don't think we can say what the best way is, but we can discuss some relevant golden rules (which can help a lot)...
then makes a loop over these 100K rows
(1) In general, looping in tabular databases such as SQL Server, which are designed to work on SETs of rows, should be avoided; it will probably perform worse than looping over a collection such as an array in a language like C++.
When working with SQL Server, you should not compare a loop in T-SQL to a loop in the client. Instead, try to solve the need without a loop, and compare set-based T-SQL to a loop in C#, for example.
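As a minimal sketch of the difference (the table and column names here are assumptions, not from your scenario):

```sql
-- Row-by-row (RBAR) approach with a cursor - avoid this:
DECLARE @Id INT;
DECLARE cur CURSOR FOR SELECT Id FROM dbo.Orders;
OPEN cur;
FETCH NEXT FROM cur INTO @Id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders SET Total = Total * 1.1 WHERE Id = @Id;
    FETCH NEXT FROM cur INTO @Id;
END
CLOSE cur;
DEALLOCATE cur;

-- Set-based approach - one statement over the whole SET:
UPDATE dbo.Orders SET Total = Total * 1.1;
```

On 100K rows, the single set-based statement will usually be dramatically faster than the cursor.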
For another extract I changed the XML generating process to do it as a single step for all rows and this cut the time from 70 min. to 2 min.
This makes sense, as I mentioned above.
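Generating the XML for all rows in one statement, instead of once per row inside a loop, usually looks something like this (a sketch with assumed table/column names):

```sql
-- Single-step XML generation for the entire SET of rows:
SELECT
    o.Id       AS [@id],
    o.Customer AS [Customer],
    o.Total    AS [Total]
FROM dbo.Orders AS o
FOR XML PATH('Order'), ROOT('Orders');
```

One FOR XML query over the set replaces 100K separate per-row executions.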
calls a stored procedure
(2) Do you mean that you call the SP for each row?!? Or once for the entire SET, which probably makes more sense, although it might still add overhead.
this XML is inserted into another table and also sha value is calculated using a custom UDF.
(3) So... if you meant that the SP is called for each row, then this also means the UDF is executed for each row inside the loop? This sounds like bad design.
Note! The DDL + DML + SP code could be very helpful to the discussion!
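If the SHA value is all the custom UDF computes, it may be possible to replace the per-row UDF call with the built-in HASHBYTES function applied to the whole set. A sketch (table and column names are assumptions; HASHBYTES accepts NVARCHAR(MAX) input starting with SQL Server 2016, earlier versions are limited to 8000 bytes):

```sql
-- Set-based insert that computes the SHA value inline,
-- instead of calling a scalar UDF once per row:
INSERT INTO dbo.XmlArchive (RowId, XmlData, ShaValue)
SELECT
    s.Id,
    s.XmlData,
    HASHBYTES('SHA2_256', CONVERT(NVARCHAR(MAX), s.XmlData))
FROM dbo.SourceXml AS s;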
The procedure to generate XML is quite complex (it creates table variable with supporting data first, then uses very complex query with many FOR XML PATH inside
(4) This by itself might mean that it would be much better to run this on the client side, which adds to the reasons not to use T-SQL.
@XML .modify
(5) The modify() method uses XQuery; in general this is the same as executing a SQLCLR UDF that runs XQuery, and it can also be done on the client side, using XQSharp for example, which might provide better performance.
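For reference, this is the kind of XML DML the modify() method executes (a minimal, self-contained example):

```sql
-- modify() with an XML DML "replace value of" expression:
DECLARE @XML XML = N'<root><item>old</item></root>';
SET @XML.modify('replace value of (/root/item/text())[1] with "new"');
SELECT @XML;  -- <root><item>new</item></root>
```

Each such call is evaluated by SQL Server's XQuery engine, so doing many of them per row adds up quickly.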
Note!!! SQL Server uses an implementation of a VERY old (outdated) version of XQuery 1.0 (with XPath 1.0), which is aligned with the July 2004 working draft of XQuery.
(6) The current W3C standard is XQuery 3.1 (with XPath 3.1). XQuery 3.1 extends XQuery to support JSON as well as XML, adding maps and arrays to the data model and supporting them with new expressions and new functions in the language. It can provide much better performance and more flexibility.
In your client app you can use XQuery 3.1, which was released in 2017. You might gain some performance improvement simply by using XQuery 3.1 instead of version 1.0 and by using its new features.
Note: as far as I remember, there was a Connect item asking Microsoft to update the XQuery version. In the current feedback system it is not well formatted (probably poorly copied over automatically from the old system). You can find these two related open feedback items:
- https://feedback.azure.com/d365community/idea/4e42eddf-5d25-ec11-b6e6-000d3a4f0da0
- https://feedback.azure.com/d365community/idea/153a9604-7025-ec11-b6e6-000d3a4f0da0
(7) Note! SQL Server supports both XQuery and OpenXML (using sp_xml_preparedocument and sp_xml_removedocument). As a golden rule, OpenXML provides better performance for shredding large documents, while XQuery is better for small documents.
Consider using OpenXML instead of XQuery.
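A minimal, self-contained sketch of shredding a document with OpenXML (the document content here is invented for illustration):

```sql
-- Shred an XML document into a rowset with OPENXML instead of XQuery nodes():
DECLARE @Doc XML =
    N'<Orders><Order id="1" total="10"/><Order id="2" total="20"/></Orders>';
DECLARE @Handle INT;

-- Parse the document once and get a handle to the internal representation
EXEC sp_xml_preparedocument @Handle OUTPUT, @Doc;

SELECT Id, Total
FROM OPENXML(@Handle, '/Orders/Order', 1)  -- flag 1 = attribute-centric mapping
WITH (Id INT '@id', Total MONEY '@total');

-- Always release the handle, or the parsed document stays in memory
EXEC sp_xml_removedocument @Handle;
```

The sp_xml_removedocument call matters: forgetting it leaks the parsed document in SQL Server's memory until the connection closes.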
----------
Ronen Ariely
Personal Site | Blog | Facebook | Linkedin