Share via

Processing complex XML structure in SQL Server

Bammes 1 Reputation point
2022-09-14T12:07:12.27+00:00

Hello!
I have the following XML Structure in Files that I need to transform in a table structure:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE STACK SYSTEM 'C:\System\Config\DTD\sfx_export.dtd'>
<STACK Category="Kfz-Dokumente" Priority="5" SubSystem="1000_Hel" StackID="HELV-ENF_782ee4e0_1830a8aaa15_-8000">
<ATTRIBUTES>
<KeyValuePair Key="$Channel" Value="WI_33"/>
<KeyValuePair Key="$ScanDate" Value="2022-08-08 19:23"/>
<KeyValuePair Key="$ScanDatum" Value="2022-05-23T19:57:57"/>
<KeyValuePair Key="$ScanStackID" Value="HEL23423423432_234324324_4534534-2"/>
<KeyValuePair Key="$StackDir" Value="\HELV-ENF\HELV-ENF_782ee4e0_1830a8aaa15_-8000"/>
</ATTRIBUTES>
<PROCESS ProcessID="0002" Rescan="FALSE">
<DOCUMENT DocID="0002.0001" Rescan="FALSE" DocClass="IPM/KFZ-Dokumente/Default" AlteredBy="System" ExportName="KFZ-Dokumente" Comment="">
<PAGE LocationColorID="0002.0001.pdf:0" Deleted="FALSE" Attached="FALSE" Fragment="FALSE" Skipped="FALSE" BackSide="FALSE" Empty="TRUE" ImageID="0002.0001.0000" LocationID="0002.0001.pdf:0" Path="\helv-enf\HELV-ENF_782ee4e0_1830a8aaa15_-8000\0002.0001.pdf:0"/>
<PAGE LocationColorID="0002.0001.pdf:1" Deleted="FALSE" Attached="TRUE" Fragment="FALSE" Skipped="FALSE" BackSide="FALSE" Empty="TRUE" ImageID="0002.0001.0001" LocationID="0002.0001.pdf:1" Path="\helv-enf\HELV-ENF_782ee4e0_1830a8aaa15_-8000\0002.0001.pdf:1"/>
<PAGE LocationColorID="0002.0001.pdf:2" Deleted="FALSE" Attached="FALSE" Fragment="FALSE" Skipped="TRUE" BackSide="FALSE" Empty="TRUE" ImageID="0002.0002.0000" LocationID="0002.0001.pdf:2" Path="\helv-enf\HELV-ENF_782ee4e0_1830a8aaa15_-8000\0002.0001.pdf:2"/>
<PAGE LocationColorID="0002.0001.pdf:3" Deleted="FALSE" Attached="FALSE" Fragment="FALSE" Skipped="TRUE" BackSide="FALSE" Empty="TRUE" ImageID="0002.0003.0000" LocationID="0002.0001.pdf:3" Path="\helv-enf\HELV-ENF_782ee4e0_1830a8aaa15_-8000\0002.0001.pdf:3"/>
<FIELDGROUP Name="default">
<FIELD Rating="OK" Name="#KFZ_KENNZEICHEN" Value="W-4545" VerifierAction="Changed" AlteredBy="Administrator" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="ABSENDER" Value="sender" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="ABSENDER_BRIEF" Value="sender letter" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="ADRESSE" Value="sender address" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="ARCHIV" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="BETREFF" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="EMAIL" Value="test@test .de" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="EMPFAENGER" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="EMPFAENGER_CC" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="GESCHAEFTSSTELLE" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="KUNDENDATUM" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="PLZ" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="PRAEMIE" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="STAPELPOSITION" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="VBNR" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="ZUSATZINFO" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
</FIELDGROUP>
<TABLE Name="TA_GENR">
<ROW>
<FIELD Rating="OCR_RESULT" Name="DOKTYP" Value="KFZ-ZULASSUNG" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="KDNR" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="KFZ_KENNZEICHEN" Value="W-4545" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="KLASSE" Value="KFZ-DOKUMENTE" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="SDNR" Value="99999" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="VOKA" Value="SONSTIGES" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="VSNR" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
</ROW>
<ROW>
<FIELD Rating="OCR_RESULT" Name="DOKTYP" Value="KFZ-ZULASSUNG" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="KDNR" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="KFZ_KENNZEICHEN" Value="W-4545" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="KLASSE" Value="KFZ-DOKUMENTE" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="SDNR" Value="" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="VOKA" Value="SONSTIGES" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
<FIELD Rating="OCR_RESULT" Name="VSNR" Value="12345" VerifierAction="NoAction" AlteredBy="System" Checks="OK" Comment=""/>
</ROW>
</TABLE>
</DOCUMENT>
</PROCESS>
</STACK>

The result schould be ab table where the fields from DOCUMENT\FIELDGROUP[@DeezNutz ="default"]-Node are cross-joined with the fields from each row-Node under node "table[@DeezNutz ="TA_GENR]"

the expected result for the given example would be (if reduced the example resultset to only a subset of the fields available)

241023-image.png

There may be multiple DOCUMENT-Nodes

The XML data will be provided as an XML file in the filesystem.
I would prefer one single statement to read and process the data receive the result as records from a query.

The data won't be stored in the DB. The resultset will be consumed by an application that will process the data.
All columns may be assumed to be of type VARCHAR(100)

Is that doable?

SQL-Server Version: Microsoft SQL Server 2019 (RTM-GDR) (KB5014356) - 15.0.2095.3 (X64)

Kind regards
Thomas

Developer technologies | Transact-SQL
Developer technologies | Transact-SQL

A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.

SQL Server | Other
SQL Server | Other

Additional SQL Server features and topics not covered by specific categories


1 answer

Sort by: Most helpful
  1. Yitzhak Khabinsky 27,196 Reputation points
    2022-09-14T14:50:57.207+00:00

    Hi @Bammes ,

    Please try the following solution.

    T-SQL CROSS APPLY statements are simulating relationships in the hierarchical XML.

    SQL

    WITH rs (xmlData) AS  
    (  
       SELECT TRY_CAST(BulkColumn AS XML)   
       FROM OPENROWSET(BULK N'e:\Temp\Bammes-7329.xml', SINGLE_BLOB) AS x  
    )  
    SELECT p.value('@ProcessID', 'VARCHAR(10)') AS ProcessID  
       , d.value('@DocID', 'VARCHAR(20)') AS DocID  
       , f.value('(FIELD[@Name="ABSENDER"]/@Value)[1]', 'VARCHAR(100)') AS [Field.ABSENDER]  
       , f.value('(FIELD[@Name="ABSENDER_BRIEF"]/@Value)[1]', 'VARCHAR(100)') AS [Field.ABSENDER_BRIEF]  
       , f.value('(FIELD[@Name="ADRESSE"]/@Value)[1]', 'VARCHAR(100)') AS [Field.ADRESSE]  
       , f.value('(FIELD[@Name="EMAIL"]/@Value)[1]', 'VARCHAR(100)') AS [Field.EMAIL]  
       , r.value('(FIELD[@Name="DOKTYP"]/@Value)[1]', 'VARCHAR(100)') AS [TA_GENR.DOKTYP]  
       , r.value('(FIELD[@Name="VSNR"]/@Value)[1]', 'VARCHAR(100)') AS [TA_GENR.VSNR]  
       , r.value('(FIELD[@Name="SDNR"]/@Value)[1]', 'VARCHAR(100)') AS [TA_GENR.SDNR]  
    FROM rs   
       CROSS APPLY xmlData.nodes('/STACK/PROCESS') AS t(p)  
       CROSS APPLY p.nodes('DOCUMENT') AS t1(d)  
       CROSS APPLY d.nodes('FIELDGROUP[@Name="default"]') AS t2(f)  
       CROSS APPLY d.nodes('TABLE/ROW') AS t3(r)  
    

    Was this answer helpful?

    0 comments No comments

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.