I need output in below format
In SQL? I would say close to impossible. HTML is a markup language for text presentation & formating, it's not structured data.
This browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Below is the sample Input
<p>
<strong>
<em>AAAA</em>
</strong>
</p>
<ul>
<li>#2 </li>
<li>#3 </li>
<li>#6 </li>
</ul>
<p>
<strong>
<em>
<br />BBBBB:</em>
</strong>
</p>
<ul>
<li>Best 295 Business Schools</li>
<li>rrrr</li>
</li>
</ul>
<p>
I need output in below format
I need output in below format
In SQL? I would say close to impossible. HTML is a markup language for text presentation & formating, it's not structured data.
Hi @Ranga Reddy ,
HTML could be a legitimate (X)HTML.
In such case, it is very easy to process in T-SQL via its XQuery support.
Check it out.
T-SQL
DECLARE @input XML =
N'<p>
<strong>
<em>AAAA</em>
</strong>
</p>
<ul>
<li>#2 </li>
<li>#3 </li>
<li>#6 </li>
</ul>
<p>
<strong>
<em>
<br />BBBBB:</em>
</strong>
</p>
<ul>
<li>Best 295 Business Schools</li>
<li>rrrr</li>
</ul>';
SELECT seq
, p.value('(strong/em/text())[1]', 'VARCHAR(100)') AS col1
, li.value('text()[1]', 'VARCHAR(100)') AS col2
FROM @input.nodes('/p') AS t1(p)
CROSS APPLY (
SELECT t1.p.value('let $i := . return count(/p[. << $i]) + 1', 'INT')
) AS t2(seq)
OUTER APPLY @input.nodes('/ul[sql:column("t2.seq")]/li') AS t3(li);
Output
+-----+--------+---------------------------+
| seq | col1 | col2 |
+-----+--------+---------------------------+
| 1 | AAAA | #2 |
| 1 | AAAA | #3 |
| 1 | AAAA | #6 |
| 2 | BBBBB: | Best 295 Business Schools |
| 2 | BBBBB: | rrrr |
+-----+--------+---------------------------+
HTML could be a legitimate (X)HTML.
In such case, it is very easy to process in T-SQL via its XQuery support.
Indeed. Hopefully that is the case for Ranga.
However, HTML is more lax than XML. XML requires all tags to be closed. In the above sample, you have <br />
, which is good. But HTML permits <br>
only. And if your HTML looks like that, you cannot convert it to the XML data type. So how you do parse it in that case? Not in T-SQL, that much is clear. If you want to do it from SQL Server, you would have to use the CLR, or call a script in Python or similar if you have the Machine Learning Extensions installed. (For Python, you need SQL 2017 or later.)