Try Html Agility Pack (html-agility-pack.net); you can get it from NuGet.
I agree the HTML is really weird, but with the package above and some work I believe you can get what you want.
For documentation
How to parse values in html table with same class?
I'm trying to parse an HTML table whose cells all use basically the same tags and classes.
Is there a way to get as output:
BTC - Bitcoin, BEP20(BSC), Bitcoin(SegWit)
ETH - ERC20, BEP20(BSC), POLYGON, ARBITRUM, AURORA, METISEVM
USDT - OMNI, TRC20, ERC20, BEP20(BSC), HECO, POLYGON, FTM, AVAX-C, ARBITRUM, METISEVM
QASH - ERC20
from the following HTML sample:
<div data-v-326d86f4="" class="table-box">
<table data-v-326d86f4="">
<tr data-v-326d86f4="">
<td data-v-326d86f4="">BTC</td>
<td data-v-326d86f4="" class="block-chain">
<div data-v-326d86f4="" class="chain_box"><span data-v-326d86f4="" class="chain_name">Bitcoin</span> <span data-v-326d86f4=""><i data-v-326d86f4="" class="fa fa-caret-down"></i></span></div>
<div data-v-326d86f4="" class="select-list"><span data-v-326d86f4="">Bitcoin</span><span data-v-326d86f4="">BEP20(BSC)</span><span data-v-326d86f4="">Bitcoin(SegWit)</span></div>
</td>
<td data-v-326d86f4="">0.001</td>
<td data-v-326d86f4="">0.002</td>
</tr>
<tr data-v-326d86f4="">
<td data-v-326d86f4="">ETH</td>
<td data-v-326d86f4="" class="block-chain">
<div data-v-326d86f4="" class="chain_box"><span data-v-326d86f4="" class="chain_name">ERC20</span> <span data-v-326d86f4=""><i data-v-326d86f4="" class="fa fa-caret-down"></i></span></div>
<div data-v-326d86f4="" class="select-list"><span data-v-326d86f4="">ERC20</span><span data-v-326d86f4="">BEP20(BSC)</span><span data-v-326d86f4="">POLYGON</span><span data-v-326d86f4="">ARBITRUM</span><span data-v-326d86f4="">AURORA</span><span data-v-326d86f4="">METISEVM</span></div>
</td>
<td data-v-326d86f4="">0.012</td>
<td data-v-326d86f4="">0.024</td>
</tr>
<tr data-v-326d86f4="">
<td data-v-326d86f4="">USDT</td>
<td data-v-326d86f4="" class="block-chain">
<div data-v-326d86f4="" class="chain_box"><span data-v-326d86f4="" class="chain_name">OMNI</span> <span data-v-326d86f4=""><i data-v-326d86f4="" class="fa fa-caret-down"></i></span></div>
<div data-v-326d86f4="" class="select-list"><span data-v-326d86f4="">OMNI</span><span data-v-326d86f4="">TRC20</span><span data-v-326d86f4="">ERC20</span><span data-v-326d86f4="">BEP20(BSC)</span><span data-v-326d86f4="">HECO</span><span data-v-326d86f4="">POLYGON</span><span data-v-326d86f4="">FTM</span><span data-v-326d86f4="">AVAX-C</span><span data-v-326d86f4="">ARBITRUM</span><span data-v-326d86f4="">METISEVM</span></div>
</td>
<td data-v-326d86f4="">30</td>
<td data-v-326d86f4="">50</td>
</tr>
<tr data-v-326d86f4="">
<td data-v-326d86f4="">QASH</td>
<td data-v-326d86f4="" class="block-chain">
<div data-v-326d86f4="" class="chain_box">
<span data-v-326d86f4="" class="chain_name">ERC20</span> <!---->
</div>
<!---->
</td>
<td data-v-326d86f4="">513</td>
<td data-v-326d86f4="">1026</td>
</tr>
</table>
</div>
I'm trying to do this with the HtmlAgilityPack library, so far without success, using this code:
Dim arqHtml As String = "C:\Users\Mattia\Desktop\ready.html"
Dim myHtml As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument()
myHtml.Load(arqHtml)
Dim myTable As HtmlAgilityPack.HtmlNode = myHtml.DocumentNode.SelectSingleNode("//table")
Dim myRows As HtmlAgilityPack.HtmlNodeCollection = myTable.SelectNodes("tr")
For Each tmpRow As HtmlAgilityPack.HtmlNode In myRows
    Dim myCells As HtmlAgilityPack.HtmlNodeCollection = tmpRow.SelectNodes("td")
    If myCells IsNot Nothing Then
        Dim myToken As String = myCells(0).InnerText
        Dim mySpans As HtmlAgilityPack.HtmlNodeCollection = myCells(1).SelectNodes("div[contains(@class,'select-list')]/span")
        Dim myListBChain As New List(Of String)
        For Each mySpan As HtmlAgilityPack.HtmlNode In mySpans
            myListBChain.Add(mySpan.InnerText)
        Next
        RichTextBox1.Text += String.Join(", ", myListBChain)
    End If
Next
which throws the error:
Object Reference not set to an instance of an object
on the line
For Each mySpan As HtmlAgilityPack.HtmlNode In mySpans
even though, while debugging, the code seems to get halfway to the final result, as in the following image
Thanks
-
Jose Zero 576 Reputation points
2022-02-05T17:58:06.65+00:00
1 additional answer
-
Mattia Fanti 356 Reputation points
2022-02-06T10:38:01.783+00:00
I solved it with:
Dim arqHtml As String = "C:\Users\Mattia\Desktop\ready.html"
Dim myHtml As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument()
myHtml.Load(arqHtml)
Dim myTable As HtmlAgilityPack.HtmlNode = myHtml.DocumentNode.SelectSingleNode("//table")
Dim myRows As HtmlAgilityPack.HtmlNodeCollection = myTable.SelectNodes("tr")
For Each tmpRow As HtmlAgilityPack.HtmlNode In myRows
    Dim myCells As HtmlAgilityPack.HtmlNodeCollection = tmpRow.SelectNodes("td")
    If myCells IsNot Nothing Then
        Dim myToken As String = myCells(0).InnerText
        Dim mySpans As HtmlAgilityPack.HtmlNodeCollection = myCells(1).SelectNodes("div[contains(@class,'select-list')]/span")
        Dim chainText As String
        If mySpans Is Nothing Then
            Dim chainTextNode As HtmlAgilityPack.HtmlNode = myCells(1).SelectSingleNode("div[contains(@class, 'chain_box')]/span[contains(@class, 'chain_name')]")
            chainText = If(chainTextNode Is Nothing OrElse String.IsNullOrWhiteSpace(chainTextNode.InnerText), "(unknown)", chainTextNode.InnerText)
        Else
            chainText = String.Join(", ", mySpans.Select(Function(span) span.InnerText))
            ' Alternative: chainText = String.Join(", ", From span In mySpans Select span.InnerText)
        End If
        RichTextBox1.Text &= $"{myToken} - {chainText}{Environment.NewLine}"
    End If
Next
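For anyone hitting the same exception: the part that matters is the `Is Nothing` check. HtmlAgilityPack's `SelectNodes` returns `Nothing` rather than an empty collection when the XPath matches nothing, and the QASH row has no `select-list` div, so iterating the result fails with a null reference. Below is a minimal console sketch that isolates that check. The module name `ChainListDemo`, the helper `GetChainText`, and the trimmed-down inline HTML are illustrative assumptions, not part of the original project.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module ChainListDemo
    ' Hypothetical helper: builds the chain list for one "block-chain" cell.
    Function GetChainText(cell As HtmlNode) As String
        ' SelectNodes yields Nothing (not an empty collection) when nothing matches.
        Dim spans As HtmlNodeCollection =
            cell.SelectNodes("div[contains(@class,'select-list')]/span")
        If spans IsNot Nothing Then
            Return String.Join(", ", spans.Select(Function(s) s.InnerText.Trim()))
        End If

        ' Fallback for rows that list a single chain and have no select-list.
        Dim nameNode As HtmlNode =
            cell.SelectSingleNode("div[contains(@class,'chain_box')]/span[contains(@class,'chain_name')]")
        Return If(nameNode Is Nothing, "(unknown)", nameNode.InnerText.Trim())
    End Function

    Sub Main()
        ' Two rows cut down from the sample: one with a select-list, one without.
        Dim html As String =
            "<table>" &
            "<tr><td>BTC</td><td class=""block-chain"">" &
            "<div class=""chain_box""><span class=""chain_name"">Bitcoin</span></div>" &
            "<div class=""select-list""><span>Bitcoin</span><span>BEP20(BSC)</span></div>" &
            "</td></tr>" &
            "<tr><td>QASH</td><td class=""block-chain"">" &
            "<div class=""chain_box""><span class=""chain_name"">ERC20</span></div>" &
            "</td></tr>" &
            "</table>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        For Each row As HtmlNode In doc.DocumentNode.SelectNodes("//table/tr")
            Dim cells As HtmlNodeCollection = row.SelectNodes("td")
            ' Skip rows that do not have both a token cell and a chain cell.
            If cells Is Nothing OrElse cells.Count < 2 Then Continue For
            Console.WriteLine($"{cells(0).InnerText.Trim()} - {GetChainText(cells(1))}")
        Next
    End Sub
End Module
```

With this input the sketch should print `BTC - Bitcoin, BEP20(BSC)` followed by `QASH - ERC20`. Moving the null check into a helper keeps the row loop short, but the inline version above works just as well.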