How to parse values in an HTML table whose elements share the same class?

Mattia Fanti 356 Reputation points
2022-02-05T11:40:20.617+00:00

I'm trying to parse an HTML table whose rows all use basically the same tags.

Is there a way to get the following output:

BTC - Bitcoin, BEP20(BSC), Bitcoin(SegWit)

ETH - ERC20, BEP20(BSC), POLYGON, ARBITRUM, AURORA, METISEVM

USDT - OMNI, TRC20, ERC20, BEP20(BSC), HECO, POLYGON, FTM, AVAX-C, ARBITRUM, METISEVM

QASH - ERC20

from the following HTML sample?

<div data-v-326d86f4="" class="table-box">  
   <table data-v-326d86f4="">  
      <tr data-v-326d86f4="">  
         <td data-v-326d86f4="">BTC</td>  
         <td data-v-326d86f4="" class="block-chain">  
            <div data-v-326d86f4="" class="chain_box"><span data-v-326d86f4="" class="chain_name">Bitcoin</span> <span data-v-326d86f4=""><i data-v-326d86f4="" class="fa fa-caret-down"></i></span></div>  
            <div data-v-326d86f4="" class="select-list"><span data-v-326d86f4="">Bitcoin</span><span data-v-326d86f4="">BEP20(BSC)</span><span data-v-326d86f4="">Bitcoin(SegWit)</span></div>  
         </td>  
         <td data-v-326d86f4="">0.001</td>  
         <td data-v-326d86f4="">0.002</td>  
      </tr>  
      <tr data-v-326d86f4="">  
         <td data-v-326d86f4="">ETH</td>  
         <td data-v-326d86f4="" class="block-chain">  
            <div data-v-326d86f4="" class="chain_box"><span data-v-326d86f4="" class="chain_name">ERC20</span> <span data-v-326d86f4=""><i data-v-326d86f4="" class="fa fa-caret-down"></i></span></div>  
            <div data-v-326d86f4="" class="select-list"><span data-v-326d86f4="">ERC20</span><span data-v-326d86f4="">BEP20(BSC)</span><span data-v-326d86f4="">POLYGON</span><span data-v-326d86f4="">ARBITRUM</span><span data-v-326d86f4="">AURORA</span><span data-v-326d86f4="">METISEVM</span></div>  
         </td>  
         <td data-v-326d86f4="">0.012</td>  
         <td data-v-326d86f4="">0.024</td>  
      </tr>  
      <tr data-v-326d86f4="">  
         <td data-v-326d86f4="">USDT</td>  
         <td data-v-326d86f4="" class="block-chain">  
            <div data-v-326d86f4="" class="chain_box"><span data-v-326d86f4="" class="chain_name">OMNI</span> <span data-v-326d86f4=""><i data-v-326d86f4="" class="fa fa-caret-down"></i></span></div>  
            <div data-v-326d86f4="" class="select-list"><span data-v-326d86f4="">OMNI</span><span data-v-326d86f4="">TRC20</span><span data-v-326d86f4="">ERC20</span><span data-v-326d86f4="">BEP20(BSC)</span><span data-v-326d86f4="">HECO</span><span data-v-326d86f4="">POLYGON</span><span data-v-326d86f4="">FTM</span><span data-v-326d86f4="">AVAX-C</span><span data-v-326d86f4="">ARBITRUM</span><span data-v-326d86f4="">METISEVM</span></div>  
         </td>  
         <td data-v-326d86f4="">30</td>  
         <td data-v-326d86f4="">50</td>  
      </tr>  
      <tr data-v-326d86f4="">  
         <td data-v-326d86f4="">QASH</td>  
         <td data-v-326d86f4="" class="block-chain">  
            <div data-v-326d86f4="" class="chain_box">  
               <span data-v-326d86f4="" class="chain_name">ERC20</span> <!---->  
            </div>  
            <!---->  
         </td>  
         <td data-v-326d86f4="">513</td>  
         <td data-v-326d86f4="">1026</td>  
      </tr>  

I'm trying to do this with the HtmlAgilityPack library, so far without success, using this code:

Dim arqHtml As String = "C:\Users\Mattia\Desktop\ready.html"
Dim myHtml As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument()
myHtml.Load(arqHtml)
Dim myTable As HtmlAgilityPack.HtmlNode = myHtml.DocumentNode.SelectSingleNode("//table")

Dim myRows As HtmlAgilityPack.HtmlNodeCollection = myTable.SelectNodes("tr")
For Each tmpRow As HtmlAgilityPack.HtmlNode In myRows
    Dim myCells As HtmlAgilityPack.HtmlNodeCollection = tmpRow.SelectNodes("td")
    If myCells IsNot Nothing Then
        Dim myToken As String = myCells(0).InnerText
        Dim mySpans As HtmlAgilityPack.HtmlNodeCollection = myCells(1).SelectNodes("div[contains(@class,'select-list')]/span")
        Dim myListBChain As New List(Of String)
        For Each mySpan As HtmlAgilityPack.HtmlNode In mySpans
            myListBChain.Add(mySpan.InnerText)
        Next
        RichTextBox1.Text += String.Join(", ", myListBChain)
    End If
Next

which throws the error:

Object reference not set to an instance of an object

on the line

For Each mySpan As HtmlAgilityPack.HtmlNode In mySpans

even though, while debugging, the code seems to get partway to the final result, as shown in the following image:

171557-senza-titolo.png

Thanks


Accepted answer
  1. Jose Zero 576 Reputation points
    2022-02-05T17:58:06.65+00:00

    Try HtmlAgilityPack (html-agility-pack.net); you can get it from NuGet.
    I agree the HTML is really odd, but with that package and some work I believe you can get what you want.
    For documentation


1 additional answer

  1. Mattia Fanti 356 Reputation points
    2022-02-06T10:38:01.783+00:00

    I solved it with:

    Dim arqHtml As String = "C:\Users\Mattia\Desktop\ready.html"
    Dim myHtml As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument()
    myHtml.Load(arqHtml)
    Dim myTable As HtmlAgilityPack.HtmlNode = myHtml.DocumentNode.SelectSingleNode("//table")
    
    Dim myRows As HtmlAgilityPack.HtmlNodeCollection = myTable.SelectNodes("tr")
    For Each tmpRow As HtmlAgilityPack.HtmlNode In myRows
        Dim myCells As HtmlAgilityPack.HtmlNodeCollection = tmpRow.SelectNodes("td")
        If myCells IsNot Nothing Then
            Dim myToken As String = myCells(0).InnerText
            Dim mySpans As HtmlAgilityPack.HtmlNodeCollection = myCells(1).SelectNodes("div[contains(@class,'select-list')]/span")
            Dim chainText As String
    
            If mySpans Is Nothing Then
                Dim chainTextNode As HtmlAgilityPack.HtmlNode = myCells(1).SelectSingleNode(
                    "div[contains(@class, 'chain_box')]/span[contains(@class, 'chain_name')]"
                )
    
                chainText = If(chainTextNode Is Nothing OrElse String.IsNullOrWhiteSpace(chainTextNode.InnerText), "(unknown)", chainTextNode.InnerText)
            Else
                chainText = String.Join(", ", mySpans.Select(Function(span) span.InnerText))
                ' Alternative: chainText = String.Join(", ", From span In mySpans Select span.InnerText)
            End If
    
            RichTextBox1.Text &= $"{myToken} - {chainText}{Environment.NewLine}"
        End If
    Next
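
The original error most likely came from the QASH row: it has no select-list div, and HtmlAgilityPack's SelectNodes returns Nothing (not an empty collection) when an XPath matches no nodes, so the For Each over mySpans failed. Below is a minimal, self-contained sketch of the same guard as a console program, assuming the HtmlAgilityPack NuGet package is referenced. The module name ChainParser, the function FormatRows, and the reduced inline HTML are hypothetical and only for illustration; they are not part of the library or of the code above.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

' Hypothetical helper module for illustration; not part of HtmlAgilityPack.
Module ChainParser

    ' Builds one "TOKEN - chain1, chain2" string per table row.
    Function FormatRows(html As String) As List(Of String)
        Dim result As New List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes gives Nothing when nothing matches, so guard before looping.
        Dim rows As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//table//tr")
        If rows Is Nothing Then Return result

        For Each row As HtmlNode In rows
            Dim cells As HtmlNodeCollection = row.SelectNodes("td")
            If cells Is Nothing OrElse cells.Count < 2 Then Continue For

            Dim token As String = cells(0).InnerText.Trim()
            Dim spans As HtmlNodeCollection =
                cells(1).SelectNodes("div[contains(@class,'select-list')]/span")

            Dim chains As String
            If spans IsNot Nothing Then
                ' Row has a dropdown list: join every listed chain.
                chains = String.Join(", ", spans.Select(Function(s) s.InnerText.Trim()))
            Else
                ' No dropdown list: fall back to the single visible chain name.
                Dim nameNode As HtmlNode =
                    cells(1).SelectSingleNode(".//span[contains(@class,'chain_name')]")
                chains = If(nameNode Is Nothing, "(unknown)", nameNode.InnerText.Trim())
            End If

            result.Add($"{token} - {chains}")
        Next

        Return result
    End Function

    Sub Main()
        ' Reduced sample modeled on the HTML in the question (two rows only).
        Dim sample As String =
            "<table>" &
            "<tr><td>BTC</td><td class='block-chain'>" &
            "<div class='chain_box'><span class='chain_name'>Bitcoin</span></div>" &
            "<div class='select-list'><span>Bitcoin</span><span>BEP20(BSC)</span></div>" &
            "</td></tr>" &
            "<tr><td>QASH</td><td class='block-chain'>" &
            "<div class='chain_box'><span class='chain_name'>ERC20</span></div>" &
            "</td></tr>" &
            "</table>"

        For Each line As String In FormatRows(sample)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Returning a list instead of appending to RichTextBox1 inside the loop keeps the parsing separate from the UI; in the form you could then assign String.Join(Environment.NewLine, FormatRows(...)) to the text box once.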