Return Data From HTML Table But Only Certain Fields

Question

Monday, December 10, 2018 2:43 PM

Hypothetical data below - let's say that my HTML table looks like below, how could I use C# to only return the first two columns of data for each row in the table?

<h4><span id="UserInfo"></span><span class="mw-headline" id="User Info">Information</span></h4>
<table>
<tbody><tr>
<th>User ID</th>
<th>User Name</th>
<th>Phone</th>
<th>State</th>
<th>Zip</th></tr>
<tr>
<td>abcd</td>
<td>alpha beta charlie delta</td>
<td>5555555555</td>
<td>NY</td>
<td>00000</td>
<tr>
<td>abc</td>
<td>alpha beta charlie</td>
<td>1111111111</td>
<td>NY</td>
<td>00000</td>
<tr>
<td>ab</td>
<td>alpha beta</td>
<td>2222222222</td>
<td>NY</td>
<td>00000</td>>
</tbody>
</table>

All replies (6)

Monday, December 10, 2018 2:52 PM

The question is too vague to answer. You're showing HTML, which is client code, but asking about C#, which runs on a web server.

If this is a query question then can you tell us what data access you are using?  Are you using Entity Framework or ADO.NET?  Where does the data come from?

If you are asking how to render dynamic HTML from the server then we need to know what kind of application you are building; Web Forms, MVC, Razor Pages?

It is also possible to affect the HTML using JavaScript.


Monday, December 10, 2018 2:57 PM

This is how data is displaying on a web page when I view the page source.

I want to use C# to "query" the page and return only the first two columns from the table.


Monday, December 10, 2018 3:05 PM

You have not answered any of the clarifying questions, so I'm not sure how to provide assistance. Is there any way you can show the current C# code?


Monday, December 10, 2018 4:14 PM

If you want to use C# to parse HTML, you should look at the HtmlAgilityPack or AngleSharp libraries.


Monday, December 10, 2018 10:17 PM

Precisely what is it that you cannot do? Do you know how to get the text of the page into a C# string? Do you know how to find elements within the HTML string? Is the sample the result of a GET or a POST? Do you know how to make the request? Will the real HTML have the sorts of syntax errors that your sample shows? Be prepared for a lot of difficulty if that is the case. What data structure do you want the answer in?

Show whatever code you have so far and tell us what result you want.

I generally use LINQ to XML (e.g. the Beth Massi article), but I believe the Agility Pack is far superior at handling malformed HTML like your sample.
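For what it's worth, here is a minimal sketch of the LINQ to XML route. It assumes the markup has already been made well-formed (every tr closed, no stray characters), because XDocument.Parse throws on input like the posted sample. The class name TableColumns, the method name FirstColumns, and the cleaned-up table string are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TableColumns
{
    // Returns the trimmed text of the first `count` <td> cells of each row.
    // Rows with no <td> cells (such as the <th> header row) are skipped.
    public static List<string[]> FirstColumns(string xhtml, int count)
    {
        // Throws XmlException if the markup is not well-formed XML.
        XDocument doc = XDocument.Parse(xhtml);

        return doc.Descendants("tr")
                  .Select(tr => tr.Elements("td")
                                  .Take(count)
                                  .Select(td => td.Value.Trim())
                                  .ToArray())
                  .Where(cells => cells.Length > 0)
                  .ToList();
    }

    public static void Main()
    {
        // Hypothetical, well-formed version of the table from the question.
        const string xhtml = @"<table><tbody>
<tr><th>User ID</th><th>User Name</th><th>Phone</th><th>State</th><th>Zip</th></tr>
<tr><td>abcd</td><td>alpha beta charlie delta</td><td>5555555555</td><td>NY</td><td>00000</td></tr>
<tr><td>abc</td><td>alpha beta charlie</td><td>1111111111</td><td>NY</td><td>00000</td></tr>
</tbody></table>";

        foreach (string[] row in FirstColumns(xhtml, 2))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

Returning a list of string arrays is just one choice; the same query could project into a tuple or a small class if you want named fields for the user ID and user name.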


Tuesday, December 11, 2018 4:37 AM

Hi ManderinViolin12,

If you want to show only the first two columns of the table using C#, you could use the XML API.

Below is my code.

// Requires: using System; using System.Xml;
protected void Page_Load(object sender, EventArgs e)
{
    // Clear any content already buffered for the response
    Response.Clear();

    XmlDocument file = new XmlDocument();
    // Load the HTML file; replace this with your own path.
    // XmlDocument.Load throws an XmlException unless the file is well-formed XML.
    file.Load(Server.MapPath("/FileDemo/table.html"));

    // Get the first tbody element
    XmlNode node = file.GetElementsByTagName("tbody")[0];

    // Loop through all the tr elements
    foreach (XmlNode tr in node.ChildNodes)
    {
        // Remove every cell after the second one. Removing the last child
        // each time avoids index shifting and also copes with rows that
        // do not have exactly five cells.
        while (tr.ChildNodes.Count > 2)
        {
            tr.RemoveChild(tr.LastChild);
        }
    }

    // Write the trimmed markup and end the response
    Response.Write(file.InnerXml);
    Response.End();
}

The result.

Best regards,

Ackerly Xu