C#: Use XPath scriptlets to fetch strings

moondaddy 881 Reputation points
2021-02-11T21:47:10.687+00:00

Using VS 2019, .NET 4.4 (and HtmlAgilityPack, which is optional; I'm open to using anything).

I'm building a simple WPF app that will pull up a web page in a WebBrowser control (I would use WebView2 if I could get past its bugs), and I want to fetch some strings from the page using XPath. However, the examples I've found that use XPath also rely on many C# properties, which makes them less flexible for dynamic XPath input. For example, this works:

HtmlWeb web = new HtmlWeb();
//this could be any web page
HtmlDocument document = web.Load("https://www.yellowpages.com/search?search_terms=custom+software&geo_location_terms=Los+Angeles%2C+CA");
HtmlNode[] nodes = document.DocumentNode.SelectNodes("//h2 [@class='n']").ToArray();

string xPath1 = "//h2 [@class='n']";
HtmlNode nodex = document.DocumentNode.SelectNodes(xPath1).First();

string xPath2 = "(a)[1]//span";
string myNewString = nodex.SelectNodes(xPath2).First().InnerText;
Console.WriteLine(myNewString);

But I would like to fetch the same string in one XPath call, with no C# properties afterwards other than ToString() or maybe InnerText, like this:

//String comes from user input in the UI
string xPath3 = "(//h2 [@class='n'])(a)[1]//span";
string myNewString = document.DocumentNode.SelectNodes(xPath3).InnerText;

This would allow me to enter an XPath into a UI and get back a string without having to modify any C#.
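For illustration, the two-step lookup above can be folded into one expression. XPath 1.0 has no `(...)(a)` form; location steps are chained with `/`, so "first matching h2, then its first a, then any span below it" becomes `(//h2[@class='n'])[1]/a[1]//span`. The sketch below is only an assumption-laden demo: it uses the built-in System.Xml.XPath classes on a small, well-formed sample rather than HtmlAgilityPack on the real page, and the names `SingleXPathDemo`, `FirstMatchText` and the sample markup are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class SingleXPathDemo
{
    // Hypothetical sample markup standing in for the downloaded page.
    private const string Sample = @"<html><body>
  <h2 class='n'><a href='/a'><span>Acme Software</span></a></h2>
  <h2 class='n'><a href='/b'><span>Beta Systems</span></a></h2>
</body></html>";

    // Returns the text of the first node matched by the XPath, or null if nothing matches.
    public static string FirstMatchText(string markup, string xpath)
    {
        XPathNavigator navigator = new XPathDocument(new StringReader(markup)).CreateNavigator();
        XPathNavigator match = navigator.SelectSingleNode(xpath);
        return match?.Value;
    }

    public static void Main()
    {
        // One expression: first h2[@class='n'], its first <a> child, any <span> below it.
        string xpath = "(//h2[@class='n'])[1]/a[1]//span";
        Console.WriteLine(FirstMatchText(Sample, xpath)); // prints "Acme Software"
    }
}
```

Because the whole path lives in the string, the expression could come straight from a UI textbox without touching the C# around it.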

Any recommendations?

Thank you


2 answers

  1. Xingyu Zhao-MSFT 5,351 Reputation points
    2021-02-12T04:06:09.517+00:00

    Hi @moondaddy ;
    You may need the following code to fetch strings:

                HtmlWeb web = new HtmlWeb();
                HtmlDocument document = web.Load("https://www.yellowpages.com/search?search_terms=custom+software&geo_location_terms=Los+Angeles%2C+CA");

                // SelectNodes returns null (not an empty collection) when nothing matches.
                var headings = document.DocumentNode.SelectNodes("//h2[@class='n']");
                if (headings != null)
                {
                    foreach (var heading in headings)
                    {
                        // Spans inside the first <a> child of each heading.
                        var spans = heading.SelectNodes("(a)[1]//span");
                        if (spans != null)
                        {
                            foreach (var span in spans)
                            {
                                Console.WriteLine(span.InnerText);
                            }
                        }
                    }
                }
    

    Hope it could be helpful.

    Best Regards,
    Xingyu Zhao


  2. moondaddy 881 Reputation points
    2021-02-12T04:19:08.78+00:00

    Thank you @Xingyu Zhao-MSFT . The problem with that solution is that custom C# code has to be written to achieve the results. Instead, I would use C# only to get the document HTML (the web page) and then apply the XPath expression to it.

    I'm looking for something that behaves more like the "XPath Helper" in Google Chrome, where you can enter an XPath expression and get the exact text you want with no other coding needed. It can retrieve text from a single element, or a string[] for a list of elements, such as multiple phone numbers.

    This would allow me to simply enter an XPath expression into a UI textbox and get the text I want.

    67166-image.png

    or

    67271-image.png

    and even complex XPath like this:
    //h6[text()='Alternate Business Name']/following-sibling::div[1]/ul/li
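One possible shape for such an "XPath Helper"-style evaluator is sketched below. It assumes the built-in `XPathNavigator.Evaluate`, which returns a node iterator for node-set expressions and a string, double or bool for scalar ones, so a single method can hand back a string[] either way. The names `XPathText` and `Evaluate` and the sample markup are hypothetical, and the demo runs on a small, well-formed snippet rather than the real page. The expression is a following-sibling path shaped like the one above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml.XPath;

public static class XPathText
{
    // Evaluates any XPath 1.0 expression and returns its result as strings:
    // one entry per node for node-sets, a single entry for scalar results.
    public static string[] Evaluate(XPathNavigator root, string xpath)
    {
        object result = root.Evaluate(xpath);
        if (result is XPathNodeIterator nodes)
        {
            var texts = new List<string>();
            while (nodes.MoveNext())
            {
                texts.Add(nodes.Current.Value.Trim());
            }
            return texts.ToArray();
        }
        // string(), count(), boolean() and similar yield string, double or bool.
        return new[] { Convert.ToString(result, CultureInfo.InvariantCulture) };
    }

    public static void Main()
    {
        // Hypothetical markup; the element names only mirror the thread's example.
        const string sample = @"<div>
  <h6>Alternate Business Name</h6>
  <div><ul><li>Acme Apps</li><li>Acme Labs</li></ul></div>
</div>";
        XPathNavigator root = new XPathDocument(new StringReader(sample)).CreateNavigator();

        string listXPath = "//h6[text()='Alternate Business Name']/following-sibling::div[1]/ul/li";
        Console.WriteLine(string.Join(", ", Evaluate(root, listXPath))); // prints "Acme Apps, Acme Labs"
        Console.WriteLine(Evaluate(root, "count(//li)")[0]);             // prints "2"
    }
}
```

HtmlAgilityPack's HtmlDocument also exposes CreateNavigator(), so the same helper should in principle accept a navigator taken from a loaded page. That has not been checked against the page in this thread.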
