How to Get Xpath the href for this url?

Hsieh, Iverson 65 Reputation points
2023-03-24T08:28:32.1733333+00:00

Hi All,

About https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNotesDC/index.html this url, I need to get the href in the topmost position. Since his href is in a different position every time, how can I fix the href in the top link and get the information I want?

My script:

Import-Module -ErrorAction Stop \cifstp01\TP_Specific\IT\Networking\PS_SCRIPT\PowerHTML

$res = Invoke-WebRequest -Uri "https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNotesDC/" -UseBasicParsing

$htm = ConvertFrom-Html -Content $res

$p = $htm.SelectNodes('/html/body/div/section/div[2]/div/div[2]/div/div/div[1]/div[1]/ul/li[1]/p/a')

The href in the top layer is as follows:

https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNotesDC/continuous/dccontinuousmar2023qfe.html#dccontinuousmartwentytwentythreeqfe

How can I grab the href in the top layer every time and automatically go in to get the information I want?

The href to the top of the system is different every time, I can only grab his version, but if the outer layer uses 2006x, and the version after the href is entered is 20064, so I have to get the href in the top layer and enter this href to get the actual link I want

2023-03-24_16-27-58

Windows Server PowerShell
Windows Server PowerShell
Windows Server: A family of Microsoft server operating systems that support enterprise-level management, data storage, applications, and communications.PowerShell: A family of Microsoft task automation and configuration management frameworks consisting of a command-line shell and associated scripting language.
5,510 questions
PowerShell
PowerShell
A family of Microsoft task automation and configuration management frameworks consisting of a command-line shell and associated scripting language.
2,462 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Deleted

    This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.


    Comments have been turned off. Learn more

  2. Rich Matheisen 46,556 Reputation points
    2023-03-26T19:25:18.6333333+00:00

    If the href in the "Acrobat Enterprise Release Notes" section of the web page, under the heading "Continuous Track Installers" is always in a "different position" you're going to have to look for it. It isn't clear in your question if you're looking for a particular href by its position in the list, or by the value found in the href.

    If you, for instance, always want the 0-th (i.e., the first) href, then this will work (adjust the $PositionInList value if you want some other position). If you want to find the hrefs' value then you should be able to modify this code pretty easily to accomplish that.

    There quite a few assumptions in the code regarding the structure of the HTML, but that's always a problem when you're scraping a web page. If the author change, say, the contents of each list ("li") element, the script needs adjusting!

    Import-Module PowerHTML
    $url = "https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNotesDC/"
    # This is the value in $TheLink at the end of the script (at the time the script was written)
    #       https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNotesDC/continuous/dccontinuousmar2023qfe.html#dccontinuousmartwentytwentythreeqfe
     $response = Invoke-WebRequest $url
     $htmlDoc = ConvertFrom-Html -Content $response.Content
    
    $PositionInList = 0     # the position in the list of "Continuous Track Installers" you want to get
    $listCount = 0
    $Script:hrefValue = ""
    $a=$htmlDoc.SelectNodes('//div[@id="acrobat-enterprise-release-notes"]')
    $descendants = $a.DescendantNodes()
    
    $ulFound = $false
    ForEach ($item in $descendants){
            # search for "ul" element
            if (-NOT $ulFound -AND $item.NodeType -eq 'Element' -AND $item.Name -eq "ul"){
                $ulFound = $true
                Continue
            }
            if (-NOT $ulFound){
                # keep looking until the 1st 'ul' found after the 'div'
                Continue
            }
            if ($item.NodeType -eq "Element" -and $item.Name -eq "li"){             # found a list element
                if ($ListCount -eq $PositionInList){                                # is it in the position wanted?
                    $c = $item.ChildNodes                                           # get its childnodes (should only be 1)
                    if ($c.NodeType -eq "Element" -and $c.Name -eq "p"){            # is this child a "paragraph"?
                        $d = $c.ChildNodes                                              # should have a single child node
                        if ($d.NodeType -eq "Element" -and $d.Name -eq "a"){            # and it should be an "a" (anchor)
                            ForEach ($e in $d.Attributes){
                                if ($e.Name -eq "href"){
                                    $Script:hrefValue = $e.Value
                                    break
                                }   
                            }
                        }
                    }
                }
                $PositionInList++
                if ($hrefValue.length -gt 0){
                    break
                }
            }
        }
    $TheLink = $URL + $hrefvalue
    

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.