Why can't I use Invoke-WebRequest twice?

Hsieh, Iverson 65 Reputation points
2023-03-25T04:01:16.9866667+00:00

Hi All,

I need to use the result of the first Invoke-WebRequest to piece together a URL for a second Invoke-WebRequest, but the second call is forcibly terminated.

My Script:

Import-Module -ErrorAction Stop \\cifstp01\TP_Specific\IT\Networking\PS_SCRIPT\PowerHTML

$url = "https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNotesDC/"

$response = Invoke-WebRequest $url

$link = $response.Links | Where-Object { $_.OuterHtml -like "*continuous/dccontinuous*" } | Select-Object -First 1

$link.href

$linkhref = $url + $link.href

$url2 = "$linkhref"

$response2 = Invoke-WebRequest $url2

[Screenshot: 2023-03-25_12-00-53]


Accepted answer
  1. MotoX80 36,291 Reputation points
    2024-05-03T18:00:27.4333333+00:00

    can't get any information for adobe?

    It appears that you have to pass headers on the call.

    See https://github.com/PowerShell/PowerShell/issues/17499

    Invoke-WebRequest -UseBasicParsing -Uri "https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNotesDC/" -Headers @{
      "accept"="text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
      "accept-encoding"="gzip, deflate, br"
      "accept-language"="en-GB,en;q=0.9,en-US;q=0.8"
    }
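
    Since the original script makes two requests, the same headers would presumably need to go on both. The following is only a sketch under that assumption: Get-LinkedPage and $browserHeaders are hypothetical names, not part of the original script, and it is not confirmed that this exact header set is what Adobe requires.

    ```powershell
    # Headers copied from the call above; whether all three are needed is not confirmed.
    $browserHeaders = @{
        "accept"          = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
        "accept-encoding" = "gzip, deflate, br"
        "accept-language" = "en-GB,en;q=0.9,en-US;q=0.8"
    }

    # Hypothetical helper: fetches a page, picks the first link whose href matches
    # $Pattern, then requests that link with the same headers.
    function Get-LinkedPage {
        param(
            [Parameter(Mandatory)][string]$Url,
            [Parameter(Mandatory)][string]$Pattern,
            [hashtable]$Headers = $browserHeaders
        )
        $first = Invoke-WebRequest -UseBasicParsing -Uri $Url -Headers $Headers
        $link = $first.Links | Where-Object { $_.href -like $Pattern } | Select-Object -First 1
        if (-not $link) { throw "No link matching '$Pattern' found on $Url" }
        # Resolve the href against the page URL rather than concatenating strings.
        $target = [System.Uri]::new([System.Uri]$Url, [string]$link.href).AbsoluteUri
        Invoke-WebRequest -UseBasicParsing -Uri $target -Headers $Headers
    }

    # Example call (makes two real network requests):
    # $response2 = Get-LinkedPage -Url "https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNotesDC/" -Pattern "*continuous/dccontinuous*"
    ```

    Keeping the headers in one hashtable means the first and second request cannot drift apart if the header set has to be adjusted later.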
    
    
    

5 additional answers

  1. Erkan Sahin 840 Reputation points
    2023-03-25T12:32:34.8333333+00:00

    It's possible that the URL you're constructing in $linkhref is not valid or is being malformed, which could be causing the second Invoke-WebRequest to fail. You can try printing out the value of $linkhref to make sure it is a valid URL.
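
    One way to do that check, as a minimal sketch: build the second URL with System.Uri instead of string concatenation, print it, and test whether it is well formed. The $href value here is a made-up placeholder for illustration; in the real script it would come from $link.href. Plain concatenation only works when the href is relative to the page; an href that is already absolute or that starts with "/" would produce a malformed URL.

    ```powershell
    $url = "https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNotesDC/"

    # Hypothetical href for illustration only; the real value comes from $link.href.
    $href = "continuous/dccontinuous/index.html"

    # Uri(baseUri, relativeString) resolves relative, root-relative and absolute hrefs.
    $linkhref = [System.Uri]::new([System.Uri]$url, $href).AbsoluteUri
    Write-Host "Second URL: $linkhref"

    # True only when the string parses as a well-formed absolute URI.
    $isValid = [System.Uri]::IsWellFormedUriString($linkhref, [System.UriKind]::Absolute)
    Write-Host "Well formed: $isValid"
    ```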

    You can also try using the -UseBasicParsing parameter with Invoke-WebRequest to avoid loading the HTML document into a full Internet Explorer browser object, which can sometimes cause issues with complex HTML pages. Try modifying your code like this:

    $url = "https://www.adobe.com/devnet-docs/acrobatetk/tools/ReleaseNotesDC/"
    $response = Invoke-WebRequest $url
    $link = $response.Links | Where-Object { $_.OuterHtml -like "*continuous/dccontinuous*" } | Select-Object -First 1
    $linkhref = $url + $link.href
    $url2 = "$linkhref"
    $response2 = Invoke-WebRequest $url2 -UseBasicParsing
    
    

  2. MotoX80 36,291 Reputation points
    2023-03-25T20:58:20.8933333+00:00

    This appears to be (though it is not confirmed) a self-defense mechanism of Adobe.com to prevent bots from crawling their web site.

    This PowerShell script, which tries different methods of calling a site, works just fine with Bing.com but fails with Adobe.com.

    cls
    "Use this command to watch sockets"
    "netstat -aon | Select-String $pid"
    "" 
    $url = "https://www.bing.com/"
    #$url = "https://www.adobe.com/"
    Write-Host "Testing $url" -BackgroundColor Yellow -ForegroundColor Blue
    $response = Invoke-WebRequest $url -verbose -debug -UseBasicParsing -SessionVariable Sess -TimeoutSec 10 
    $response.StatusCode
    $response.Content.Length
    netstat -aon | Select-String $pid
    read-host "End of call 1"
    $response = Invoke-WebRequest $url -verbose  -UseBasicParsing  -WebSession $Sess -TimeoutSec 10
    $response.StatusCode
    $response.Content.Length
    netstat -aon | Select-String $pid
    read-host "End of call 2"
    $response = Invoke-WebRequest $url -verbose  -UseBasicParsing  -WebSession $Sess -TimeoutSec 10
    $response.StatusCode
    $response.Content.Length
    netstat -aon | Select-String $pid
    read-host "Done with websession tests."
    Write-Host  "Using -DisableKeepAlive" -ForegroundColor Red
    $response = Invoke-WebRequest $url -verbose  -UseBasicParsing -DisableKeepAlive  -TimeoutSec 10
    $response.StatusCode
    $response.Content.Length
    netstat -aon | Select-String $pid
    read-host "End of call 1"
    $response = Invoke-WebRequest $url  -verbose -UseBasicParsing  -DisableKeepAlive  -TimeoutSec 10
    $response.StatusCode
    $response.Content.Length
    netstat -aon | Select-String $pid
    read-host "End of call 2"
    Write-Host "-DisableKeepAlive removed" -ForegroundColor Red
    $response = Invoke-WebRequest $url  -verbose -UseBasicParsing  -TimeoutSec 10
    $response.StatusCode
    netstat -aon | Select-String $pid
    read-host "Done"
    

    I tried adding a delay to see if I could get the socket connections on both client and server to time out, but that was not successful.

    Next I tried curl.exe instead of Invoke-WebRequest and I got back a response of "OK Bot.". That response is what leads me to conclude that Adobe is blocking your request.


    You might want to look for an alternative method or contact Adobe to find out what their preferred solution is for customers.


  3. Hsieh, Iverson 65 Reputation points
    2023-03-27T02:56:54.4133333+00:00

    Is there a way to use XPath on the first response to get the latest Adobe download link from it?


  4. Hsieh, Iverson 65 Reputation points
    2023-03-27T03:23:32.64+00:00

    [Screenshot: 2023-03-27_11-23-03]

