What is wrong with this PowerShell code that uses a regex?

Suzana Eree 811 Reputation points
2021-03-16T06:49:39.343+00:00

Hi, can someone tell me what is wrong with this PowerShell code?

Someone wrote this code for me. It is supposed to use a regular expression to change a few lines in an HTML file. I think something is missing that needs to be filled in before it will work. Can anyone correct it so that it works?

Basically, I have an HTML file with tags such as <title>, and I want to select everything between the tags and replace it using a regex. But something is wrong.

$Content = Get-Content -Path $Path "c:\Users\Castel\Videos\Captures" -Filter "*.html"
Set-Content -Path $Path -Value $Content

# Get each page as a HTML file/ assign to a variable
 #$Htmltext = (Invoke-WebRequest -Uri $MainUrl)# This is the link to the webpage
 #$HtmlPage.content

foreach($Line in $HtmlPage)
  {
$GetTitle = [regex] "$RegexForTitle"
$PageTitle = $FindTitle.Match($HtmlPage)
$PageTitle.Captures[0].Value
$Title = $PageTitle.Captures[0].Value -ireplace '<[^\>]*>,'$1'  #This regex selects everything between tags and make a replace:
  } #end foreach file

Accepted answer
  1. Suzana Eree 811 Reputation points
    2021-03-17T16:02:54.423+00:00

    Use the -Raw switch of Get-Content so that the file is read as a single string. This is the solution:

    $path = 'c:\Folder1\file1.html'
    (Get-Content -Path $path -Raw) -replace '(<title>).*?(</title>)', '$1NEW is now there!$2' |
        Set-Content -Path $Path
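
    A minimal sketch of why -Raw matters, assuming a throwaway file in the temp directory (the file name and its contents are hypothetical, not from the original post). Without -Raw, Get-Content returns one string per line; with -Raw it returns the whole file as one string. The (?s) option is an addition here so that "." also matches newlines when a title spans more than one line.

    ```powershell
    # Hypothetical sample file, created only for this illustration.
    $demoPath = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-demo.html'
    Set-Content -Path $demoPath -Value '<html>', '<title>Old', 'title</title>', '</html>'

    $lines = Get-Content -Path $demoPath        # array of strings, one per line
    $text  = Get-Content -Path $demoPath -Raw   # a single string holding the whole file

    # (?s) lets '.' match newlines, so a title split across lines is still found.
    $updated = $text -replace '(?s)(<title>).*?(</title>)', '$1NEW is now there!$2'
    $updated

    Remove-Item -Path $demoPath
    ```

    Applied line by line, the same pattern would miss this title, because no single line contains both the opening and the closing tag.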
    

5 additional answers

  1. Ian Xue (Shanghai Wicresoft Co., Ltd.) 34,111 Reputation points Microsoft Vendor
    2021-03-16T08:52:58.553+00:00

    Hi @Suzana Eree ,

    Do you want to replace the titles between <title> and </title> with the two characters '$1'? If so, you can simply do it like this:

    $path = 'D:\temp\file.html'
    $Content = Get-Content -Path $path
    $newContent = @()
    $RegexForTitle = '(?<=title>).*(?=</title>)'
    foreach ($Line in $Content)
    {
        $newContent += $Line -replace $RegexForTitle, '$1'
    }
    Set-Content -Path $path -Value $newContent
    

    Best Regards,
    Ian Xue


  2. Suzana Eree 811 Reputation points
    2021-03-16T11:02:25.327+00:00

    Yes, that works.

    But in another case, replacing with $1 does not work.

    Suppose I have the following search and replace with a regex:

    FIND: (?<=title>).*(?=</title>)

    REPLACE BY: $1 OTHER TITLE

    I want to change the title, that is, the words between the <title> and </title> tags. Your $1 does not work here, because it must not appear in the result; only OTHER TITLE should.

    Basically, it is a regex pattern replaced by another regex expression, not a regex pattern replaced by a plain word.


  3. Rich Matheisen 45,671 Reputation points
    2021-03-16T18:22:38.707+00:00

    In your last example there's no capturing group. $1 represents the first capturing group, so add parentheses around the ".*".

    "<title>Something goes here</title>" -replace  '(?<=title>)(.*?)</title>', '$1 NEW is now there!'
    
    <title>Something goes here NEW is now there!
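
    To illustrate the point about groups, here is a small sketch (the sample string and replacement texts are made up for the example). When the pattern has no capturing group, .NET leaves '$1' in the replacement as literal text. With lookarounds on both sides, only the title text is matched, so the replacement can simply be the new title.

    ```powershell
    $sample = '<title>Something goes here</title>'

    # No capturing group: '$1' is not substituted and appears literally.
    $literal = $sample -replace '(?<=<title>).*(?=</title>)', '$1'

    # Lookarounds keep the tags out of the match, so the replacement is just the new text.
    $swapped = $sample -replace '(?<=<title>).*?(?=</title>)', 'OTHER TITLE'

    # A capturing group makes '$1' refer to the old title text.
    $appended = $sample -replace '<title>(.*?)</title>', '<title>$1 NEW is now there!</title>'

    $literal
    $swapped
    $appended
    ```

    The replacement strings are single-quoted on purpose: in double quotes PowerShell would try to expand $1 as a variable before the regex engine ever sees it.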
    

  4. Rich Matheisen 45,671 Reputation points
    2021-03-16T20:05:30.227+00:00

    There's really no place to put the code in your original example because the example needs to be fixed so it at least makes sense.

    However, taking your example as a small framework, this:

    $Content = "<bogus></bogus>", "<title>Something goes here</title>","<TheEnd>END</TheEnd>"
    
    foreach($Line in $Content){
        $Line -replace "<title>(.*?)</title>",'$1 NEW is now there!'  # The group captures the text between the tags; the whole match is then replaced.
    }
    

    Or this:

    $Content = "<bogus></bogus>", "<title>Something goes here</title>","<TheEnd>END</TheEnd>"
    
    $GetTitle = [regex]"<title>(.*?)</title>"
    foreach($Line in $Content){
        $Line -replace $GetTitle,'$1 NEW is now there!'  # The group captures the text between the tags; the whole match is then replaced.
    }
    

    Or this:

    $Content = "<bogus></bogus>", "<title>Something goes here</title>","<TheEnd>END</TheEnd>"
    
    $GetTitle = [regex]"<title>(.*?)</title>"
    foreach($Line in $Content){
        $GetTitle.Replace($Line, '$1 NEW is now there!')
    }
    

    Each produces:

    <bogus></bogus>
    Something goes here NEW is now there!
    <TheEnd>END</TheEnd>
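
    Tying this back to the original question, which targeted *.html files in a folder, the pieces could be combined roughly as follows. This is only a sketch under assumptions: the folder, the file name, and the new title are hypothetical, and -NoNewline requires PowerShell 5.0 or later.

    ```powershell
    # Hypothetical demo folder; point $folder at the real directory instead.
    $folder = Join-Path ([System.IO.Path]::GetTempPath()) 'html-demo'
    New-Item -ItemType Directory -Path $folder -Force | Out-Null
    $demoFile = Join-Path $folder 'page1.html'
    Set-Content -Path $demoFile -Value '<html><title>Old title</title></html>'

    # Keep the tags via groups 1 and 2; (?s) lets '.' span line breaks.
    $titlePattern = '(?s)(<title>).*?(</title>)'

    foreach ($file in Get-ChildItem -Path $folder -Filter '*.html') {
        # Read the whole file first, so it is closed before being rewritten.
        $html = Get-Content -Path $file.FullName -Raw
        # -NoNewline avoids appending an extra line break on every run.
        $html -replace $titlePattern, '$1OTHER TITLE$2' |
            Set-Content -Path $file.FullName -NoNewline
    }

    $result = Get-Content -Path $demoFile -Raw
    $result

    Remove-Item -Path $folder -Recurse
    ```

    Get-ChildItem takes care of the -Filter part, which Get-Content in the original snippet could not do for a whole folder.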