Is this even possible???

mrontz-the-dev 1 Reputation point
2021-07-14T01:27:32.977+00:00

I am currently stuck on a pretty significant issue, so much so that it seems even Google does not have an answer for it. Here is my situation: I have a program that takes an HTML file as input and converts the tags into AMP-valid format. For some reason, after conversion, it bunches all the code onto a single line, so I have to go in, scroll to each tag, and press [Enter] to move the tag onto a new line. My question is this: how the heck do I write a mini-script that can run after the conversion to do this one simple thing? For the life of me, I cannot figure it out. Someone please help!

Windows Server PowerShell

2 answers

  1. Rich Matheisen 36,481 Reputation points
    2021-07-14T02:12:56.42+00:00

    I'm assuming that "AMP format" is a modified form of HTML? If that's true, see if something like this works for you:

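    # Create an HTMLFile COM document object
    $HTML = New-Object -ComObject "HTMLFile"
    # Read the whole file as a single string
    $src = Get-Content -Path c:\junk\x.html -Raw
    # Parse the markup into the document
    $HTML.IHTMLDocument2_write($src)
    # Write the re-serialized markup to a new file
    $HTML.documentElement.outerHTML |
        Out-File -FilePath c:\junk\NewX.html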
    

    Note that COM is known for being persnickety, and the HTMLFile COM object uses (IIRC) the Internet Explorer HTML parser, so be prepared for possible parsing problems!

    Another choice may be the HTML Agility Pack. It's not something I've used, but it seems to be better than that COM stuff. Here's an example using PowerShell: html-agility-pack-rocks-your-screen-scraping-world

  2. Ian Xue (Shanghai Wicresoft Co., Ltd.) 18,846 Reputation points Microsoft Vendor
    2021-07-14T03:39:04.757+00:00

    Hi,

    If it's an HTML file, you can try this:

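    # $input is a PowerShell automatic variable, so use different names for the paths
    $inputPath = "C:\temp\input.html"
    $outputPath = "C:\temp\output.html"
    # Insert a line break before every "<" that does not start a closing tag
    (Get-Content -Path $inputPath) -replace "<(?!/)","`r`n<" | Out-File -FilePath $outputPath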
    

    Best Regards,
    Ian Xue
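
A variant of the regex approach in answer 2, in case closing tags should also start on their own lines: break only where one tag ends and the next begins. This is a minimal sketch, not something posted in the thread. The function name `Split-HtmlTags` is made up for illustration, and the pattern treats the markup as plain text, so it knows nothing about `<pre>` blocks or inline scripts, and the added whitespace between inline elements may change how the page renders.

```powershell
# Hypothetical helper (not from the thread): split markup where two tags touch.
function Split-HtmlTags {
    param(
        [Parameter(Mandatory)]
        [string]$Html
    )
    # Wherever one tag closes and the next opens (optionally separated by whitespace),
    # replace the gap with a single CRLF line break.
    $Html -replace '>\s*<', ">`r`n<"
}

# Example with an inline string; for a file, pass the output of Get-Content with -Raw instead.
$sample = '<html><body><p>Hi</p></body></html>'
Split-HtmlTags -Html $sample
```

Text that sits between an opening and a closing tag, such as `<p>Hi</p>` in the sample, stays on one line because no two tags touch there.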
