Is this even possible???

mrontz-the-dev 1 Reputation point
2021-07-14T01:27:32.977+00:00

I am currently stuck on a pretty significant issue, so much so that it seems even Google does not have an answer for it. Here is my situation: I have a program that takes an HTML file as input and converts the tags into AMP-valid format. For some reason, after conversion, it bunches all the code up onto a single line, so I have to go in, scroll to each tag, and press [Enter] to move the tag onto a new line. My question is this: how the heck do I write a mini-script that can run after the conversion to do this one simple job? For the life of me, I cannot figure it out. Someone please help!

Windows Server PowerShell

2 answers

  1. Rich Matheisen 46,551 Reputation points
    2021-07-14T02:12:56.42+00:00

    I'm assuming that "AMP format" is a modified form of HTML? If that's true, see if something like this works for you:

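    # Create an HTMLFile COM object to parse the markup
    $HTML = New-Object -Com "HTMLFile"
    # Read the whole file as a single string
    $src = Get-Content c:\junk\x.html -Raw
    # Parse the markup, then write the re-serialized document to a new file
    $HTML.IHTMLDocument2_write($src)
    $HTML.documentElement.outerHTML |
        Out-File c:\junk\NewX.html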
    

    Note that COM is known for being persnickety, and the HTMLFile COM object uses (IIRC) the Internet Explorer HTML parser -- so be prepared for possible parsing problems!

    Another choice may be the HTML Agility Pack . . . it's not something I've used, but it seems to be better than that COM stuff. Here's an example using PowerShell: html-agility-pack-rocks-your-screen-scraping-world


  2. Ian Xue (Shanghai Wicresoft Co., Ltd.) 36,166 Reputation points Microsoft Vendor
    2021-07-14T03:39:04.757+00:00

    Hi,

    If it's an HTML file, you can try this:

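    # $input is an automatic variable in PowerShell, so use different names
    $inPath = "C:\temp\input.html"
    $outPath = "C:\temp\output.html"
    # Insert a line break before every "<" that does not start a closing tag
    (Get-Content -Path $inPath -Raw) -replace "<(?!/)", "`r`n<" | Out-File -FilePath $outPath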
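
    To see what that replacement does before pointing it at a real file, here is a minimal sketch that runs the same regex on an in-memory string. The function name Split-HtmlTags and the sample markup are made up for illustration; they are not part of the answer above or of any converter's output.

    ```powershell
    # Hypothetical helper (name chosen for this example only).
    function Split-HtmlTags {
        param(
            [Parameter(Mandatory)]
            [string]$Html
        )
        # Put a line break before every "<" that is not the start of a closing tag.
        $result = $Html -replace '<(?!/)', "`r`n<"
        # The very first tag also gets a break in front of it, so trim leading whitespace.
        return $result.TrimStart()
    }

    # Made-up sample of single-line markup.
    $sample = '<html><body><p>Hello</p><amp-img src="a.png"></amp-img></body></html>'
    Split-HtmlTags -Html $sample
    # Output:
    # <html>
    # <body>
    # <p>Hello</p>
    # <amp-img src="a.png"></amp-img></body></html>
    ```

    Note that closing tags stay on the same line as whatever precedes them, and any literal "<" in the file (inside a script or a comment, for example) is also treated as the start of a tag. For a real file, something like Split-HtmlTags -Html (Get-Content -Path "C:\temp\input.html" -Raw) | Out-File -FilePath "C:\temp\output.html" should behave the same way as the one-liner.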
    

    Best Regards,
    Ian Xue
