Reformating malformed csv

Festus Hagen 41 Reputation points
2021-02-05T21:16:07.283+00:00

This site is so wanky I have no clue if is in the right place or even if I'll be able to find it again ... New and improved is not always better, and this new Q&A forum is a perfect example of that!

Hi all, hoping I can get some help.

I would like to make a malformed csv file into a proper csv so it can be imported into excel.

There is no header.
There are no quotes on columns with multi line data. (i.e. CR/LF in the data)
Multi line data is ended with a marker of: <end>
No quotes on hex data. (first and second column)

I do believe there are just three columns that need fixing ... Column1, Column2, and column9 (the last one and is marked with <end>).
There is also a blank line between records to be removed.

Windows 7, Powershell.

Two records of example data

12345,ABCD,0,2,2,"Part Description",Bla Bla,"-1,2,1,-40",

This is a test of the ability to cleanup the file

The files have thousands of lines



And multi lines un-quoted.

<end>

ABCDE,1234,0,2,2,"Description of the part",Simple text without CR/LF,"-20,-40,1,2",

There can be a variable number of lines

Just another line of data

There is a end marker on the un-quoted multi line column.

<end>
Windows Server PowerShell
Windows Server PowerShell
Windows Server: A family of Microsoft server operating systems that support enterprise-level management, data storage, applications, and communications.PowerShell: A family of Microsoft task automation and configuration management frameworks consisting of a command-line shell and associated scripting language.
5,462 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Rich Matheisen 45,906 Reputation points
    2021-02-06T03:23:39.503+00:00

    It's a hack, but give this a try:

    $x = get-content c:\junk\badfile.txt -raw
    $y = $x -replace "`r`n",""
    [array]$z = $y -split "<end>"
    $z | Out-File C:\junk\ml.txt
    $c = import-csv c:\junk\ml.txt -header "h1","h2","h3","h4","h5","h6","h7","h8","h9"
    
    0 comments No comments

  2. Titan 206 Reputation points
    2021-02-06T07:00:22.393+00:00

    Hello @Festus Hagen !

    If you like to preserve the multiline column and assuming that there aren't any , character in that column, you could try this:

      $parentPath = Split-Path -Parent $PSCommandPath  
      $srcPath = Join-Path -Path $parentPath -ChildPath "malformed.csv"  
      $tmpPath = Join-Path -Path $parentPath -ChildPath "corrected.csv"  
      $dstPath = Join-Path -Path $parentPath -ChildPath "final.csv"  
      
      Get-Content -Path $srcPath -Delimiter '<end>' |  
      ForEach-Object {  
        $_ -replace '(?<=,)[^,]*$', '"$0"'  
      } | Set-Content -Path $tmpPath  
      
      Import-Csv -Path $tmpPath -Header $(1..9) | Export-Csv -Path $dstPath -Delimiter ';' -NoTypeInformation  
    

    Does this work for you?

    Best whishes

    ---
    Please accept and upvote this answer, if it was helpful

    0 comments No comments