Extract specific information from text file

bman 21 Reputation points
2022-05-18T16:24:36.793+00:00

I'm trying to pull new-user information out of a text file so I can create a domain user: the first name and last name (NOT the middle name) plus the email address. The file is actually an .eml file, so there is more to it than shown here; this is just the bottom of the file, but its format is always the same. I need to extract those items into the variables $Fname, $Lname and $Eaddr.

What I need to understand is how to first search for a specific line, in this case "BILLING ADDRESS", then grab the line two lines down and put the first and last name into the variables above. The email address is the same situation, but keyed on "Congratulations on the sale." and moving up. I can't just count down from "BILLING ADDRESS" to reach the email address, because there could be an additional address line such as an apartment or suite. There could also be a middle name in the name line, so the script needs to work around that possibility, just as it does for the second address line.

----------------------------------------

BILLING ADDRESS

Joe Some Blow
123 Nowhere
Someplace, TX 75075
joeblow@chris chan .org

----------------------------------------

Congratulations on the sale.

----------------------------------------

Windows Server PowerShell

Accepted answer
  1. Rich Matheisen 44,416 Reputation points
    2022-05-19T19:26:53.643+00:00

    Here's one approach to extracting information from an e-mail address (it's version 2 of the code sample I posted yesterday):

    $m = select-string -Path c:\junk\billing.txt -Pattern "BILLING ADDRESS" -SimpleMatch -CaseSensitive -Context 0,20
    $GotIt = $false
    $m.Context.PostContext|
        ForEach-Object{
            if ($_ -match "Congratulations on the sale\."){    # escape the period so it matches literally
                $GotIt = $true
            }
        }
    if ($GotIt){
        ####################
        #  Get the text name  (alternative 1)
        # $names = $m.Context.PostContext[1] -split " "   # ASSUMES! name is on this line
        # Switch ($names.count){
        #     2 {$Fname = $names[0]; $Lname = $names[1]; break}
        #     3 {$Fname = $names[0]; $Lname = $names[2]; break}
        #     4 {$Fname = $names[0]; $Lname = $names[3]; break} # maternal or paternal -- your choice -- maybe both if "Otto Von Bismark"! Good luck!
        # }
        ####################
    
        # Take first name from user part of email address (alternative 2)
        # Take last name as the domain name (minus the top-level domain name)
        $m.Context.PostContext|
            ForEach-Object{
                if ($_ -match "(.+@.+\..+)"){
                    $Eaddr = $matches[1].trim()
                }
            }
        if ($Eaddr){
            $Eaddr |
                ForEach-Object{
                    $parts = $_ -split "@"              # separate email user name from domain name
                    $Fname = $parts[0]                  # email user name may include periods (also other punctuation see RFC822)
                    $p2 = $parts[1] -split('\.')        # split the FQDN
                    $Lname = $p2[(0..($p2.count - 2))] -join "."    # use all but the top-level domain (e.g., edu, org, com, etc.)
                }
        }
        [PSCustomObject]@{
            Fname = $Fname
            Lname = $Lname
            Ename = $Eaddr
        }
    }
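
For comparison, here is a minimal sketch of the line-anchored approach the question describes: take the first non-blank line below "BILLING ADDRESS" as the name, and walk upward from "Congratulations on the sale." to find the email address. It is an illustration, not part of the answer above. The sample lines, the example.org address, and the extra "Apt 4" line are assumptions, and it assumes the marker lines carry no leading or trailing whitespace.

```powershell
# Illustrative sample lines mirroring the layout in the question (values are made up).
$lines = @(
    '----------------------------------------'
    ''
    'BILLING ADDRESS'
    ''
    'Joe Some Blow'
    '123 Nowhere'
    'Apt 4'
    'Someplace, TX 75075'
    'joeblow@example.org'
    ''
    '----------------------------------------'
    ''
    'Congratulations on the sale.'
)

$Fname = $Lname = $Eaddr = $null

# Name: the first non-blank line below the "BILLING ADDRESS" marker.
$start = [array]::IndexOf($lines, 'BILLING ADDRESS')
if ($start -ge 0) {
    for ($i = $start + 1; $i -lt $lines.Count; $i++) {
        if ($lines[$i].Trim()) {
            $words = @(-split $lines[$i])   # unary -split: split on whitespace
            $Fname = $words[0]
            $Lname = $words[-1]             # last word, so a middle name is ignored
            break
        }
    }
}

# Email: walk upward from the "Congratulations" marker to the first line
# that looks like an address, however many address lines precede it.
$end = [array]::IndexOf($lines, 'Congratulations on the sale.')
for ($i = $end - 1; $i -gt $start; $i--) {
    if ($lines[$i] -match '^\s*(\S+@\S+\.\S+)\s*$') {
        $Eaddr = $Matches[1]
        break
    }
}

[PSCustomObject]@{ Fname = $Fname; Lname = $Lname; Eaddr = $Eaddr }
```

Taking the last word as the surname still runs into the multi-part surname problem mentioned elsewhere in this thread, so treat the result as a first guess.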
    

3 additional answers

  1. Rich Matheisen 44,416 Reputation points
    2022-05-18T19:58:54.62+00:00

    An EML file isn't as simple as you think it is. :-)

    Here's a link to get you started: PowerShell-Parse-Eml-File

    Since the body of an EML file may (most likely will!) be in plain text, RTF, or HTML (or some combination of those), dealing with the raw file is ugly. But since almost all email uses MIME, and MIME messages usually carry a text representation at the beginning of the message body, using the TextBody property from the function in the link is probably all you need.

    Once you have the text, you can use the Select-String cmdlet with the -Context parameter to extract the data.

    And, yes, you can "count the lines" as long as you ask for more than you think you'll need.

    With the context in hand (it's an array) you can look for whatever pattern you need. For example, if the name is always the 2nd line after "BILLING ADDRESS" you can split that line using a space as a separator. Then take the 1st and last elements of the result as the first name and surname. It's not perfect, though. I see you have a "TX" as the state. More than likely you'll encounter Spanish naming convention problems with "first middle maternal paternal" patterns, and you won't know whether to use the maternal or paternal surname.
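
As a rough sketch of the Select-String idea described above: request generous trailing context, drop blank lines, and split the first remaining line on whitespace. The temporary file, the sample text, and the example.org address are assumptions made for illustration; in practice the input would be the text body extracted from the message.

```powershell
# Write illustrative sample text to a temporary file (a stand-in for the real text body).
$tmp = [System.IO.Path]::GetTempFileName()
@'
----------------------------------------

BILLING ADDRESS

Joe Some Blow
123 Nowhere
Someplace, TX 75075
joeblow@example.org

----------------------------------------

Congratulations on the sale.
'@ | Set-Content -Path $tmp

# Ask for more trailing lines than should ever be needed.
$m = Select-String -Path $tmp -Pattern 'BILLING ADDRESS' -SimpleMatch -CaseSensitive -Context 0,20 |
    Select-Object -First 1

$Fname = $Lname = $null
if ($m) {
    # PostContext is a string array; drop blank lines so the name line comes first.
    $after = @($m.Context.PostContext | Where-Object { $_.Trim() })
    if ($after.Count -gt 0) {
        $words = @(-split $after[0])    # split the name line on whitespace
        $Fname = $words[0]              # first element
        $Lname = $words[-1]             # last element; middle names are skipped
    }
}

Remove-Item -Path $tmp
[PSCustomObject]@{ Fname = $Fname; Lname = $Lname }
```

Filtering out blank lines, rather than indexing a fixed position in the context array, is a design choice made here so that the result does not depend on how many empty lines follow the marker.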


  2. bman 21 Reputation points
    2022-05-18T23:56:51.99+00:00

    203361-worky.gif

    We clear?


  3. bman 21 Reputation points
    2022-05-19T17:07:10.38+00:00

    WooHoo!

    $EaddrF = $Eaddr -replace "@.*"
    $EaddrL = $Eaddr.split('@')[1].split('.')[0]
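
To illustrate what those two expressions produce, here is a small sketch using a hypothetical address with a subdomain (the address is an assumption, chosen to show where this differs from the accepted answer's "all but the top-level domain" logic).

```powershell
# Hypothetical address with a subdomain, chosen to make the difference visible.
$Eaddr = 'joe.blow@mail.example.org'

$EaddrF = $Eaddr -replace "@.*"                   # everything before the "@": 'joe.blow'
$EaddrL = $Eaddr.split('@')[1].split('.')[0]      # first domain label only:   'mail'

# The accepted answer's variant keeps every domain label except the last one.
$labels    = $Eaddr.Split('@')[1] -split '\.'
$AllButTld = $labels[0..($labels.Count - 2)] -join '.'   # 'mail.example'

[PSCustomObject]@{ EaddrF = $EaddrF; EaddrL = $EaddrL; AllButTld = $AllButTld }
```

For a two-label domain such as example.org, both variants give the same result.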