question

bman-8694 asked bman-8694 commented

Extract specific information from text file

I'm trying to pull new-user information out of a text file so I can create a domain user: first name, last name (NOT the middle name), and email address. The file is actually an .eml file, so there is more to it than shown here; this is just the bottom of the file, but its format is always the same. I need to extract those items into the variables $Fname, $Lname and $Eaddr.

What I need to understand is how to first search for a specific line, in this case "BILLING ADDRESS", then grab the line two lines down and put the first name and last name into the variables above. The email address is the same situation, but keying on "Congratulations on the sale." and moving up. I can't just count down from "BILLING ADDRESS" because there could be an additional address line such as an apartment or suite. There could also be a middle name on the name line, so the script needs to work around that possibility, just as it does for the second address line.





BILLING ADDRESS

Joe Some Blow
123 Nowhere
Someplace, TX 75075
joeblow@nowhere.org




Congratulations on the sale.





windows-server-powershell

RichMatheisen-8856 answered bman-8694 commented

Here's one approach to extracting information from an e-mail address (it's version 2 of the code sample I posted yesterday):

 $m = select-string -Path c:\junk\billing.txt -Pattern "BILLING ADDRESS" -SimpleMatch -CaseSensitive -Context 0,20
 $GotIt = $false
 $m.Context.PostContext|
     ForEach-Object{
         if ($_ -match "Congratulations on the sale\."){   # escape the period so it matches a literal "."
             $GotIt = $true
         }
     }
 if ($GotIt){
     ####################
     #  Get the text name  (alternative 1)
     # $names = $m.Context.PostContext[1] -split " "   # ASSUMES! name is on this line
     # Switch ($names.count){
     #     2 {$Fname = $names[0]; $Lname = $names[1]; break}
     #     3 {$Fname = $names[0]; $Lname = $names[2]; break}
     #     4 {$Fname = $names[0]; $Lname = $names[3]; break} # maternal or paternal -- your choice -- maybe both if "Otto Von Bismarck"! Good luck!
     # }
     ####################
    
     # Take first name from user part of email address (alternative 2)
     # Take last name as the domain name (minus the top-level domain name)
     $m.Context.PostContext|
         ForEach-Object{
             if ($_ -match "(.+@.+\..+)"){
                 $Eaddr = $matches[1].trim()
             }
         }
     if ($Eaddr){
         $Eaddr |
             ForEach-Object{
                 $parts = $_ -split "@"              # separate email user name from domain name
                 $Fname = $parts[0]                  # email user name may include periods (also other punctuation see RFC822)
                 $p2 = $parts[1] -split('\.')        # split the FQDN
                 $Lname = $p2[(0..($p2.count - 2))] -join "."    # use all but the top-level domain (e.g., edu, org, com, etc.)
             }
     }
     [PSCustomObject]@{
         Fname = $Fname
         Lname = $Lname
         Ename = $Eaddr
     }
 }

If I strip and cat the email address into a username, then this new way would make a more precise but gnarly username. The old way would be less precise and not so gnarly.
So just for laughs I tested it out and got....
```
Fname   Lname                               Ename
-----   -----                               -----
joeblow nowhere.anywhere.wish.you.were.here joeblow@nowhere.anywhere.wish.you.were.here.org
```
I suddenly feel the need for some Pink Floyd. ;)

Thanks. You have been very helpful.

0 Votes 0 ·
RichMatheisen-8856 answered bman-8694 commented

An EML file isn't as simple as you think it is. :-)

Here's a link to get you started: PowerShell-Parse-Eml-File

Since the EML file may (most likely will!) be in plain text, RTF, or HTML (or some combination of those), dealing with the raw file is ugly. But since almost all email contains MIME, and MIME wants there to be a text representation at the beginning of the message body, using the TextBody property from the function in the link is probably all you need.

Once you have the text, you can use the Select-String cmdlet with the -Context parameter to extract the data.

And, yes, you can "count the lines" as long as you ask for more than you think you'll need.

With the context in hand (it's an array) you can look for whatever pattern you need. For example, if the name is always the 2nd line after "BILLING ADDRESS" you can split that line using a space as a separator. Then take the 1st and last elements of the result as the first name and surname. It's not perfect, though. I see you have a "TX" as the state. More than likely you'll encounter Spanish naming convention problems with "first middle maternal paternal" patterns, and you won't know whether to use the maternal or paternal surname.
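As a rough illustration of the first-and-last-element idea described above, here is a minimal sketch. It assumes the name is the first non-blank line after "BILLING ADDRESS"; the sample text is copied from the question and the temp file name `billing-sample.txt` is made up for the demo.

```powershell
# Sample data from the question, written to a temp file (hypothetical file name).
$sample = @(
    'BILLING ADDRESS'
    ''
    'Joe Some Blow'
    '123 Nowhere'
    'Someplace, TX 75075'
    'joeblow@nowhere.org'
    ''
    'Congratulations on the sale.'
)
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'billing-sample.txt'
Set-Content -Path $file -Value $sample

# Ask for more trailing lines than should ever be needed.
$hit = Select-String -Path $file -Pattern 'BILLING ADDRESS' -SimpleMatch -CaseSensitive -Context 0,20 |
    Select-Object -First 1

# ASSUMPTION: the first non-blank line after the heading is the name line.
$nameLine = $hit.Context.PostContext | Where-Object { $_.Trim() } | Select-Object -First 1

# Split on runs of whitespace; keep the first and last tokens, ignore anything in between.
$tokens = $nameLine.Trim() -split '\s+'
$Fname  = $tokens[0]
$Lname  = $tokens[-1]

Remove-Item -Path $file
```

Taking `$tokens[-1]` rather than switching on the token count means a middle name (or several) is simply skipped, but it still cannot tell a two-word surname from a middle name plus surname.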




I have tried a lot of Select-String attempts but nothing rings the bell.

$PREaddr = (Select-String -Path "$path\*.eml" -Pattern 'BILLING ADDRESS' -CaseSensitive -Context 0, 7) | Select-Object -Skip 3
Select-String -Path "$path\*.eml" -Pattern 'BILLING ADDRESS' -CaseSensitive -Context 0, 2 | select-object Line | ft -HideTableHeaders
Select-String -Path "$path\*.eml" -Pattern 'Congratulations' -CaseSensitive -Context 5, 0 | select-object Line | ft -HideTableHeaders
Select-String -Path "$path\*.eml" -Pattern 'BILLING ADDRESS' -CaseSensitive -Context 0, 2 | select-object -Skip 1


I have manipulated these lines in a lot of different ways but can't seem to get the outcome I'm after. I don't think the maternal or paternal surname will play a role if the name is attacked from the left and then from the right, stopping at the first space in both passes. It seems like it would be simple for someone who knows what they're doing, and I can only dream.

0 Votes 0 ·

Forgot. The .eml is specifically being sent as text only.

0 Votes 0 ·
bman-8694 answered bman-8694 published

203361-worky.gif




We clear?


worky.gif (260.8 KiB)

Can you attach a sample EML file?

This is an example that uses just a plain text file as input:

 $m = select-string -Path c:\junk\billing.txt -Pattern "BILLING ADDRESS" -SimpleMatch -CaseSensitive -Context 0,20
 $GotIt = $false
 $m.Context.PostContext|
     ForEach-Object{
         if ($_ -match "Congratulations on the sale\."){   # escape the period so it matches a literal "."
             $GotIt = $true
         }
     }
 if ($GotIt){
     $names = $m.Context.PostContext[1] -split " "   # ASSUMES! name is on this line
     Switch ($names.count){
         2 {$Fname = $names[0]; $Lname = $names[1]; break}
         3 {$Fname = $names[0]; $Lname = $names[2]; break}
         4 {$Fname = $names[0]; $Lname = $names[3]; break} # maternal or paternal -- your choice -- maybe both if "Otto Von Bismarck"! Good luck!
     }
     $m.Context.PostContext|
         ForEach-Object{
             if ($_ -match "(.+@.+\..+)"){
                 $Eaddr = $matches[1].trim()
             }
         }
     [PSCustomObject]@{
         Fname = $Fname
         Lname = $Lname
         Ename = $Eaddr
     }
 }

This is the file content:

 BILLING ADDRESS
    
 Joe Some Blow
 123 Nowhere
 Someplace, TX 75075
 joeblow@nowhere.org
    
    
    
 Congratulations on the sale.

And this is the output:

 Fname Lname Ename
 ----- ----- -----
 Joe   Blow  joeblow@nowhere.org


0 Votes 0 ·
bman-8694 avatar image bman-8694 RichMatheisen-8856 ·

You nailed it! I never had a chance at getting this. The problem now is that I have been thinking about the paternal/maternal issue, and I think I'm more concerned about John Smith than Otto Von Bismarck. I think that maybe the username should be a stripped-down version of the email address. I still need this info to pipe into a CSV for cross-reference when troubleshooting things that go wrong. Can you strip out the @ and .* and put the two remnants into their own variables, or leave me with a hint on how to?

0 Votes 0 ·

This is disappointing.

PS C:\Users\dcs1> $Eaddr
joeblow@nowhere.org
PS C:\Users\dcs1> $Eaddr1 = $Eaddr.TrimEnd("@")
PS C:\Users\dcs1> $Eaddr1
joeblow@nowhere.org
PS C:\Users\dcs1> $Eaddr1 = $Eaddr.TrimEnd('@')
PS C:\Users\dcs1> $Eaddr1
joeblow@nowhere.org
PS C:\Users\dcs1> $Eaddr1 = $Eaddr.Trim('@')
PS C:\Users\dcs1> $Eaddr1
joeblow@nowhere.org
PS C:\Users\dcs1> $Eaddr1 = $Eaddr.TrimEnd('.')
PS C:\Users\dcs1> $Eaddr1
joeblow@nowhere.org
PS C:\Users\dcs1> $Eaddr1 = $Eaddr.TrimEnd(".")
PS C:\Users\dcs1> $Eaddr1
joeblow@nowhere.org
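
The unchanged output in the transcript above is most likely expected behavior rather than a bug: `TrimEnd` and `Trim` only remove the listed characters while they sit at the very end (or both ends) of the string, and this address ends in `g`. A short sketch of the difference, using the address from the question and illustrative variable names:

```powershell
$Eaddr = 'joeblow@nowhere.org'

# TrimEnd strips the given characters only while they appear at the end of the string.
$unchanged = $Eaddr.TrimEnd('@')        # still 'joeblow@nowhere.org' (it ends in 'g', not '@')
$trimmed   = 'joeblow@@'.TrimEnd('@')   # 'joeblow'

# To cut at a character in the middle, split on it or index into the string instead.
$user   = $Eaddr.Split('@')[0]                        # text before the '@'
$domain = $Eaddr.Substring($Eaddr.IndexOf('@') + 1)   # text after the '@'
```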

0 Votes 0 ·
bman-8694 avatar image bman-8694 RichMatheisen-8856 ·

Thanks for your help!

0 Votes 0 ·
bman-8694 answered RichMatheisen-8856 commented

WooHoo!

$EaddrF = $Eaddr -replace "@.*"
$EaddrL = $Eaddr.split('@')[1].split('.')[0]



So, what happens if there are more than two names in the email address's FQDN? Do you take just the first? All but the last? The first two?

E.g., JoeBlow@cityhall.mytown.me.us
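
To make the three choices concrete, here is a small sketch that applies each one to the example address. The variable names are illustrative only, and which result is "right" depends on your naming policy.

```powershell
$Eaddr  = 'JoeBlow@cityhall.mytown.me.us'

# Split off the domain, then split the domain into its dot-separated labels.
$labels = ($Eaddr -split '@')[1] -split '\.'

$firstOnly  = $labels[0]                                    # just the first label
$allButLast = $labels[0..($labels.Count - 2)] -join '.'     # everything except the top-level domain
$firstTwo   = ($labels | Select-Object -First 2) -join '.'  # the first two labels
```

Note that the `0..($labels.Count - 2)` range assumes the domain has at least two labels; a single-label domain would need separate handling.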

Extracting information from unformatted data is an art, not a science! :-)

0 Votes 0 ·