Windows PowerShell: Text, XML and CSV—Oh My
Windows PowerShell has a variety of ways of dealing with text formatting, which is powerful but often confusing.
One occasionally confusing aspect of Windows PowerShell is the variety of ways in which it handles text formats. Here’s a quick quiz: What are the differences between these three commands?
Get-Service | Out-File services.txt Get-Service | Export-CSV services.csv Get-Service | Export-CliXML services.xml
Technically, all three of these commands will produce a text file. You can open any of these three files in Windows Notepad and read the contents. Two of those files, however, use text to present data in a delimited format. There’s additional structure within the file that helps separate the information you’ve created.
The following three commands are legal and will run without error:
Get-Content services.txt Get-Content services.csv Get-Content services.xml
Only one of those commands makes sense, however. The first is the only sensible one. It just reads in a plain, non-delimited text file. It isn’t non-formatted, because if you look at the file it’s clearly formatted.
It’s formatted in a human sense, though. The text is in a nice columnar table. The data is laid out in a way that makes sense to our eyes and brains. It’s not organized in a way that lets a computer easily identify the individual bits of data.
This is a concept that troubles many Windows PowerShell newcomers. There’s a simple mnemonic I use to help with this. If you got it out of the shell with an Export command, bring it back into the shell with an Import command. If you didn’t use Export, then don’t use Import. Import and Export are paired verbs. They should always go together. So if you did this:
Get-Service | Export-CSV services.csv
Then you’d also do this:
The Export-CSV command takes data—in this case the data is information about services—and writes that data to a file. The data’s individual bits are delimited by commas because that’s what the CSV format uses. That means the service names, statuses and so forth are all separated by commas. When you read the data with Import-CSV, Windows PowerShell parses those commas, re-separates the data and reconstructs the original data. Compare the results with the results of this command:
That command reads the text, but doesn’t do anything with it. The screen fills up with comma-separated values, but Windows PowerShell doesn’t actually separate the values for you. You’re just getting the raw, unparsed text, which is all Get-Content ever does. Technically, Get-Content produces one String object for each line of the text file. That’s useful in certain situations, but it’s not usually what you need to do with a CSV file.
This whole situation is analogous to something you’ve probably experienced in Windows as well. Open a CSV file in Notepad and what do you get? You see a bunch of text. That’s basically what Get-Content is doing. Now open that same CSV file in Excel, and what do you see? It breaks down the CSV file, separating the data into columns of the spreadsheet. This is very similar to what Import-CSV is doing.
Why is all of this important? Let’s say you create your own CSV file. Go ahead and use Notepad to create a file that looks something like this (poor Greg Shields never does well in these examples):
Name,SamAccountName,Department,Title,City DonJ,DonJ,IT,CTO,Las Vegas GregS,GregS,Custodial,Janitor,Denver
Save that file as Users.csv on your hard drive, and go to a Windows PowerShell prompt. Without actually running them, can you guess what each of these two commands will do?
Get-Content Users.csv | New-ADUser Import-CSV Users.csv | New-ADUser
Both of these assume you’ve already run Import-Module ActiveDirectory to load the Microsoft Active Directory module, which comes with Windows Server 2008 R2 and is available in the Windows 7 Remote Server Administration Tools package.
The first command won’t work. That’s because Get-Content isn’t parsing the file. It’s just sending a bunch of raw strings to New-ADUser. And New-ADUser won’t be able to do anything useful with them.
The second command will work, because the shell will parse the CSV file, break it down into individual pieces of data and send those to New-ADUser. Because the pieces of data—Name, samAccountName, Department, Title and City—are named the same as the corresponding parameters of New-ADUser, the second cmdlet will create two new users having the information from the CSV file.
This is the difference between working with plain text (Out-File and Get-Content) and delimited data (Import/Export). Using XML files (Export-CliXML and Import-CliXML) is even better, because they can represent more complex data structures like hierarchical data. CSV files are a flat format that can only represent a single level of data.
Wrapping your head around these differences is one of the most important things you can do in the shell. Being able to manipulate data in the pipeline is one of the things that makes Windows PowerShell so powerful. Spend some time to ensure you understand all the differences—both subtle and not so subtle.
Don Jones is a Microsoft MVP Award recipient and author of “Learn Windows PowerShell in a Month of Lunches” (Manning Publications, 2011), a book designed to help any administrator become effective with Windows PowerShell. Jones also offers public and on-site Windows PowerShell training. Contact him through ConcentratedTech.com or bit.ly/AskDon.