Security Watch: Where Is My PII?
Frank Simorjay
We all talk about PII (Personally Identifiable Information) being the most important information to protect. But before you can protect PII, you must thoroughly understand what PII you have collected on your PC. It's easy to say that everything on your computer is sensitive, but what do you really mean by everything?
To shed light on this, I started to look at the problem in a bit more detail, breaking down data types that may be sensitive and figuring out where the data may end up on your computer. First, just how sensitive information may be is often a personal judgment. For instance, some people feel threatened if their name shows up in a search result. Of course, unless you've been living under a rock, there is a good chance that someone has posted your name on the Internet in some form by now. To investigate, use your favorite search engine to search for your name online. Keep in mind that the more common your name, the harder it will be to find instances that refer specifically to you. And you might consider this a good thing.
If you are looking for yourself on the Internet, you might also want to check out some of the popular social networking sites, such as LinkedIn, Facebook, and YouTube. It is quite remarkable to see the Internet's ability to store and disperse private information that used to require diligent search efforts to uncover.
Knowing what information you need to protect is more of a science than it used to be. To help, I thought it would be interesting to see if your computer has potentially private information that you may not be aware of and that you may want to protect. While you might say that all personal information that can be used to steal your identity is sensitive, information can really be separated into two levels of detail. There is information that is readily available and information that is more private and generally considered critical to your personal identity.
Information that is readily available is not typically considered PII. This includes your name and may also consist of your phone number, street address, e-mail address, gender, and in many cases your place of employment and some educational information. These items are readily available on the Internet and in public directories such as phone books. Disclosure of this information, such as accidentally allowing a spammer to pick up your e-mail address, can be annoying, but alone it would not lead to identity theft.
Sensitive information consists of more private data that provides a link to your identity. Data that you wouldn't want disclosed publicly includes your Social Security number (or other similar unique identifier provided by your government), bank account numbers, credit card numbers (particularly when accompanied by the expiration date and card member ID), your driver's license number, and your fingerprint (or other biometric-related information). When in the wrong hands, these items can be used in very damaging ways. It is important that you control where and how this information is recorded and stored, on the Internet and on your PC. To this end, I will now discuss a couple of simple methods to find any PII that may be stored on your system's hard drive.
Finding PII Data on Your Computer
PII is scattered everywhere. In fact, if you were to go through your garbage, you would probably find some PII quite easily. Protecting this information requires diligence and a bit of care. I recommend that everyone invest in a good paper shredder and shred anything that has personal information on it.
But what about the PII lurking about on your PC? Finding this data can be as challenging as storing it securely. Windows Vista®, and several other desktop search tools, can help you find information on your system. But you need to know what information to look for.
To illustrate the problem, I'm using a couple of simple tools that will allow me to provide quick hands-on examples of what's at stake. I'm using scripts with Windows PowerShell®. Among the many things Windows PowerShell does, you'll find that it also provides excellent string-matching capabilities. For our purposes, I will be focusing on its ability to match regular expressions. Windows PowerShell (available at microsoft.com/powershell) is a powerful tool that has quickly become a standard for administrative tasks.
Additionally, I will use findstr.exe to provide a means to manage false positives, meaning the ability to ignore files that may contain strings that look interesting (due to the randomness of data strings in binary files) but are in fact of no interest here. In other words, non-text files can be ignored for this exercise.
I have selected two good PII data types: Social Security numbers and credit card information. This data should be easy to find if it is actually stored on your hard drive in clear text. The structure and pattern of both data types are unique enough to allow for a simple script to find the information. However, this data is also sensitive enough that I would ask why it needs to be stored on your PC. If you are inclined to store this information, you should ensure that it is protected. I'll cover ways to protect your PII in a moment. My discussion here is admittedly limited—there are other important PII data types that I haven't included here, such as user names and passwords.
Searching for a Social Security Number
Here is a simple search string that looks in files for anything structured like a standard U.S. Social Security number, XXX XX XXXX or XXX-XX-XXXX. Using Windows PowerShell, you can simply enter the following lines:
# Recurse from the current directory, skipping .exe and .dll files
Get-ChildItem -rec -exclude *.exe,*.dll |
  select-string " [0-9]{3}[- ][0-9]{2}[- ][0-9]{4}"
Or you can use findstr.exe to ensure that binary files are not read for the search using this:
# findstr /p skips files that contain non-printable characters
Get-ChildItem -rec |
  ?{ findstr.exe /mprc:. $_.FullName } |
  select-string " [0-9]{3}[- ][0-9]{2}[- ][0-9]{4}"
In this sample, Get-ChildItem -rec conducts a recursive directory search of files that starts from the directory in which the command was executed. Findstr.exe searches for strings in files and Select-string is the Windows PowerShell string search function. (Findstr.exe provides similar functionality that I am not discussing here.) In addition, note that the leading space in the regular expression is deliberate. This helps to reduce false positives by eliminating unnecessary information, such as registry strings like HKLM\SOFTWARE\tool\XXX-XX-XXXX.
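To see what the pattern does and does not match, here is a small sketch written in Python rather than Windows PowerShell (the regular expression syntax used here is the same in both). The sample lines are made up for illustration. The separator is written as [- ] (a dash or a space), because a | placed inside square brackets is matched literally rather than meaning "or".

```python
import re

# Same idea as the search string above: a leading space, then 3-2-4 digits
# separated by a dash or a space. (Assumption: [- ] is used for the separator.)
SSN_PATTERN = re.compile(r" [0-9]{3}[- ][0-9]{2}[- ][0-9]{4}")

# Hypothetical sample lines, invented for this illustration.
samples = [
    "SSN: 123-45-6789",                 # dash-separated, preceded by a space
    "my number is 123 45 6789",         # space-separated
    r"HKLM\SOFTWARE\tool\123-45-6789",  # no leading space, so it is skipped
    "order 1234567890",                 # no separators, so it is skipped
]

# True where the pattern is found somewhere in the line.
matches = [bool(SSN_PATTERN.search(line)) for line in samples]

for line, hit in zip(samples, matches):
    print("match   " if hit else "no match", line)
```

The registry-style line is ignored only because nothing in it is preceded by a space. Without the leading space in the pattern, it would be reported as well.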
In my sample run, the search pattern returned a test sample file I put in a subdirectory, and it also found samples located in an XML file that outline file patterns for credit card and Social Security numbers (see Figure 1).
Figure 1 Results when searching for a number pattern
I use the exclude capability in the first example to drop all .exe and .dll files since they can generate unnecessary noise. You may discover other file types that also cause false positives. If you do, you can use exclude to fine-tune the search process.
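The whole search (recurse, skip noisy file types, skip binary files, match the pattern) can also be sketched outside Windows PowerShell. The Python below is an illustration under stated assumptions, not a port of the commands above. The names find_ssn_like and looks_like_text are hypothetical, and the "no NUL bytes in the first few kilobytes" test is only a rough stand-in for what findstr.exe does when it skips files with non-printable characters.

```python
import os
import re

# Leading space, then 3-2-4 digits separated by a dash or a space.
SSN_PATTERN = re.compile(r" [0-9]{3}[- ][0-9]{2}[- ][0-9]{4}")

# The same file types the exclude example drops; extend as you find more noise.
SKIP_EXTENSIONS = {".exe", ".dll"}


def looks_like_text(path, sample_size=4096):
    """Heuristic (an assumption of this sketch): a file is treated as text
    if its first few KB contain no NUL bytes."""
    with open(path, "rb") as handle:
        return b"\x00" not in handle.read(sample_size)


def find_ssn_like(root):
    """Yield (path, line_number, line) for every line matching the pattern."""
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() in SKIP_EXTENSIONS:
                continue
            path = os.path.join(folder, name)
            try:
                if not looks_like_text(path):
                    continue
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if SSN_PATTERN.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # locked or unreadable file: skip it
```

Calling find_ssn_like(".") walks the current directory, and each result can be printed as path, line number, and matching line, which is roughly the information Select-String reports.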
If you are searching only for a specific Social Security number, you can do the following (replacing "123 45 6789" with your Social Security number):
Get-ChildItem -rec |
  ?{ findstr.exe /mprc:. $_.FullName } |
  select-string "123 45 6789","123-45-6789"
The results of this search effort are shown in Figure 2.
Figure 2 Searching for a specific number
Searching for Credit Card Information
Credit card information is a bit trickier since the formats vary. And I want to limit false positives (meaning results that look like a credit card number only by random chance). Nonetheless, the search will probably turn up some random sequences that are merely similar to credit card numbers.
I am using the essay "Anatomy of Credit Card Numbers" by Michael Gilleland as a reference when building these strings (see merriampark.com/anatomycc.htm). For instance, my search string specifies that the first digit must be a 4, 5, or 6, since the first digit is defined as the major industry identifier of the credit card.
Here I have constructed simple strings that will search for Discover, MasterCard, and Visa cards. In Windows PowerShell, my search string looks like this:
Get-ChildItem -rec |
  ?{ findstr.exe /mprc:. $_.FullName } |
  select-string "[456][0-9]{15}",
    "[456][0-9]{3}[- ][0-9]{4}[- ][0-9]{4}[- ][0-9]{4}"
In the sample shown in Figure 3, I used the exclude function to eliminate noise from .rtf, .rbl, and .h file types. Additionally, the sample code looks for credit card strings that have no spaces or dashes, which means any run of 16 digits that begins with 4, 5, or 6 is reported. This, unfortunately, may overload your display.
Figure 3 Using exclude to eliminate noise from the results
The following alternative command performs the same search but will not catch card numbers written without spaces or dashes:
Get-ChildItem -rec |
  ?{ findstr.exe /mprc:. $_.FullName } |
  select-string "[456][0-9]{3}[- ][0-9]{4}[- ][0-9]{4}[- ][0-9]{4}"
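Most card numbers end in a check digit computed with the Luhn algorithm, so one way to thin out random sequences that merely resemble card numbers is to keep only the matches that pass that check. This step is not part of the commands in this column. What follows is a minimal sketch in Python, with hypothetical function names, that combines the spaced-or-dashed pattern with a Luhn test.

```python
import re

# Spaced or dashed 4-4-4-4 layout starting with 4, 5, or 6.
# (Assumption: [- ] is used for the separator.)
CARD_PATTERN = re.compile(
    r"[456][0-9]{3}[- ][0-9]{4}[- ][0-9]{4}[- ][0-9]{4}"
)


def passes_luhn(candidate):
    """Return True if the digits in candidate satisfy the Luhn check.
    Non-digit characters such as spaces and dashes are ignored."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit;
    # a doubled value above 9 has 9 subtracted (same as adding its digits).
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def likely_card_numbers(text):
    """Return the pattern matches in text that also pass the Luhn check."""
    return [
        match.group()
        for match in CARD_PATTERN.finditer(text)
        if passes_luhn(match.group())
    ]
```

For example, likely_card_numbers("4111-1111-1111-1111 and 4111 1111 1111 1112") keeps only the first string. It is a widely published test number rather than a real account, and the second string fails the check. Passing the check does not prove that a string is a card number, since roughly one random digit string in ten passes by chance, but it removes most of the noise.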
Since American Express card numbers follow a different layout (15 digits beginning with 34 or 37, grouped 4-6-5 when spaced or dashed), I have created a modified search string to locate that card's pattern. In Windows PowerShell, the search string looks like this:
Get-ChildItem -rec |
  ?{ findstr.exe /mprc:. $_.FullName } |
  select-string "3[47][0-9]{13}",
    "3[47][0-9]{2}[- ][0-9]{6}[- ][0-9]{5}"
An overload of results may affect this search as well. This alternative command performs the same search but will not catch card numbers written without spaces or dashes:
Get-ChildItem -rec |
  ?{ findstr.exe /mprc:. $_.FullName } |
  select-string "3[47][0-9]{2}[- ][0-9]{6}[- ][0-9]{5}"
When writing this column, I ran these searches on my own system and I was quite surprised to find several instances of my Social Security number saved in places where it should not have been stored. It turns out the information was located in a note I wrote a while ago and then forgot about. This made me rethink what I should and should not write down!
If you find that you do want to store this information but only in a safe way, try using a tool such as Password Safe (available at passwordsafe.sourceforge.net). Or encrypt your hard drive with a tool such as BitLocker™ Drive Encryption. Finally, the Data Encryption Toolkit for Mobile PCs provides tested guidance on protecting data on a mobile PC. These solutions will at least make it a bit more difficult for someone who happens to be trolling your PC for personal information.
Wrapping Up
Finding PII is fairly simple. Being aware of the information is the tricky part. But keep in mind that a piece of malware or a malicious user who has gained access to your system can use similar discovery techniques to find information on your system just as easily. Be careful about when and where you enter PII, and if you are inclined to store the information, be sure that you encrypt it.
I'd like to thank Matt Hainje for helping to troubleshoot my Windows PowerShell scripts.
Frank Simorjay is a Technical Program Manager for the Microsoft Solution Accelerator–Security and Compliance group. He designs security solutions for Microsoft customers, speaks at events such as the Secure World Exposition (of which he is a founder), provides security education and training, and has contributed to a variety of papers and books on security. His most recent work is the "Malware Removal Starter Kit."
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.