How to: Query the Contents of Files in a Folder (LINQ)
This example shows how to query over all the files in a specified directory tree, open each file, and inspect its contents. This type of technique could be used to create indexes or reverse indexes of the contents of a directory tree. A simple string search is performed in this example. However, more complex types of pattern matching can be performed with a regular expression. For more information, see How to: Combine LINQ Queries with Regular Expressions.
Example
Module Module1
'QueryContents
Public Sub Main()
' Modify this path as necessary.
Dim startFolder = "c:\program files\Microsoft Visual Studio 9.0\VB\"
'Take a snapshot of the folder contents
Dim dir As New System.IO.DirectoryInfo(startFolder)
Dim fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories)
Dim searchTerm = "Visual Studio"
' Search the contents of each file.
' A regular expression created with the RegEx class
' could be used instead of the Contains method.
Dim queryMatchingFiles = From file In fileList _
Where file.Extension = ".htm" _
Let fileText = GetFileText(file.FullName) _
Where fileText.Contains(searchTerm) _
Select file.FullName
Console.WriteLine("The term " & searchTerm & " was found in:")
' Execute the query.
For Each filename In queryMatchingFiles
Console.WriteLine(filename)
Next
' Keep the console window open in debug mode.
Console.WriteLine("Press any key to exit")
Console.ReadKey()
End Sub
' Read the contents of the file. This is done in a separate
' function in order to handle potential file system errors.
Function GetFileText(ByVal name As String) As String
' If the file has been deleted, the right thing
' to do in this case is return an empty string.
Dim fileContents = String.Empty
' If the file has been deleted since we took
' the snapshot, ignore it and return the empty string.
If System.IO.File.Exists(name) Then
fileContents = System.IO.File.ReadAllText(name)
End If
Return fileContents
End Function
End Module
class QueryContents
{
public static void Main()
{
// Modify this path as necessary.
string startFolder = @"c:\program files\Microsoft Visual Studio 9.0\";
// Take a snapshot of the file system.
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
// This method assumes that the application has discovery permissions
// for all folders under the specified path.
IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories);
string searchTerm = @"Visual Studio";
// Search the contents of each file.
// A regular expression created with the RegEx class
// could be used instead of the Contains method.
// queryMatchingFiles is an IEnumerable<string>.
var queryMatchingFiles =
from file in fileList
where file.Extension == ".htm"
let fileText = GetFileText(file.FullName)
where fileText.Contains(searchTerm)
select file.FullName;
// Execute the query.
Console.WriteLine("The term \"{0}\" was found in:", searchTerm);
foreach (string filename in queryMatchingFiles)
{
Console.WriteLine(filename);
}
// Keep the console window open in debug mode.
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
// Read the contents of the file.
static string GetFileText(string name)
{
string fileContents = String.Empty;
// If the file has been deleted since we took
// the snapshot, ignore it and return the empty string.
if (System.IO.File.Exists(name))
{
fileContents = System.IO.File.ReadAllText(name);
}
return fileContents;
}
}
Compiling the Code
Create a Visual Studio project that targets the .NET Framework version 3.5. By default, the project has a reference to System.Core.dll and a using directive (C#) or Imported namespace (Visual Basic) for the System.Linq namespace. In C# projects, add a using directive for the System.IO namespace.
Copy this code into your project.
Press F5 to compile and run the program.
Press any key to exit the console window.
Robust Programming
For intensive query operations over the contents of multiple types of documents and files, consider using the Windows Desktop Search engine.