Cómo: Buscar frases que contengan un conjunto especificado de palabras (LINQ)
En este ejemplo se muestra cómo buscar las frases de un archivo de texto que contienen coincidencias para cada palabra de un conjunto de palabras especificado. Aunque la matriz de términos de búsqueda está incluida en el código en este ejemplo, también se puede llenar dinámicamente en tiempo de ejecución. En este ejemplo, la consulta devuelve las frases que contienen las palabras "Historically", "data" e "integrated".
Ejemplo
Class FindSentences
Shared Sub Main()
Dim text As String = "Historically, the world of data and the world of objects " &
"have not been well integrated. Programmers work in C# or Visual Basic " &
"and also in SQL or XQuery. On the one side are concepts such as classes, " &
"objects, fields, inheritance, and .NET Framework APIs. On the other side " &
"are tables, columns, rows, nodes, and separate languages for dealing with " &
"them. Data types often require translation between the two worlds; there are " &
"different standard functions. Because the object world has no notion of query, a " &
"query can only be represented as a string without compile-time type checking or " &
"IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to " &
"objects in memory is often tedious and error-prone."
' Split the text block into an array of sentences.
Dim sentences As String() = text.Split(New Char() {".", "?", "!"})
' Define the search terms. This list could also be dynamically populated at runtime
Dim wordsToMatch As String() = {"Historically", "data", "integrated"}
' Find sentences that contain all the terms in the wordsToMatch array
' Note that the number of terms to match is not specified at compile time
Dim sentenceQuery = From sentence In sentences
Let w = sentence.Split(New Char() {" ", ",", ".", ";", ":"},
StringSplitOptions.RemoveEmptyEntries)
Where w.Distinct().Intersect(wordsToMatch).Count = wordsToMatch.Count()
Select sentence
' Execute the query
For Each str As String In sentenceQuery
Console.WriteLine(str)
Next
' Keep console window open in debug mode.
Console.WriteLine("Press any key to exit.")
Console.ReadKey()
End Sub
End Class
' Output:
' Historically, the world of data and the world of objects have not been well integrated
class FindSentences
{
static void Main()
{
string text = @"Historically, the world of data and the world of objects " +
@"have not been well integrated. Programmers work in C# or Visual Basic " +
@"and also in SQL or XQuery. On the one side are concepts such as classes, " +
@"objects, fields, inheritance, and .NET Framework APIs. On the other side " +
@"are tables, columns, rows, nodes, and separate languages for dealing with " +
@"them. Data types often require translation between the two worlds; there are " +
@"different standard functions. Because the object world has no notion of query, a " +
@"query can only be represented as a string without compile-time type checking or " +
@"IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to " +
@"objects in memory is often tedious and error-prone.";
// Split the text block into an array of sentences.
string[] sentences = text.Split(new char[] { '.', '?', '!' });
// Define the search terms. This list could also be dynamically populated at runtime.
string[] wordsToMatch = { "Historically", "data", "integrated" };
// Find sentences that contain all the terms in the wordsToMatch array.
// Note that the number of terms to match is not specified at compile time.
var sentenceQuery = from sentence in sentences
let w = sentence.Split(new char[] { '.', '?', '!', ' ', ';', ':', ',' },
StringSplitOptions.RemoveEmptyEntries)
where w.Distinct().Intersect(wordsToMatch).Count() == wordsToMatch.Count()
select sentence;
// Execute the query. Note that you can explicitly type
// the iteration variable here even though sentenceQuery
// was implicitly typed.
foreach (string str in sentenceQuery)
{
Console.WriteLine(str);
}
// Keep the console window open in debug mode.
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
}
/* Output:
Historically, the world of data and the world of objects have not been well integrated
*/
En primer lugar, la consulta divide el texto en frases y después divide las frases en una matriz de cadenas que contienen cada palabra. Para cada una de estas matrices, el método Distinct quita todas las palabras repetidas y, a continuación, la consulta realiza una operación Intersect en la matriz de palabras y la matriz wordstoMatch. Si el recuento de la intersección es igual al recuento de la matriz wordsToMatch, se encontraron todas las palabras y se devuelve la frase original.
En la llamada a Split, los signos de puntuación se utilizan como separadores para quitarlos de la cadena. De lo contrario, podría obtener una cadena "Historically," que no coincidiría con "Historically" en la matriz wordsToMatch. Puede que sea necesario utilizar separadores adicionales, dependiendo de los tipos de puntuación que contenga el texto de origen.
Compilar el código
Cree un proyecto de Visual Studio dirigido a .NET Framework versión 3.5. De manera predeterminada, el proyecto incluye una referencia a System.Core.dll y una directiva using (C#) o una instrucción Imports (Visual Basic) para el espacio de nombres System.Linq. En los proyectos de C#, agregue una directiva using para el espacio de nombres System.IO.
Copie este código en el proyecto.
Presione F5 para compilar y ejecutar el programa.
Presione cualquier tecla para salir de la ventana de consola.