Query expression basics

2025-02-05

This article introduces the basic concepts related to query expressions in C#.

What is a query and what does it do?

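A query is a set of instructions that describes what data to retrieve from a given data source (or sources) and what shape and organization the returned data should have. A query is distinct from the results that it produces.

Generally, the source data is organized logically as a sequence of elements of the same kind. For example, an SQL database table contains a sequence of rows. In an XML file, there's a "sequence" of XML elements (although XML elements are organized hierarchically in a tree structure). An in-memory collection contains a sequence of objects.

From an application's viewpoint, the specific type and structure of the original source data isn't important. The application always sees the source data as an IEnumerable<T> or IQueryable<T> collection. For example, in LINQ to XML, the source data is made visible as an IEnumerable<XElement>.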
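
As a rough illustration of the last point, the following sketch queries XML elements with the same syntax used for arrays. The XML content and the variable names are invented for this example; it assumes only the standard System.Xml.Linq types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical XML document, made up for this illustration.
XElement root = XElement.Parse(
    "<scores><score>90</score><score>71</score><score>82</score></scores>");

// Elements() exposes the child elements as an IEnumerable<XElement>.
IEnumerable<XElement> scoreElements = root.Elements("score");

// The query treats the XML elements like any other sequence.
IEnumerable<int> highScores =
    from element in scoreElements
    let value = (int)element   // explicit XElement-to-int conversion
    where value > 80
    select value;

Console.WriteLine(string.Join(" ", highScores));
// Output: 90 82
```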

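Given this source sequence, a query might do one of three things:

Retrieve a subset of the elements to produce a new sequence without modifying the individual elements. The query might then sort or group the returned sequence in various ways, as shown in the following example (assume scores is an int[]):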
```
IEnumerable<int> highScoresQuery =
    from score in scores
    where score > 80
    orderby score descending
    select score;
```
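Retrieve a sequence of elements as in the preceding example, but transform them to a new type of object. For example, a query might retrieve only the family names from certain customer records in a data source. Or it might retrieve the complete record and then use it to construct another in-memory object type or even XML data before generating the final result sequence. The following example shows a projection from an int to a string. Note the new type of highScoresQuery2.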
```
IEnumerable<string> highScoresQuery2 =
    from score in scores
    where score > 80
    orderby score descending
    select $"The score is {score}";
```
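Retrieve a singleton value about the source data, such as:
- The number of elements that match a certain condition.
- The element that has the greatest or least value.
- The first element that matches a condition, or the sum of particular values in a specified set of elements.

For example, the following query returns the number of scores greater than 80 from the scores integer array: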
```
var highScoreCount = (
    from score in scores
    where score > 80
    select score
).Count();
```
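In the preceding example, note the use of parentheses around the query expression before the call to the Enumerable.Count method. You can also use a new variable to store the concrete result.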
```
IEnumerable<int> highScoresQuery3 =
    from score in scores
    where score > 80
    select score;

var scoreCount = highScoresQuery3.Count();
```

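In the preceding example, the query is executed in the call to Count, because Count must iterate over the results in order to determine the number of elements returned by highScoresQuery3.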
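
One way to observe this deferred execution is to count how often the filter runs. The sketch below is illustrative only: IsHigh and predicateCalls are hypothetical names introduced here, not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] scores = { 90, 71, 82, 93, 75, 82 };
int predicateCalls = 0;

// Hypothetical helper that records each time the filter is evaluated.
bool IsHigh(int score)
{
    predicateCalls++;
    return score > 80;
}

// Defining the query doesn't run the filter.
IEnumerable<int> highScoresQuery3 =
    from score in scores
    where IsHigh(score)
    select score;

int callsBeforeCount = predicateCalls;       // nothing has executed yet
int scoreCount = highScoresQuery3.Count();   // Count() iterates the whole source
int callsAfterCount = predicateCalls;

Console.WriteLine($"{callsBeforeCount} {scoreCount} {callsAfterCount}");
// Output: 0 4 6
```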

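What is a query expression?

A query expression is a query expressed in query syntax. A query expression is a first-class language construct. It's just like any other expression and can be used in any context in which a C# expression is valid. A query expression consists of a set of clauses written in a declarative syntax similar to SQL or XQuery. Each clause in turn contains one or more C# expressions, and these expressions might themselves be either a query expression or contain a query expression.

A query expression must begin with a from clause and must end with a select or group clause. Between the first from clause and the last select or group clause, it can contain one or more of these optional clauses: where, orderby, join, let, and even other from clauses. You can also use the into keyword to enable the result of a join or group clause to serve as the source for more query clauses in the same query expression.

Query variable

In LINQ, a query variable is any variable that stores a query instead of the results of a query. More specifically, a query variable is always an enumerable type that produces a sequence of elements when it's iterated over in a foreach statement or in a direct call to its IEnumerator.MoveNext() method.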
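
To make the MoveNext() remark concrete, the following sketch iterates a query variable by hand, which is roughly what a foreach statement does on your behalf. The results list is a helper introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] scores = { 90, 71, 82, 93, 75, 82 };

IEnumerable<int> scoreQuery =
    from score in scores
    where score > 80
    select score;

var results = new List<int>();

// Get an enumerator, then call MoveNext() until it returns false,
// reading Current after each successful call.
using (IEnumerator<int> enumerator = scoreQuery.GetEnumerator())
{
    while (enumerator.MoveNext())
    {
        results.Add(enumerator.Current);
    }
}

Console.WriteLine(string.Join(" ", results));
// Output: 90 82 93 82
```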

Note

The examples in this article use the following data source and sample data.

record City(string Name, long Population);
record Country(string Name, double Area, long Population, List<City> Cities);
record Product(string Name, string Category);

static readonly City[] cities = [
    new City("Tokyo", 37_833_000),
    new City("Delhi", 30_290_000),
    new City("Shanghai", 27_110_000),
    new City("São Paulo", 22_043_000),
    new City("Mumbai", 20_412_000),
    new City("Beijing", 20_384_000),
    new City("Cairo", 18_772_000),
    new City("Dhaka", 17_598_000),
    new City("Osaka", 19_281_000),
    new City("New York-Newark", 18_604_000),
    new City("Karachi", 16_094_000),
    new City("Chongqing", 15_872_000),
    new City("Istanbul", 15_029_000),
    new City("Buenos Aires", 15_024_000),
    new City("Kolkata", 14_850_000),
    new City("Lagos", 14_368_000),
    new City("Kinshasa", 14_342_000),
    new City("Manila", 13_923_000),
    new City("Rio de Janeiro", 13_374_000),
    new City("Tianjin", 13_215_000)
];

static readonly Country[] countries = [
    new Country ("Vatican City", 0.44, 526, [new City("Vatican City", 826)]),
    new Country ("Monaco", 2.02, 38_000, [new City("Monte Carlo", 38_000)]),
    new Country ("Nauru", 21, 10_900, [new City("Yaren", 1_100)]),
    new Country ("Tuvalu", 26, 11_600, [new City("Funafuti", 6_200)]),
    new Country ("San Marino", 61, 33_900, [new City("San Marino", 4_500)]),
    new Country ("Liechtenstein", 160, 38_000, [new City("Vaduz", 5_200)]),
    new Country ("Marshall Islands", 181, 58_000, [new City("Majuro", 28_000)]),
    new Country ("Saint Kitts & Nevis", 261, 53_000, [new City("Basseterre", 13_000)])
];

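The following code example shows a simple query expression with one data source, one filtering clause, one ordering clause, and no transformation of the source elements. The select clause ends the query.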

// Data source.
int[] scores = [90, 71, 82, 93, 75, 82];

// Query Expression.
IEnumerable<int> scoreQuery = //query variable
    from score in scores //required
    where score > 80 // optional
    orderby score descending // optional
    select score; //must end with select or group

// Execute the query to produce the results
foreach (var testScore in scoreQuery)
{
    Console.WriteLine(testScore);
}

// Output: 93 90 82 82

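In the preceding example, scoreQuery is a query variable, which is sometimes referred to as just a query. The query variable stores no actual result data; the results are produced in the foreach loop. When the foreach statement executes, the query results aren't returned through the query variable scoreQuery. Rather, they're returned through the iteration variable testScore. The scoreQuery variable can be iterated in a second foreach loop. It produces the same results as long as neither it nor the data source has been modified.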
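
The converse also holds: if the data source changes between two iterations, the same query variable yields different results. The following sketch, with the modification invented for illustration, shows the effect.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] scores = { 90, 71, 82, 93, 75, 82 };

IEnumerable<int> scoreQuery =
    from score in scores
    where score > 80
    orderby score descending
    select score;

// First iteration of the query.
string firstPass = string.Join(" ", scoreQuery);

// Modify the data source; the query variable itself is untouched.
scores[1] = 99;

// Second iteration re-runs the query against the modified source.
string secondPass = string.Join(" ", scoreQuery);

Console.WriteLine(firstPass);
Console.WriteLine(secondPass);
// Output:
// 93 90 82 82
// 99 93 90 82 82
```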

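A query variable can store a query that is expressed in query syntax, method syntax, or a combination of the two. In the following examples, both queryMajorCities and queryMajorCities2 are query variables: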

City[] cities = [
    new City("Tokyo", 37_833_000),
    new City("Delhi", 30_290_000),
    new City("Shanghai", 27_110_000),
    new City("São Paulo", 22_043_000)
];

//Query syntax
IEnumerable<City> queryMajorCities =
    from city in cities
    where city.Population > 30_000_000
    select city;

// Execute the query to produce the results
foreach (City city in queryMajorCities)
{
    Console.WriteLine(city);
}

// Output:
// City { Name = Tokyo, Population = 37833000 }
// City { Name = Delhi, Population = 30290000 }

// Method-based syntax
IEnumerable<City> queryMajorCities2 = cities.Where(c => c.Population > 30_000_000);
// Execute the query to produce the results
foreach (City city in queryMajorCities2)
{
    Console.WriteLine(city);
}
// Output:
// City { Name = Tokyo, Population = 37833000 }
// City { Name = Delhi, Population = 30290000 }

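On the other hand, the following two examples show variables that aren't query variables even though each is initialized with a query. They aren't query variables because they store results: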

var highestScore = (
    from score in scores
    select score
).Max();

// or split the expression
IEnumerable<int> scoreQuery =
    from score in scores
    select score;

var highScore = scoreQuery.Max();
// the following returns the same result
highScore = scores.Max();

var largeCitiesList = (
    from country in countries
    from city in country.Cities
    where city.Population > 10000
    select city
).ToList();

// or split the expression
IEnumerable<City> largeCitiesQuery =
    from country in countries
    from city in country.Cities
    where city.Population > 10000
    select city;
var largeCitiesList2 = largeCitiesQuery.ToList();

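Explicit and implicit typing of query variables

This documentation usually provides the explicit type of the query variable in order to show the type relationship between the query variable and the select clause. However, you can also use the var keyword to instruct the compiler to infer the type of a query variable (or any other local variable) at compile time. For example, the query example shown earlier in this article can also be expressed by using implicit typing: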

var queryCities =
    from city in cities
    where city.Population > 100000
    select city;

In the preceding example, the use of var is optional. queryCities is an IEnumerable<City> whether it's implicitly or explicitly typed.

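Starting a query expression

A query expression must begin with a from clause. It specifies a data source together with a range variable. The range variable represents each successive element in the source sequence as the source sequence is being traversed. The range variable is strongly typed based on the type of elements in the data source. In the following example, because countries is an array of Country objects, the range variable is also typed as Country. Because the range variable is strongly typed, you can use the dot operator to access any available members of the type.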

IEnumerable<Country> countryAreaQuery =
    from country in countries
    where country.Area > 20 //sq km
    select country;

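The range variable is in scope until the query is exited either with a semicolon or with a continuation clause.

A query expression might contain multiple from clauses. Use more from clauses when each element in the source sequence is itself a collection or contains a collection. For example, assume that you have a collection of Country objects, each of which contains a collection of City objects named Cities. To query the City objects in each Country, use two from clauses as shown here: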

IEnumerable<City> cityQuery =
    from country in countries
    from city in country.Cities
    where city.Population > 10000
    select city;

For more information, see from clause.
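
As a sketch of what the two from clauses accomplish, the following code compares the query with a method-syntax chain built on SelectMany that yields the same results. Named tuples stand in for the Country and City records so that the snippet is self-contained; the trimmed data set is an assumption made for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Tuples stand in for the Country and City records (illustration only).
var countries = new[]
{
    (Name: "Monaco", Cities: new[] { (Name: "Monte Carlo", Population: 38_000L) }),
    (Name: "Nauru", Cities: new[] { (Name: "Yaren", Population: 1_100L) }),
    (Name: "Marshall Islands", Cities: new[] { (Name: "Majuro", Population: 28_000L) }),
};

// Query syntax: the second from clause walks each country's Cities collection.
IEnumerable<string> cityQuery =
    from country in countries
    from city in country.Cities
    where city.Population > 10000
    select city.Name;

// Method syntax producing the same results: SelectMany flattens the nested collections.
IEnumerable<string> cityQuery2 = countries
    .SelectMany(country => country.Cities)
    .Where(city => city.Population > 10000)
    .Select(city => city.Name);

Console.WriteLine(string.Join(", ", cityQuery));
Console.WriteLine(string.Join(", ", cityQuery2));
// Output:
// Monte Carlo, Majuro
// Monte Carlo, Majuro
```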

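Ending a query expression

A query expression must end with either a group clause or a select clause.

The group clause

Use the group clause to produce a sequence of groups organized by a key that you specify. The key can be any data type. For example, the following query creates a sequence of groups, each of which contains one or more Country objects and has a key of type char whose value is the first letter of the country's name.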

var queryCountryGroups =
    from country in countries
    group country by country.Name[0];

For more information about grouping, see group clause.
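
The following sketch shows one way to consume such a grouped result. It groups plain country-name strings rather than Country objects to stay self-contained; each group is an IGrouping<char, string> that exposes its Key and can be enumerated for its members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Country names only; plain strings keep the sketch short.
string[] countryNames = { "Monaco", "Nauru", "Tuvalu", "San Marino", "Marshall Islands" };

IEnumerable<IGrouping<char, string>> queryNameGroups =
    from name in countryNames
    group name by name[0];

var lines = new List<string>();
foreach (IGrouping<char, string> nameGroup in queryNameGroups)
{
    // Each group carries its key and the elements that share it.
    string members = string.Join(", ", nameGroup);
    lines.Add(nameGroup.Key + ": " + members);
}

Console.WriteLine(string.Join(Environment.NewLine, lines));
// Output:
// M: Monaco, Marshall Islands
// N: Nauru
// T: Tuvalu
// S: San Marino
```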

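The select clause

Use the select clause to produce all other types of sequences. A simple select clause just produces a sequence of the same type of objects as the objects that are contained in the data source. In this example, the data source contains Country objects. The orderby clause just sorts the elements into a new order, and the select clause produces a sequence of the reordered Country objects.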

IEnumerable<Country> sortedQuery =
    from country in countries
    orderby country.Area
    select country;

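The select clause can be used to transform source data into sequences of new types. This transformation is also called a projection. In the following example, the select clause projects a sequence of anonymous types that contain only a subset of the fields in the original element. The new objects are initialized by using an object initializer.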

var queryNameAndPop =
    from country in countries
    select new
    {
        Name = country.Name,
        Pop = country.Population
    };

So in this example, var is required because the query produces an anonymous type.
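
A brief sketch of consuming such a projection follows. Because the element type has no name, the iteration variable is also declared with var. Tuples replace the Country record here, and only two countries are included, purely to keep the snippet self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Tuples stand in for the Country record (illustration only).
var countries = new[]
{
    (Name: "Monaco", Area: 2.02, Population: 38_000L),
    (Name: "Nauru", Area: 21.0, Population: 10_900L),
};

var queryNameAndPop =
    from country in countries
    select new
    {
        Name = country.Name,
        Pop = country.Population
    };

var lines = new List<string>();
foreach (var item in queryNameAndPop)   // var: the anonymous type can't be named
{
    lines.Add($"{item.Name}: {item.Pop}");
}

Console.WriteLine(string.Join("; ", lines));
// Output: Monaco: 38000; Nauru: 10900
```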

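For more information about all the ways that a select clause can be used to transform source data, see select clause.

Continuations with into

You can use the into keyword in a select or group clause to create a temporary identifier that stores a query. Use the into clause when you must perform extra query operations on a query after a grouping or select operation. In the following example, countries are grouped by population in ranges of 1,000 (the let clause divides each population by 1,000). After these groups are created, more clauses filter out some groups and then sort the groups in ascending order. To perform those extra operations, the continuation represented by countryGroup is required.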

// percentileQuery is an IEnumerable<IGrouping<int, Country>>
var percentileQuery =
    from country in countries
    let percentile = (int)country.Population / 1_000
    group country by percentile into countryGroup
    where countryGroup.Key >= 20
    orderby countryGroup.Key
    select countryGroup;

// grouping is an IGrouping<int, Country>
foreach (var grouping in percentileQuery)
{
    Console.WriteLine(grouping.Key);
    foreach (var country in grouping)
    {
        Console.WriteLine(country.Name + ":" + country.Population);
    }
}

For more information, see into.
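
To see concrete keys, here is a self-contained sketch of the same continuation pattern run over a few of the sample populations. Tuples replace the Country record, and the range variable thousands is a name chosen for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A subset of the sample countries, as tuples (illustration only).
var countries = new[]
{
    (Name: "Vatican City", Population: 526L),
    (Name: "Monaco", Population: 38_000L),
    (Name: "Nauru", Population: 10_900L),
    (Name: "San Marino", Population: 33_900L),
    (Name: "Liechtenstein", Population: 38_000L),
};

var rangeQuery =
    from country in countries
    let thousands = (int)(country.Population / 1_000)
    group country.Name by thousands into countryGroup
    where countryGroup.Key >= 20      // keep groups of 20,000 people or more
    orderby countryGroup.Key
    select countryGroup;

var lines = new List<string>();
foreach (var grouping in rangeQuery)
{
    lines.Add(grouping.Key + ": " + string.Join(", ", grouping));
}

Console.WriteLine(string.Join(Environment.NewLine, lines));
// Output:
// 33: San Marino
// 38: Monaco, Liechtenstein
```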

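Filtering, ordering, and joining

Between the starting from clause and the ending select or group clause, all other clauses (where, join, orderby, from, let) are optional. Any of the optional clauses can be used zero times or multiple times in a query body.

The where clause

Use the where clause to filter out elements from the source data based on one or more predicate expressions. The where clause in the following example has one predicate with two conditions.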

IEnumerable<City> queryCityPop =
    from city in cities
    where city.Population is < 15_000_000 and > 10_000_000
    select city;

For more information, see where clause.
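
The same filter can be written in more than one way. The following sketch compares the pattern-based predicate with an && expression and with two consecutive where clauses. The populations array is invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical population figures, made up for this comparison.
long[] populations = { 9_000_000, 12_500_000, 14_368_000, 15_029_000, 20_412_000 };

// Relational patterns combined with "and" (requires C# 9 or later).
IEnumerable<long> patternQuery =
    from p in populations
    where p is < 15_000_000 and > 10_000_000
    select p;

// The same two conditions joined with &&.
IEnumerable<long> andQuery =
    from p in populations
    where p < 15_000_000 && p > 10_000_000
    select p;

// Two where clauses; an element must pass both.
IEnumerable<long> chainedQuery =
    from p in populations
    where p < 15_000_000
    where p > 10_000_000
    select p;

Console.WriteLine(string.Join(" ", patternQuery));
// Output: 12500000 14368000
```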

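The orderby clause

Use the orderby clause to sort the results in either ascending or descending order. You can also specify secondary sort orders. The following example performs a primary sort on the country objects by using the Area property. It then performs a secondary sort by using the Population property.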

IEnumerable<Country> querySortedCountries =
    from country in countries
    orderby country.Area, country.Population descending
    select country;

The ascending keyword is optional; it's the default sort order if no order is specified. For more information, see orderby clause.
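
In method syntax, a primary and secondary sort can be expressed with OrderBy followed by ThenByDescending. The sketch below compares both forms. The entry named "Hypothetica" is invented to force a tie on Area, and tuples stand in for the Country record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var countries = new[]
{
    (Name: "Monaco", Area: 2.02, Population: 38_000L),
    (Name: "Tuvalu", Area: 26.0, Population: 11_600L),
    (Name: "Hypothetica", Area: 26.0, Population: 50_000L),   // invented entry: ties with Tuvalu on Area
    (Name: "Nauru", Area: 21.0, Population: 10_900L),
};

// Query syntax: descending applies only to Population.
IEnumerable<string> querySorted =
    from country in countries
    orderby country.Area, country.Population descending
    select country.Name;

// Method syntax producing the same ordering.
IEnumerable<string> methodSorted = countries
    .OrderBy(country => country.Area)
    .ThenByDescending(country => country.Population)
    .Select(country => country.Name);

Console.WriteLine(string.Join(" ", querySorted));
// Output: Monaco Nauru Hypothetica Tuvalu
```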

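The join clause

Use the join clause to associate and/or combine elements from one data source with elements from another data source based on an equality comparison between specified keys in each element. In LINQ, join operations are performed on sequences of objects whose elements are different types. After you join two sequences, you must use a select or group statement to specify which element to store in the output sequence. You can also use an anonymous type to combine properties from each set of associated elements into a new type for the output sequence. The following example associates prod objects whose Category property matches one of the categories in the categories string array. Products whose Category doesn't match any string in categories are filtered out. The select statement projects a new type whose properties are taken from both cat and prod.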

var categoryQuery =
    from cat in categories
    join prod in products on cat equals prod.Category
    select new
    {
        Category = cat,
        Name = prod.Name
    };
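
The preceding query doesn't show its data sources. A runnable sketch with invented sample data might look like the following. The category and product values are assumptions, and tuples stand in for the Product record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data for the join.
string[] categories = { "Fruit", "Dairy" };
var products = new[]
{
    (Name: "Apple", Category: "Fruit"),
    (Name: "Milk", Category: "Dairy"),
    (Name: "Hammer", Category: "Tools"),   // no matching category: filtered out
    (Name: "Pear", Category: "Fruit"),
};

var categoryQuery =
    from cat in categories
    join prod in products on cat equals prod.Category
    select new
    {
        Category = cat,
        Name = prod.Name
    };

var pairs = new List<string>();
foreach (var item in categoryQuery)
{
    pairs.Add(item.Category + "/" + item.Name);
}

Console.WriteLine(string.Join(" ", pairs));
// Output: Fruit/Apple Fruit/Pear Dairy/Milk
```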

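You can also perform a group join by storing the results of the join operation into a temporary variable by using the into keyword. For more information, see join clause.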
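
A minimal group join sketch follows, again with invented data. Unlike the inner join, a group join yields one result per category even when no product matches, so the hypothetical "Bakery" category appears with a count of zero.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data for the group join.
string[] categories = { "Fruit", "Dairy", "Bakery" };
var products = new[]
{
    (Name: "Apple", Category: "Fruit"),
    (Name: "Milk", Category: "Dairy"),
    (Name: "Pear", Category: "Fruit"),
};

var categoryCounts =
    from cat in categories
    join prod in products on cat equals prod.Category into prodGroup
    select new
    {
        Category = cat,
        Count = prodGroup.Count()   // prodGroup holds the products matched to this category
    };

var lines = new List<string>();
foreach (var item in categoryCounts)
{
    lines.Add(item.Category + ": " + item.Count);
}

Console.WriteLine(string.Join(", ", lines));
// Output: Fruit: 2, Dairy: 1, Bakery: 0
```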

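The let clause

Use the let clause to store the result of an expression, such as a method call, in a new range variable. In the following example, the range variable firstName stores the first element of the array of strings returned by Split.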

string[] names = ["Svetlana Omelchenko", "Claire O'Donnell", "Sven Mortensen", "Cesar Garcia"];
IEnumerable<string> queryFirstNames =
    from name in names
    let firstName = name.Split(' ')[0]
    select firstName;

foreach (var s in queryFirstNames)
{
    Console.Write(s + " ");
}

//Output: Svetlana Claire Sven Cesar

For more information, see let clause.
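
A let clause is also handy when the stored value is needed more than once. In the following sketch, the range variable parts (a name chosen for this illustration) is used in both the where clause and the select clause, so Split is written only once per element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] names = { "Svetlana Omelchenko", "Claire O'Donnell", "Sven Mortensen", "Cesar Garcia" };

IEnumerable<string> queryLastFirst =
    from name in names
    let parts = name.Split(' ')                              // stored once per element
    where parts[0].StartsWith("S", StringComparison.Ordinal)
    select parts[1] + ", " + parts[0];

Console.WriteLine(string.Join("; ", queryLastFirst));
// Output: Omelchenko, Svetlana; Mortensen, Sven
```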

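Subqueries in a query expression

A query clause might itself contain a query expression, which is sometimes referred to as a subquery. Each subquery starts with its own from clause that doesn't necessarily point to the same data source as the first from clause. For example, the following query shows a query expression that's used in the select statement to retrieve the results of a grouping operation.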

var queryGroupMax =
    from student in students
    group student by student.Year into studentGroup
    select new
    {
        Level = studentGroup.Key,
        HighestScore = (
            from student2 in studentGroup
            select student2.ExamScores.Average()
        ).Max()
    };

For more information, see Perform a subquery on a grouping operation.
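
The students data source isn't defined in this article. The following sketch supplies hypothetical students as tuples, with Year modeled as an int, so that the subquery can be run end to end. All names and scores are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical students; Year is an int here purely for illustration.
var students = new[]
{
    (Name: "Ana", Year: 1, ExamScores: new[] { 90, 80 }),   // average 85
    (Name: "Ben", Year: 1, ExamScores: new[] { 70, 90 }),   // average 80
    (Name: "Cho", Year: 2, ExamScores: new[] { 60, 70 }),   // average 65
    (Name: "Dev", Year: 2, ExamScores: new[] { 95, 85 }),   // average 90
};

var queryGroupMax =
    from student in students
    group student by student.Year into studentGroup
    select new
    {
        Level = studentGroup.Key,
        // Subquery: the best per-student average within this group.
        HighestScore = (
            from student2 in studentGroup
            select student2.ExamScores.Average()
        ).Max()
    };

var lines = new List<string>();
foreach (var item in queryGroupMax)
{
    lines.Add($"{item.Level}: {item.HighestScore}");
}

Console.WriteLine(string.Join(", ", lines));
// Output: 1: 85, 2: 90
```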

通过