Join Operations in LINQ

A join of two data sources is the association of objects in one data source with objects that share a common attribute in another data source.

Joining is an important operation in queries that target data sources whose relationships to each other can't be followed directly. In object-oriented programming, joining could mean a correlation between objects that isn't modeled, such as the backwards direction of a one-way relationship. An example of a one-way relationship is a Student class that has a property of type Department that represents the major, but the Department class doesn't have a property that is a collection of Student objects. If you have a list of Department objects and you want to find all the students in each department, you could use a join operation to find them.

The join methods provided in the LINQ framework are Join and GroupJoin. These methods perform equijoins, or joins that match two data sources based on equality of their keys. (For comparison, Transact-SQL supports join operators other than equals, for example the less than operator.) In relational database terms, Join implements an inner join, a type of join in which only those objects that have a match in the other data set are returned. The GroupJoin method has no direct equivalent in relational database terms, but it implements a superset of inner joins and left outer joins. A left outer join is a join that returns each element of the first (left) data source, even if it has no correlated elements in the other data source.

The following illustration shows a conceptual view of two sets and the elements within those sets that are included in either an inner join or a left outer join.

Two overlapping circles showing inner/outer.

Methods

Method Name Description C# Query Expression Syntax More Information
Join Joins two sequences based on key selector functions and extracts pairs of values. join … in … on … equals … Enumerable.Join

Queryable.Join
GroupJoin Joins two sequences based on key selector functions and groups the resulting matches for each element. join … in … on … equals … into … Enumerable.GroupJoin

Queryable.GroupJoin

The following examples in this article use the common data sources for this area:

public enum GradeLevel
{
    FirstYear = 1,
    SecondYear,
    ThirdYear,
    FourthYear
};

public class Student
{
    public required string FirstName { get; init; }
    public required string LastName { get; init; }
    public required int ID { get; init; }

    public required GradeLevel Year { get; init; }
    public required List<int> Scores { get; init; }

    public required int DepartmentID { get; init; }
}

public class Teacher
{
    public required string First { get; init; }
    public required string Last { get; init; }
    public required int ID { get; init; }
    public required string City { get; init; }
}
public class Department
{
    public required string Name { get; init; }
    public int ID { get; init; }

    public required int TeacherID { get; init; }
}

Each Student has a grade level, a primary department, and a series of scores. A Teacher also has a City property that identifies the campus where the teacher holds classes. A Department has a name, and a reference to a Teacher who serves as the department head.

The following example uses the join … in … on … equals … clause to join two sequences based on specific value:

var query = from student in students
            join department in departments on student.DepartmentID equals department.ID
            select new { Name = $"{student.FirstName} {student.LastName}", DepartmentName = department.Name };

foreach (var item in query)
{
    Console.WriteLine($"{item.Name} - {item.DepartmentName}");
}

The preceding query can be expressed using method syntax as shown in the following code:

var query = students.Join(departments,
    student => student.DepartmentID, department => department.ID,
    (student, department) => new { Name = $"{student.FirstName} {student.LastName}", DepartmentName = department.Name });

foreach (var item in query)
{
    Console.WriteLine($"{item.Name} - {item.DepartmentName}");
}

The following example uses the join … in … on … equals … into … clause to join two sequences based on specific value and groups the resulting matches for each element:

IEnumerable<IEnumerable<Student>> studentGroups = from department in departments
                    join student in students on department.ID equals student.DepartmentID into studentGroup
                    select studentGroup;

foreach (IEnumerable<Student> studentGroup in studentGroups)
{
    Console.WriteLine("Group");
    foreach (Student student in studentGroup)
    {
        Console.WriteLine($"  - {student.FirstName}, {student.LastName}");
    }
}

The preceding query can be expressed using method syntax as shown in the following example:

// Join department and student based on DepartmentId and grouping result
IEnumerable<IEnumerable<Student>> studentGroups = departments.GroupJoin(students,
    department => department.ID, student => student.DepartmentID,
    (department, studentGroup) => studentGroup);

foreach (IEnumerable<Student> studentGroup in studentGroups)
{
    Console.WriteLine("Group");
    foreach (Student student in studentGroup)
    {
        Console.WriteLine($"  - {student.FirstName}, {student.LastName}");
    }
}

Perform inner joins

In relational database terms, an inner join produces a result set in which each element of the first collection appears one time for every matching element in the second collection. If an element in the first collection has no matching elements, it doesn't appear in the result set. The Join method, which is called by the join clause in C#, implements an inner join. The following examples show you how to perform four variations of an inner join:

  • A simple inner join that correlates elements from two data sources based on a simple key.
  • An inner join that correlates elements from two data sources based on a composite key. A composite key, which is a key that consists of more than one value, enables you to correlate elements based on more than one property.
  • A multiple join in which successive join operations are appended to each other.
  • An inner join that is implemented by using a group join.

Single key join

The following example matches Teacher objects with Deparment objects whose TeacherId matches that Teacher. The select clause in C# defines how the resulting objects look. In the following example, the resulting objects are anonymous types that consist of the department name and the name of the teacher that leads the department.

var query = from department in departments
            join teacher in teachers on department.TeacherID equals teacher.ID
            select new
            {
                DepartmentName = department.Name,
                TeacherName = $"{teacher.First} {teacher.Last}"
            };

foreach (var departmentAndTeacher in query)
{
    Console.WriteLine($"{departmentAndTeacher.DepartmentName} is managed by {departmentAndTeacher.TeacherName}");
}

You achieve the same results using the Join method syntax:

var query = teachers
    .Join(departments, teacher => teacher.ID, department => department.TeacherID,
        (teacher, department) =>
        new { DepartmentName = department.Name, TeacherName = $"{teacher.First} {teacher.Last}" });

foreach (var departmentAndTeacher in query)
{
    Console.WriteLine($"{departmentAndTeacher.DepartmentName} is managed by {departmentAndTeacher.TeacherName}");
}

The teachers who aren't department heads don't appear in the final results.

Composite key join

Instead of correlating elements based on just one property, you can use a composite key to compare elements based on multiple properties. Specify the key selector function for each collection to return an anonymous type that consists of the properties you want to compare. If you label the properties, they must have the same label in each key's anonymous type. The properties must also appear in the same order.

The following example uses a list of Teacher objects and a list of Student objects to determine which teachers are also students. Both of these types have properties that represent the first and family name of each person. The functions that create the join keys from each list's elements return an anonymous type that consists of the properties. The join operation compares these composite keys for equality and returns pairs of objects from each list where both the first name and the family name match.

// Join the two data sources based on a composite key consisting of first and last name,
// to determine which employees are also students.
IEnumerable<string> query =
    from teacher in teachers
    join student in students on new
    {
        FirstName = teacher.First,
        LastName = teacher.Last
    } equals new
    {
        student.FirstName,
        student.LastName
    }
    select teacher.First + " " + teacher.Last;

string result = "The following people are both teachers and students:\r\n";
foreach (string name in query)
{
    result += $"{name}\r\n";
}
Console.Write(result);

You can use the Join method, as shown in the following example:

IEnumerable<string> query = teachers
    .Join(students,
        teacher => new { FirstName = teacher.First, LastName = teacher.Last },
        student => new { student.FirstName, student.LastName },
        (teacher, student) => $"{teacher.First} {teacher.Last}"
 );

Console.WriteLine("The following people are both teachers and students:");
foreach (string name in query)
{
    Console.WriteLine(name);
}

Multiple join

Any number of join operations can be appended to each other to perform a multiple join. Each join clause in C# correlates a specified data source with the results of the previous join.

The first join clause matches students and departments based on a Student object's DepartmentID matching a Department object's ID. It returns a sequence of anonymous types that contain the Student object and Department object.

The second join clause correlates the anonymous types returned by the first join with Teacher objects based on that teacher's ID matching the department head ID. It returns a sequence of anonymous types that contain the student's name, the department name, and the department leader's name. Because this operation is an inner join, only those objects from the first data source that have a match in the second data source are returned.

// The first join matches Department.ID and Student.DepartmentID from the list of students and
// departments, based on a common ID. The second join matches teachers who lead departments
// with the students studying in that department.
var query = from student in students
    join department in departments on student.DepartmentID equals department.ID
    join teacher in teachers on department.TeacherID equals teacher.ID
    select new {
        StudentName = $"{student.FirstName} {student.LastName}",
        DepartmentName = department.Name,
        TeacherName = $"{teacher.First} {teacher.Last}"
    };

foreach (var obj in query)
{
    Console.WriteLine($"""The student "{obj.StudentName}" studies in the department run by "{obj.TeacherName}".""");
}

The equivalent using multiple Join method uses the same approach with the anonymous type:

var query = students
    .Join(departments, student => student.DepartmentID, department => department.ID,
        (student, department) => new { student, department })
    .Join(teachers, commonDepartment => commonDepartment.department.TeacherID, teacher => teacher.ID,
        (commonDepartment, teacher) => new
        {
            StudentName = $"{commonDepartment.student.FirstName} {commonDepartment.student.LastName}",
            DepartmentName = commonDepartment.department.Name,
            TeacherName = $"{teacher.First} {teacher.Last}"
        });

foreach (var obj in query)
{
    Console.WriteLine($"""The student "{obj.StudentName}" studies in the department run by "{obj.TeacherName}".""");
}

Inner join by using grouped join

The following example shows you how to implement an inner join by using a group join. The list of Department objects is group-joined to the list of Student objects based on the Department.ID matching the Student.DepartmentID property. The group join creates a collection of intermediate groups, where each group consists of a Department object and a sequence of matching Student objects. The second from clause combines (or flattens) this sequence of sequences into one longer sequence. The select clause specifies the type of elements in the final sequence. That type is an anonymous type that consists of the student's name and the matching department name.

var query1 =
    from department in departments
    join student in students on department.ID equals student.DepartmentID into gj
    from subStudent in gj
    select new
    {
        DepartmentName = department.Name,
        StudentName = $"{subStudent.FirstName} {subStudent.LastName}"
    };
Console.WriteLine("Inner join using GroupJoin():");
foreach (var v in query1)
{
    Console.WriteLine($"{v.DepartmentName} - {v.StudentName}");
}

The same results can be achieved using GroupJoin method, as follows:

var queryMethod1 = departments
    .GroupJoin(students, department => department.ID, student => student.DepartmentID,
        (department, gj) => new { department, gj })
    .SelectMany(departmentAndStudent => departmentAndStudent.gj,
        (departmentAndStudent, subStudent) => new
        {
            DepartmentName = departmentAndStudent.department.Name,
            StudentName = $"{subStudent.FirstName} {subStudent.LastName}"
        });

Console.WriteLine("Inner join using GroupJoin():");
foreach (var v in queryMethod1)
{
    Console.WriteLine($"{v.DepartmentName} - {v.StudentName}");
}

The result is equivalent to the result set obtained by using the join clause without the into clause to perform an inner join. The following code demonstrates this equivalent query:

var query2 = from department in departments
    join student in students on department.ID equals student.DepartmentID
    select new
    {
        DepartmentName = department.Name,
        StudentName = $"{student.FirstName} {student.LastName}"
    };

Console.WriteLine("The equivalent operation using Join():");
foreach (var v in query2)
{
    Console.WriteLine($"{v.DepartmentName} - {v.StudentName}");
}

To avoid chaining, the single Join method can be used as presented here:

var queryMethod2 = departments.Join(students, departments => departments.ID, student => student.DepartmentID,
    (department, student) => new
    {
        DepartmentName = department.Name,
        StudentName = $"{student.FirstName} {student.LastName}"
    });

Console.WriteLine("The equivalent operation using Join():");
foreach (var v in queryMethod2)
{
    Console.WriteLine($"{v.DepartmentName} - {v.StudentName}");
}

Perform grouped joins

The group join is useful for producing hierarchical data structures. It pairs each element from the first collection with a set of correlated elements from the second collection.

Note

Each element of the first collection appears in the result set of a group join regardless of whether correlated elements are found in the second collection. In the case where no correlated elements are found, the sequence of correlated elements for that element is empty. The result selector therefore has access to every element of the first collection. This differs from the result selector in a non-group join, which cannot access elements from the first collection that have no match in the second collection.

Warning

Enumerable.GroupJoin has no direct equivalent in traditional relational database terms. However, this method does implement a superset of inner joins and left outer joins. Both of these operations can be written in terms of a grouped join. For more information, see Entity Framework Core, GroupJoin.

The first example in this article shows you how to perform a group join. The second example shows you how to use a group join to create XML elements.

Group join

The following example performs a group join of objects of type Department and Student based on the Deoartment.ID matching the Student.DepartmentID property. Unlike a non-group join, which produces a pair of elements for each match, the group join produces only one resulting object for each element of the first collection, which in this example is a Department object. The corresponding elements from the second collection, which in this example are Student objects, are grouped into a collection. Finally, the result selector function creates an anonymous type for each match that consists of Department.Name and a collection of Student objects.

var query = from department in departments
    join student in students on department.ID equals student.DepartmentID into studentGroup
    select new
    {
        DepartmentName = department.Name,
        Students = studentGroup
    };

foreach (var v in query)
{
    // Output the department's name.
    Console.WriteLine($"{v.DepartmentName}:");

    // Output each of the students in that department.
    foreach (Student? student in v.Students)
    {
        Console.WriteLine($"  {student.FirstName} {student.LastName}");
    }
}

In the above example, query variable contains the query that creates a list where each element is an anonymous type that contains the department's name and a collection of students that study in that department.

The equivalent query using method syntax is shown in the following code:

var query = departments.GroupJoin(students, department => department.ID, student => student.DepartmentID,
    (department, Students) => new { DepartmentName = department.Name, Students });

foreach (var v in query)
{
    // Output the department's name.
    Console.WriteLine($"{v.DepartmentName}:");

    // Output each of the students in that department.
    foreach (Student? student in v.Students)
    {
        Console.WriteLine($"  {student.FirstName} {student.LastName}");
    }
}

Group join to create XML

Group joins are ideal for creating XML by using LINQ to XML. The following example is similar to the previous example except that instead of creating anonymous types, the result selector function creates XML elements that represent the joined objects.

XElement departmentsAndStudents = new("DepartmentEnrollment",
    from department in departments
    join student in students on department.ID equals student.DepartmentID into studentGroup
    select new XElement("Department",
        new XAttribute("Name", department.Name),
        from student in studentGroup
        select new XElement("Student",
            new XAttribute("FirstName", student.FirstName),
            new XAttribute("LastName", student.LastName)
        )
    )
);

Console.WriteLine(departmentsAndStudents);

The equivalent query using method syntax is shown in the following code:

XElement departmentsAndStudents = new("DepartmentEnrollment",
    departments.GroupJoin(students, department => department.ID, student => student.DepartmentID,
        (department, Students) => new XElement("Department",
            new XAttribute("Name", department.Name),
            from student in Students
            select new XElement("Student",
                new XAttribute("FirstName", student.FirstName),
                new XAttribute("LastName", student.LastName)
            )
        )
    )
);

Console.WriteLine(departmentsAndStudents);

Perform left outer joins

A left outer join is a join in which each element of the first collection is returned, regardless of whether it has any correlated elements in the second collection. You can use LINQ to perform a left outer join by calling the DefaultIfEmpty method on the results of a group join.

The following example demonstrates how to use the DefaultIfEmpty method on the results of a group join to perform a left outer join.

The first step in producing a left outer join of two collections is to perform an inner join by using a group join. (See Perform inner joins for an explanation of this process.) In this example, the list of Department objects is inner-joined to the list of Student objects based on a Department object's ID that matches the student's DepartmentID.

The second step is to include each element of the first (left) collection in the result set even if that element has no matches in the right collection. This is accomplished by calling DefaultIfEmpty on each sequence of matching elements from the group join. In this example, DefaultIfEmpty is called on each sequence of matching Student objects. The method returns a collection that contains a single, default value if the sequence of matching Student objects is empty for any Department object, ensuring that each Department object is represented in the result collection.

Note

The default value for a reference type is null; therefore, the example checks for a null reference before accessing each element of each Student collection.

var query =
    from student in students
    join department in departments on student.DepartmentID equals department.ID into gj
    from subgroup in gj.DefaultIfEmpty()
    select new
    {
        student.FirstName,
        student.LastName,
        Department = subgroup?.Name ?? string.Empty
    };

foreach (var v in query)
{
    Console.WriteLine($"{v.FirstName:-15} {v.LastName:-15}: {v.Department}");
}

The equivalent query using method syntax is shown in the following code:

var query = students.GroupJoin(departments, student => student.DepartmentID, department => department.ID,
    (student, department) => new { student, subgroup = department.DefaultIfEmpty() })
    .Select(gj => new
    {
        gj.student.FirstName,
        gj.student.LastName,
        Department = gj.subgroup?.FirstOrDefault()?.Name ?? string.Empty
    });

foreach (var v in query)
{
    Console.WriteLine($"{v.FirstName:-15} {v.LastName:-15}: {v.Department}");
}

See also