Share via

Script not Working for large data in C#

Naji Afache 20 Reputation points
2026-06-17T22:46:52.22+00:00

The arrayGroup below will be containing almost 190,000 int arrays.

so filtering them to retrieve groups of arrays in the array group that have two or more common digits is taking forever.

I am using the linq expression below, any ideas to make this more efficient?

List<int[]> arrayGroup = new List<int[]>();

var filteredGroups = arrayGroup

.SelectMany(arr1 => arrayGroup

.Where(arr2 => arr1 != arr2) // Don't compare array to itself

.Select(arr2 => new

{

Source = arr1,

Target = arr2,

Common = arr1.Intersect(arr2).Distinct().Count()

})

)

.Where(match => match.Common >= 2)

.GroupBy(match => match.Source)

.Select(group => group.Key)

.ToList();

Developer technologies | C#
Developer technologies | C#

An object-oriented and type-safe programming language that has its roots in the C family of languages and includes support for component-oriented programming.

0 comments No comments

2 answers

Sort by: Most helpful
  1. Nancy Vo (WICLOUD CORPORATION) 5,865 Reputation points Microsoft External Staff Moderator
    2026-06-18T06:21:06.04+00:00

    Hello @Naji Afache ,

    Thanks for your question.

    The current code is O(n²) - with 190k items that's completely infeasible. Even if each array is tiny, you're doing ~36 trillion comparisons.

    Since you only care about ≥ 2 common elements, index every possible pair of numbers. Any two arrays that share the same pair automatically have intersection ≥ 2. This reduces the problem dramatically if your arrays are reasonably small.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    
    public class ArrayGroupAnalyzer
    {
        public List<int[]> FindArraysWithSharedPairs(List<int[]> arrayGroup)
        {
            if (arrayGroup == null || arrayGroup.Count < 2)
                return new List<int[]>();
    
            var resultSet = new HashSet<int>();
    
            var pairIndex = new Dictionary<(int, int), List<int>>(capacity: arrayGroup.Count * 20);
    
            for (int i = 0; i < arrayGroup.Count; i++)
            {
                var arr = arrayGroup[i];
                if (arr == null || arr.Length < 2) continue;
    
                var seen = new HashSet<int>(arr);
                var sorted = seen.ToArray();
                Array.Sort(sorted);
    
                for (int a = 0; a < sorted.Length; a++)
                {
                    for (int b = a + 1; b < sorted.Length; b++)
                    {
                        var pair = (sorted[a], sorted[b]);
    
                        if (!pairIndex.TryGetValue(pair, out var list))
                        {
                            list = new List<int>(4);
                            pairIndex[pair] = list;
                        }
                        list.Add(i);
                    }
                }
            }
    
            foreach (var kvp in pairIndex)
            {
                var indices = kvp.Value;
                if (indices.Count >= 2)
                {
                    foreach (var idx in indices)
                    {
                        resultSet.Add(idx);
                    }
                }
            }
    
            return resultSet
                .OrderBy(idx => idx)
                .Select(idx => arrayGroup[idx])
                .ToList();
        }
    }
    

    I hope this addresses your question. If this response was helpful, please consider following the guidance to provide feedback.

    Was this answer helpful?

    0 comments No comments

  2. AI answer

    2026-06-17T22:47:10.6033333+00:00

    The current query compares every array with every other array and, for each pair, computes Intersect and Distinct().Count(). With ~190,000 arrays this is effectively an O(n²) algorithm and will be extremely slow regardless of LINQ syntax.

    To make it efficient, avoid pairwise comparison and instead index arrays by their digits.

    Example approach:

    1. Build a map from each digit to the arrays that contain it.
    2. For each array, count how many other arrays share at least two digits with it using that map.

    Sample code (no LINQ-heavy nested queries):

    List<int[]> arrayGroup = new List<int[]>();
    
    // Map: digit -> arrays containing that digit
    var digitIndex = new Dictionary<int, List<int[]>>();
    
    foreach (var arr in arrayGroup)
    {
        foreach (var d in arr.Distinct())
        {
            if (!digitIndex.TryGetValue(d, out var list))
            {
                list = new List<int[]>();
                digitIndex[d] = list;
            }
            list.Add(arr);
        }
    }
    
    var result = new List<int[]>();
    
    foreach (var source in arrayGroup)
    {
        var sharedCounts = new Dictionary<int[], int>(ReferenceEqualityComparer<int[]>.Instance);
    
        foreach (var d in source.Distinct())
        {
            if (!digitIndex.TryGetValue(d, out var list))
                continue;
    
            foreach (var target in list)
            {
                if (ReferenceEquals(source, target))
                    continue;
    
                if (!sharedCounts.TryGetValue(target, out var count))
                    count = 0;
    
                count++;
                sharedCounts[target] = count;
            }
        }
    
        // Check if any target shares at least 2 digits
        if (sharedCounts.Values.Any(c => c >= 2))
            result.Add(source);
    }
    

    ReferenceEqualityComparer<int[]> is a comparer that compares arrays by reference (so each array instance is a key). If needed, implement it as:

    public sealed class ReferenceEqualityComparer<T> : IEqualityComparer<T>
        where T : class
    {
        public static readonly ReferenceEqualityComparer<T> Instance = new();
    
        private ReferenceEqualityComparer() { }
    
        public bool Equals(T x, T y) => ReferenceEquals(x, y);
    
        public int GetHashCode(T obj) => RuntimeHelpers.GetHashCode(obj);
    }
    

    This approach avoids the full cross-product and uses the digit index to only touch arrays that actually share digits, which is far more scalable than the original nested SelectMany / Intersect query.


    References:

    AI-generated content may be incorrect. Read our transparency notes for more information.

    Was this answer helpful?

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.