Regex.Matches return no duplicates?

StewartBW 925 Reputation points
2024-07-22T18:50:26.8233333+00:00

Hello

Dim blah As MatchCollection = Regex.Matches(text, "([a-zA-Z0-9_\-.]+)@([a-zA-Z0-9_\-.]+)\.([a-zA-Z]{2,9})", RegexOptions.CultureInvariant Or RegexOptions.IgnoreCase Or RegexOptions.Multiline)

How may I force Regex.Matches not to return duplicate items, so MatchCollection will not contain duplicates (case insensitive)?

If not possible, remove duplicates from MatchCollection?

Efficiency and speed are crucial :(

Thanks :)

C#
VB

Accepted answer
  1. Viorel 114.8K Reputation points
    2024-07-22T19:33:04.0666667+00:00

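    If you are interested in a regular expression that excludes duplicates, try this: (?<m>([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,9}))(?!.*\k<m>). The negative lookahead rejects a match when the same text occurs again later in the input, so only the last occurrence of each address is returned. Because . does not match a newline by default, add RegexOptions.Singleline if duplicates can appear on different lines.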

    Compare it with DistinctBy (available in .NET 6 and later):

    Dim matches = Regex.Matches(Text, "your original expression...", RegexOptions.CultureInvariant Or RegexOptions.IgnoreCase Or RegexOptions.Multiline).DistinctBy(Function(m) m.Value, StringComparer.CurrentCultureIgnoreCase)
    

    Benchmarking both approaches with typical data will show which one is faster.
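    A rough way to run that comparison is sketched below. It is only an illustration: the sample text is made up, the module and constant names are hypothetical, the backreference is written as \k<m>, RegexOptions.Singleline is added so the lookahead can see duplicates on later lines, and DistinctBy assumes .NET 6 or later. A single Stopwatch pass is noisy, so treat the timings as indicative only.

    ```vb
    Imports System
    Imports System.Diagnostics
    Imports System.Linq
    Imports System.Text.RegularExpressions

    Module DedupBenchmark
        ' Plain pattern, plus a variant that rejects a match when the same text occurs later.
        Private Const Plain As String = "([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,9})"
        Private Const NoDup As String = "(?<m>" & Plain & ")(?!.*\k<m>)"

        Sub Main()
            ' Hypothetical sample input; replace it with representative real data.
            Dim text As String = String.Join(Environment.NewLine,
                Enumerable.Repeat("a@example.com B@Example.com a@EXAMPLE.com c@example.org", 200))
            Dim opts = RegexOptions.CultureInvariant Or RegexOptions.IgnoreCase

            ' Approach 1: let the regex skip every occurrence that is repeated later.
            ' Singleline lets "." cross line breaks (an assumption about the input).
            Dim sw = Stopwatch.StartNew()
            Dim viaRegex = Regex.Matches(text, NoDup, opts Or RegexOptions.Singleline).Count
            sw.Stop()
            Console.WriteLine($"Lookahead:  {viaRegex} unique, {sw.ElapsedMilliseconds} ms")

            ' Approach 2: match everything, then drop duplicates with LINQ.
            sw.Restart()
            Dim viaLinq = Regex.Matches(text, Plain, opts).
                DistinctBy(Function(m) m.Value, StringComparer.CurrentCultureIgnoreCase).Count()
            sw.Stop()
            Console.WriteLine($"DistinctBy: {viaLinq} unique, {sw.ElapsedMilliseconds} ms")
        End Sub
    End Module
    ```

    The two approaches are not strictly equivalent: the lookahead keeps the last occurrence of each address, while DistinctBy keeps the first. The lookahead also rescans the rest of the input for every candidate match, so its cost is likely to grow faster on large inputs.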


1 additional answer

  1. Marcin Policht 18,345 Reputation points MVP
    2024-07-22T19:29:08.5166667+00:00

    Try the following:

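    ' The hyphen is escaped inside the character classes; an unescaped "_-." is a reversed range and makes Regex throw.
    Dim matches As MatchCollection = Regex.Matches(text, "([a-zA-Z0-9_\-.]+)@([a-zA-Z0-9_\-.]+)\.([a-zA-Z]{2,9})", RegexOptions.CultureInvariant Or RegexOptions.IgnoreCase Or RegexOptions.Multiline)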
    Dim uniqueMatches As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)
    For Each match As Match In matches
        If match.Success Then
            uniqueMatches.Add(match.Value)
        End If
    Next
    ' Now uniqueMatches contains only unique email addresses, case-insensitively
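    Since speed matters here, the same HashSet idea can also be written as a self-contained program that walks the matches one at a time. This is a sketch under assumptions rather than a measured improvement: the module name, the ExtractUnique helper and the sample text are hypothetical, the hyphen in the character classes is escaped, and RegexOptions.Compiled with a cached Regex instance is an optional tweak that may or may not pay off for a given workload.

    ```vb
    Imports System
    Imports System.Collections.Generic
    Imports System.Text.RegularExpressions

    Module UniqueEmails
        ' Cached instance so the pattern is parsed and compiled only once.
        Private ReadOnly EmailRegex As New Regex(
            "([a-zA-Z0-9_\-.]+)@([a-zA-Z0-9_\-.]+)\.([a-zA-Z]{2,9})",
            RegexOptions.CultureInvariant Or RegexOptions.IgnoreCase Or RegexOptions.Compiled)

        ' Hypothetical helper: returns each address once, ignoring case, in first-seen order.
        Function ExtractUnique(text As String) As List(Of String)
            Dim seen As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)
            Dim result As New List(Of String)
            Dim m As Match = EmailRegex.Match(text)
            While m.Success
                ' HashSet.Add returns False when an equal value is already present.
                If seen.Add(m.Value) Then result.Add(m.Value)
                m = m.NextMatch()
            End While
            Return result
        End Function

        Sub Main()
            ' Made-up sample: the third address repeats the first in a different case.
            Dim sample As String = "Ann@Example.com, bob@example.org" &
                Environment.NewLine & "ANN@example.COM"
            For Each address In ExtractUnique(sample)
                Console.WriteLine(address)
            Next
        End Sub
    End Module
    ```

    For the sample text this prints Ann@Example.com and bob@example.org; the upper-case repeat is dropped because the set compares with StringComparer.OrdinalIgnoreCase.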
    
    

    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin
