How to get duplicate values in one excel coloumn and count using c# and excel interoperbility

RAMAN RAGAVAN 51 Reputation points
2021-09-03T18:41:01.707+00:00

hi there, I have to get count of repeated values in a particular column dynamically using c#

lets say like

coloumn A
........................
MOUSE
CAT
CAT
TIGER
CAT

here i need to select each column cell value and find how many times that value occurs

C#
C#
An object-oriented and type-safe programming language that has its roots in the C family of languages and includes support for component-oriented programming.
10,479 questions
Excel Management
Excel Management
Excel: A family of Microsoft spreadsheet software with tools for analyzing, charting, and communicating data.Management: The act or process of organizing, handling, directing or controlling something.
1,669 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. P a u l 10,406 Reputation points
    2021-09-03T18:51:27.587+00:00

    I'm not sure what format you're dealing with as input, but if you can get each cell into a collection of strings it should just involve grouping by the value:

    var columnA = new string[] { "MOUSE", "CAT", "CAT", "TIGER", "CAT" };
    
    var groups = columnA.GroupBy(a => a)
        .Select(grp => $"{grp.Key} occurs {grp.Count()} times");
    
    Console.WriteLine(string.Join("\n", groups));
    

    Produces:

    MOUSE occurs 1 times
    CAT occurs 3 times
    TIGER occurs 1 times


  2. Karen Payne MVP 35,286 Reputation points
    2021-09-04T00:53:05.167+00:00

    A recommendation would be, if working with .xlsx style of Excel is to look at SpreadSheetLight (free and on NuGet with two versions one for .NET Framework, one for .NET Core Framework).

    See also

    129306-excelfinddups.png

    Conceptual starter which works source code.

    using System;  
    using System.Collections.Generic;  
    using SpreadsheetLight;  
      
    namespace SpreadsheetLightDataGridViewExport.Classes  
    {  
        public class SpreadSheetLightOperations  
        {  
            /// <summary>  
            /// Find duplicate string values and return their row index  
            /// </summary>  
            /// <param name="fileName">Excel file to read</param>  
            /// <param name="sheetName">WorkSheet to work with</param>  
            /// <param name="search">Text to find</param>  
            /// <param name="columnIndex">Column index to search</param>  
            /// <returns>  
            /// Named Value Tuple  
            /// items      - list of indices  
            /// exception  - reports run time exceptions, 99 percent of the time it's from open the file outside of the  
            ///              application while working on it in code.  
            /// </returns>  
            public static (List<int> items, Exception exception) FindDuplicates(string fileName, string sheetName, string search, int columnIndex)  
            {  
                var indicesList = new List<int>();  
      
                try  
                {  
                    using (var document = new SLDocument(fileName, sheetName))  
                    {  
                          
                        var stats = document.GetWorksheetStatistics();  
                        var test = stats.EndRowIndex;  
                        for (int index = 1; index < stats.EndRowIndex + 1; index++)  
                        {  
                            if (document.GetCellValueAsString(index, columnIndex).EqualsIgnoreCase(search))  
                            {  
                                indicesList.Add(index);  
                            }  
                        }  
                    }  
      
                    return (indicesList, null);  
                      
                }  
                catch (Exception exception)  
                {  
                    return (indicesList, exception);  
                }  
                 
            }  
        }  
    }  
      
    

    Form code source code

    using System;  
    using System.IO;  
    using System.Linq;  
    using System.Windows.Forms;  
    using SpreadsheetLightDataGridViewExport.Classes;  
      
    namespace SpreadsheetLightDataGridViewExport  
    {  
        public partial class PoliciesForm : Form  
        {  
            public PoliciesForm()  
            {  
                InitializeComponent();  
            }  
      
            private void PoliciesForm_Load(object sender, EventArgs e)  
            {  
                PoliciesListBox.DataSource = Enumerable.Range(1, 6).Select(x => $"OLD0{x}").ToList();  
            }  
      
            private void FindDuplicatesButton_Click(object sender, EventArgs e)  
            {  
                ResultsListBox.Items.Clear();  
                  
                var policy = PoliciesListBox.Text;  
                //ExcelOperations  
      
                var fileName = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "FindDups.xlsx");  
                var sheetName = "Polices";  
      
                /*  
                 * pass in file name, text to find and the column index, in this case A column  
                 */  
                var (items, exception) = SpreadSheetLightOperations  
                    .FindDuplicates(fileName, sheetName, policy, 1);  
                  
                if (exception != null)  
                {  
                    MessageBox.Show($"Error\n{exception.Message}");  
                }  
                else  
                {  
                    if (items.Count >1)  
                    {  
                        foreach (var item in items)  
                        {  
                            ResultsListBox.Items.Add(item.ToString());  
                        }  
                    }  
                    else  
                    {  
                        MessageBox.Show($"No rows found for {PoliciesListBox.Text}");  
                    }  
                }  
            }  
        }  
    }  
      
    
    0 comments No comments