How to sum duplicate row specific column values using SpreadsheetLight ?

Boom Ba 166 Reputation points
2023-01-25T00:38:52.2066667+00:00

I've got some excel worksheets like

User's image

I'm trying to export the data to a SQLite database using SpreadsheetLight library. Now these sheets may have some duplicate rows (i.e. the columns BP_NO, NAME & BL_NO have the same value as previous row). In that case only one record will be added to the database and the AMT column of the database will hold the sum of the duplicate rows AMT column value.

I've tried

public static void CreatePurchaseTableDUP(string worksheetName)

    {

        DataTable dt = new DataTable();

        dt.Columns.Add("BP_NO");

        dt.Columns.Add("BP_DAT");

        dt.Columns.Add("NAME");

        dt.Columns.Add("BL_NO");

        dt.Columns.Add("BL_DATE");

        dt.Columns.Add("AMT");

        string filePath = @"G:\Downloads\BPRdtls.xlsx";

        using (SLDocument sl1 = new SLDocument(filePath, worksheetName))

        {

            for (int i = 6; i <= LastRow(filePath, worksheetName); i++)

            {

                string t = "";

                if ( 

                    sl1.GetCellValueAsString(i, 1).Trim() == sl1.GetCellValueAsString(i+1, 1).Trim()

                    && sl1.GetCellValueAsString(i, 3).Trim() == sl1.GetCellValueAsString(i+1, 3).Trim()

                    )

                {

                    t = (sl1.GetCellValueAsDecimal(i, 8)+ sl1.GetCellValueAsDecimal(i+1, 8)).ToString("0.00").Trim();

                    DataRow dr = dt.NewRow();

                    dt.Rows.Add(

                        sl1.GetCellValueAsString(i, 1).Trim(),

                        sl1.GetCellValueAsDateTime(i, 2).ToString("dd-MM-yyyy").Trim(),

                        sl1.GetCellValueAsString(i, 3).Trim(),

                        sl1.GetCellValueAsString(i, 5).Trim(),

                        sl1.GetCellValueAsDateTime(i, 6).ToString("dd-MM-yyyy").Trim(),

                        t

                    );

                    dt.AcceptChanges();

                }

                else

                {

                    t=sl1.GetCellValueAsDecimal(i, 8).ToString("0.00").Trim();

                    DataRow dr = dt.NewRow();

                    dt.Rows.Add(

                        sl1.GetCellValueAsString(i, 1).Trim(),

                        sl1.GetCellValueAsDateTime(i, 2).ToString("dd-MM-yyyy").Trim(),

                        sl1.GetCellValueAsString(i, 3).Trim(),

                        sl1.GetCellValueAsString(i, 5).Trim(),

                        sl1.GetCellValueAsDateTime(i, 6).ToString("dd-MM-yyyy").Trim(),

                        t

                    );

                    dt.AcceptChanges();

                }

            }

        }

        using (SQLiteConnection conn = new SQLiteConnection(@"Data Source="+Path.GetFullPath("./Purchase.db")))

        {

            conn.Open();

            using (var cmd = new SQLiteCommand(conn))

            {

                cmd.CommandText = "DROP TABLE IF EXISTS ledgerTable";

                cmd.ExecuteNonQuery();

                cmd.CommandText = @"CREATE TABLE ledgerTable(ID INTEGER PRIMARY KEY, BP_NO TEXT, BP_DAT TEXT, NAME TEXT, BL_NO TEXT, BL_DATE TEXT, AMT TEXT, UNIQUE(NAME, BL_NO, BL_DATE))";

                cmd.ExecuteNonQuery();

                string str;

                SQLiteCommand com;

                foreach (DataRow row in dt.Rows) 

                {

                    str = "INSERT OR IGNORE INTO ledgerTable(BP_NO, BP_DAT, NAME, BL_NO, BL_DATE, AMT)values(@BP_NO, @BP_DAT, @NAME, @BL_NO, @BL_DATE, @AMT)";

                    com = new SQLiteCommand(str, conn);

                    com.Parameters.AddWithValue("@BP_NO", row.Field<string>(0));

                    com.Parameters.AddWithValue("@BP_DAT", row.Field<string>(1));

                    com.Parameters.AddWithValue("@NAME", row.Field<string>(2));

                    com.Parameters.AddWithValue("@BL_NO", row.Field<string>(3));

                    com.Parameters.AddWithValue("@BL_DATE", row.Field<string>(4));

                    com.Parameters.AddWithValue("@AMT", row.Field<string>(5));

                    com.ExecuteNonQuery();

                }

            }

            conn.Close();

        }

    }

where to get the last filled row I've used the below method

public static int LastRow(string fpath, string sheetName)
        {
            using (SLDocument sl1 = new SLDocument(fpath, sheetName))
            {

                SLWorksheetStatistics st = sl1.GetWorksheetStatistics();

                int lastRow = 0;

                for (int i = 6; i <= st.EndRowIndex; i++)
                {
                    if (!string.IsNullOrEmpty(sl1.GetCellValueAsString(i, 1).Trim()))
                    {
                        lastRow =i;
                    }
                }
                return lastRow;
            }
        }

But it only works for 2 consecutive duplicate rows, if the number of duplicate rows are more 2 then it doesn't work.

How can I achieve this ? Can LINQ be used somehow to do this ?

In addition, if the duplicate rows are not one after the other and are scattered throughout the sheet how can I do the exact same thing ?

Windows Presentation Foundation
Windows Presentation Foundation
A part of the .NET Framework that provides a unified programming model for building line-of-business desktop applications on Windows.
2,678 questions
C#
C#
An object-oriented and type-safe programming language that has its roots in the C family of languages and includes support for component-oriented programming.
10,288 questions
0 comments No comments
{count} vote

Accepted answer
  1. Reza Aghaei 4,936 Reputation points MVP
    2023-01-25T01:07:56.62+00:00

    Your solution consists of 3 parts:

    1. Read data from source
    2. Process data (normalize, aggregate, remove duplicates, etc.)
    3. Transfer processed data to destination

    SpreadsheetLight will be used in the first phase. Entity framework, or SQLiteCommand will be used for the third phase. The second phase, is the problem you are trying to solve in this question.

    I assume reading the sheet or writing to db is not the main concern of the question, so I'll focus on aggregating duplicates. No matter where the data is coming from, define a model for your data which could be a C# model or a DataTable, then parse the input data to your model. Then you can simply use a LINQ GroupBy to remove the duplicate data, and replace those duplicates with sum.

    Example 1 - Using a C# model to process data

    Here is a very simple example to demonstrate how to use a simple C# class for modeling and processing data to remove duplicates and aggregate results using GroupBy:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    
    namespace SampleConsoleApp
    {
        internal class Program
        {
            static void Main(string[] args)
            {
                var input = new List<DataItem> {
                    new DataItem(){ A= "Lorem", B = "Ipsum", Amount = 100 },
                    new DataItem(){ A= "Lorem", B = "Ipsum", Amount = 100 },
                    new DataItem(){ A = "Dolor", B = "Sit", Amount = 100 },
                    new DataItem(){ A = "Dolor", B = "Sit", Amount = 100 },
                    new DataItem(){ A = "Dolor", B = "Sit", Amount = 100 },
                    new DataItem(){ A = "Consectetuer", B = "Adipiscing", Amount = 100 },
                };
    
                var result = input.GroupBy(
                    x => new { A = x.A, B = x.B },
                    x => new { Amount = x.Amount })
                    .Select(g => new DataItem()
                    {
                        A = g.Key.A,
                        B = g.Key.B,
                        Amount = g.Sum(x => x.Amount)
                    })
                    .ToList();
    
                foreach (var item in result)
                {
                    Console.WriteLine($"A:{item.A}, B:{item.B}, Amount:{item.Amount}");
                }
                Console.ReadLine();
            }
        }
        public class DataItem
        {
            public string A { get; set; }
            public string B { get; set; }
            public int Amount { get; set; }
        }
    }
    

    Example 2 - Using a DataTable and model to process data

    In this example, I've used a DataTable to model the data. The basics are the same as the other example and I've shown how to use a simple DataTable for modeling and processing data to remove duplicates and aggregate results using GroupBy:

    using System;
    using System.Data;
    using System.Linq;
    
    namespace SampleConsoleApp
    {
        internal class Program
        {
            static void Main(string[] args)
            {
                var input = new DataTable();
                input.Columns.Add("A", typeof(string));
                input.Columns.Add("B", typeof(string));
                input.Columns.Add("Amount", typeof(int));
                input.Rows.Add("Lorem", "Ipsum", 100);
                input.Rows.Add("Lorem", "Ipsum", 100);
                input.Rows.Add("Lorem", "Ipsum", 100);
                input.Rows.Add("Dolor", "Sit", 100);
                input.Rows.Add("Dolor", "Sit", 100);
                input.Rows.Add("Consectetuer", "Adipiscing", 100);
    
                var result = input.Clone();
                input.Rows.Cast<DataRow>().GroupBy(
                    x => new { A = x.Field<string>("A"), B = x.Field<string>("B") },
                    x => new { Amount = x.Field<int>("Amount") })
                    .Select(g => new
                    {
                        A = g.Key.A,
                        B = g.Key.B,
                        Amount = g.Sum(x => x.Amount)
                    })
                    .ToList()
                    .ForEach(x =>
                    {
                        result.Rows.Add(x.A, x.B, x.Amount);
                    });
    
                foreach (DataRow item in result.Rows)
                {
                    Console.WriteLine($"A:{item["A"]}, B:{item["B"]}, Amount:{item["Amount"]}");
                }
                Console.ReadLine();
            }
        }
    }
    

    Look at the very similar approach that I used in both examples. And here is the result in both cases:

    A:Lorem, B:Ipsum, Amount:200
    A:Dolor, B:Sit, Amount:300
    A:Consectetuer, B:Adipiscing, Amount:100
    

1 additional answer

Sort by: Most helpful
  1. Karen Payne MVP 35,116 Reputation points
    2023-01-25T10:06:01.74+00:00

    You need to use a GroupBy specifying all or some columns e.g.

    DataTable dt = new();
    dt.Columns.Add("id", typeof(int));
    dt.Columns.Add("Name", typeof(string));
    dt.Columns.Add("Dept", typeof(string));
    
    dt.Rows.Add(1, "Test1", "Sample1");
    dt.Rows.Add(2, "Test2", "Sample2");
    dt.Rows.Add(3, "Test3", "Sample3");
    dt.Rows.Add(4, "Test4", "Sample4");  
    dt.Rows.Add(5, "Test4", "Sample4"); 
    dt.Rows.Add(6, "Test4", "Sample4");  
    dt.Rows.Add(7, "Test4", "Sample5");
    
    DataTable dt2 = dt.AsEnumerable()
        .OrderBy(row => row.Field<int>("id"))
        .GroupBy(row => new
        {
            name = row.Field<string>("Name"), 
            dept = row.Field<string>("Dept")
        })
        .Select(x => x.First())
        .CopyToDataTable();