How to sum a specific column's values for duplicate rows using SpreadsheetLight?

Boom Ba 166 Reputation points
2023-01-25T00:38:52.2066667+00:00

I've got some Excel worksheets like this:

[worksheet screenshot]

I'm trying to export the data to a SQLite database using the SpreadsheetLight library. These sheets may contain duplicate rows (i.e. rows where the BP_NO, NAME and BL_NO columns have the same values as the previous row). In that case only one record should be added to the database, and its AMT column should hold the sum of the duplicate rows' AMT values.

I've tried

public static void CreatePurchaseTableDUP(string worksheetName)
{
    DataTable dt = new DataTable();
    dt.Columns.Add("BP_NO");
    dt.Columns.Add("BP_DAT");
    dt.Columns.Add("NAME");
    dt.Columns.Add("BL_NO");
    dt.Columns.Add("BL_DATE");
    dt.Columns.Add("AMT");

    string filePath = @"G:\Downloads\BPRdtls.xlsx";

    using (SLDocument sl1 = new SLDocument(filePath, worksheetName))
    {
        for (int i = 6; i <= LastRow(filePath, worksheetName); i++)
        {
            string t = "";
            if (sl1.GetCellValueAsString(i, 1).Trim() == sl1.GetCellValueAsString(i + 1, 1).Trim()
                && sl1.GetCellValueAsString(i, 3).Trim() == sl1.GetCellValueAsString(i + 1, 3).Trim())
            {
                t = (sl1.GetCellValueAsDecimal(i, 8) + sl1.GetCellValueAsDecimal(i + 1, 8)).ToString("0.00").Trim();
                dt.Rows.Add(
                    sl1.GetCellValueAsString(i, 1).Trim(),
                    sl1.GetCellValueAsDateTime(i, 2).ToString("dd-MM-yyyy").Trim(),
                    sl1.GetCellValueAsString(i, 3).Trim(),
                    sl1.GetCellValueAsString(i, 5).Trim(),
                    sl1.GetCellValueAsDateTime(i, 6).ToString("dd-MM-yyyy").Trim(),
                    t
                );
                dt.AcceptChanges();
            }
            else
            {
                t = sl1.GetCellValueAsDecimal(i, 8).ToString("0.00").Trim();
                dt.Rows.Add(
                    sl1.GetCellValueAsString(i, 1).Trim(),
                    sl1.GetCellValueAsDateTime(i, 2).ToString("dd-MM-yyyy").Trim(),
                    sl1.GetCellValueAsString(i, 3).Trim(),
                    sl1.GetCellValueAsString(i, 5).Trim(),
                    sl1.GetCellValueAsDateTime(i, 6).ToString("dd-MM-yyyy").Trim(),
                    t
                );
                dt.AcceptChanges();
            }
        }
    }

    using (SQLiteConnection conn = new SQLiteConnection(@"Data Source=" + Path.GetFullPath("./Purchase.db")))
    {
        conn.Open();
        using (var cmd = new SQLiteCommand(conn))
        {
            cmd.CommandText = "DROP TABLE IF EXISTS ledgerTable";
            cmd.ExecuteNonQuery();
            cmd.CommandText = @"CREATE TABLE ledgerTable(ID INTEGER PRIMARY KEY, BP_NO TEXT, BP_DAT TEXT, NAME TEXT, BL_NO TEXT, BL_DATE TEXT, AMT TEXT, UNIQUE(NAME, BL_NO, BL_DATE))";
            cmd.ExecuteNonQuery();

            foreach (DataRow row in dt.Rows)
            {
                string str = "INSERT OR IGNORE INTO ledgerTable(BP_NO, BP_DAT, NAME, BL_NO, BL_DATE, AMT) VALUES (@BP_NO, @BP_DAT, @NAME, @BL_NO, @BL_DATE, @AMT)";
                using (var com = new SQLiteCommand(str, conn))
                {
                    com.Parameters.AddWithValue("@BP_NO", row.Field<string>(0));
                    com.Parameters.AddWithValue("@BP_DAT", row.Field<string>(1));
                    com.Parameters.AddWithValue("@NAME", row.Field<string>(2));
                    com.Parameters.AddWithValue("@BL_NO", row.Field<string>(3));
                    com.Parameters.AddWithValue("@BL_DATE", row.Field<string>(4));
                    com.Parameters.AddWithValue("@AMT", row.Field<string>(5));
                    com.ExecuteNonQuery();
                }
            }
        }
        conn.Close();
    }
}

To get the last filled row, I've used the method below:

public static int LastRow(string fpath, string sheetName)
{
    using (SLDocument sl1 = new SLDocument(fpath, sheetName))
    {
        SLWorksheetStatistics st = sl1.GetWorksheetStatistics();
        int lastRow = 0;
        for (int i = 6; i <= st.EndRowIndex; i++)
        {
            if (!string.IsNullOrEmpty(sl1.GetCellValueAsString(i, 1).Trim()))
            {
                lastRow = i;
            }
        }
        return lastRow;
    }
}

But this only works for 2 consecutive duplicate rows; if there are more than 2 consecutive duplicates, it doesn't work.

How can I achieve this? Can LINQ somehow be used to do this?

In addition, if the duplicate rows are not one after the other but are scattered throughout the sheet, how can I do the exact same thing?

Developer technologies Windows Presentation Foundation
Developer technologies C#

Accepted answer
  1. Reza Aghaei 4,986 Reputation points MVP Volunteer Moderator
    2023-01-25T01:07:56.62+00:00

    Your solution consists of 3 parts:

    1. Read data from source
    2. Process data (normalize, aggregate, remove duplicates, etc.)
    3. Transfer processed data to destination

    SpreadsheetLight will be used in the first phase. Entity Framework or SQLiteCommand will be used for the third phase. The second phase is the problem you are trying to solve in this question.

    I assume reading the sheet or writing to the db is not the main concern of the question, so I'll focus on aggregating duplicates. No matter where the data comes from, define a model for it, which could be a C# class or a DataTable, then parse the input data into your model. Then you can simply use a LINQ GroupBy to remove the duplicates and replace them with their sum.

    Example 1 - Using a C# model to process data

    Here is a very simple example to demonstrate how to use a simple C# class for modeling and processing data to remove duplicates and aggregate results using GroupBy:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    
    namespace SampleConsoleApp
    {
        internal class Program
        {
            static void Main(string[] args)
            {
                var input = new List<DataItem> {
                    new DataItem(){ A= "Lorem", B = "Ipsum", Amount = 100 },
                    new DataItem(){ A= "Lorem", B = "Ipsum", Amount = 100 },
                    new DataItem(){ A = "Dolor", B = "Sit", Amount = 100 },
                    new DataItem(){ A = "Dolor", B = "Sit", Amount = 100 },
                    new DataItem(){ A = "Dolor", B = "Sit", Amount = 100 },
                    new DataItem(){ A = "Consectetuer", B = "Adipiscing", Amount = 100 },
                };
    
                var result = input.GroupBy(
                    x => new { A = x.A, B = x.B },
                    x => new { Amount = x.Amount })
                    .Select(g => new DataItem()
                    {
                        A = g.Key.A,
                        B = g.Key.B,
                        Amount = g.Sum(x => x.Amount)
                    })
                    .ToList();
    
                foreach (var item in result)
                {
                    Console.WriteLine($"A:{item.A}, B:{item.B}, Amount:{item.Amount}");
                }
                Console.ReadLine();
            }
        }
        public class DataItem
        {
            public string A { get; set; }
            public string B { get; set; }
            public int Amount { get; set; }
        }
    }
    

    Example 2 - Using a DataTable and model to process data

    In this example, I've used a DataTable to model the data. The basics are the same as in the other example; it shows how to use a simple DataTable to model and process the data, removing duplicates and aggregating results using GroupBy:

    using System;
    using System.Data;
    using System.Linq;
    
    namespace SampleConsoleApp
    {
        internal class Program
        {
            static void Main(string[] args)
            {
                var input = new DataTable();
                input.Columns.Add("A", typeof(string));
                input.Columns.Add("B", typeof(string));
                input.Columns.Add("Amount", typeof(int));
                input.Rows.Add("Lorem", "Ipsum", 100);
                input.Rows.Add("Lorem", "Ipsum", 100);
                input.Rows.Add("Dolor", "Sit", 100);
                input.Rows.Add("Dolor", "Sit", 100);
                input.Rows.Add("Dolor", "Sit", 100);
                input.Rows.Add("Consectetuer", "Adipiscing", 100);
    
                var result = input.Clone();
                input.Rows.Cast<DataRow>().GroupBy(
                    x => new { A = x.Field<string>("A"), B = x.Field<string>("B") },
                    x => new { Amount = x.Field<int>("Amount") })
                    .Select(g => new
                    {
                        A = g.Key.A,
                        B = g.Key.B,
                        Amount = g.Sum(x => x.Amount)
                    })
                    .ToList()
                    .ForEach(x =>
                    {
                        result.Rows.Add(x.A, x.B, x.Amount);
                    });
    
                foreach (DataRow item in result.Rows)
                {
                    Console.WriteLine($"A:{item["A"]}, B:{item["B"]}, Amount:{item["Amount"]}");
                }
                Console.ReadLine();
            }
        }
    }
    

    Note the very similar approach used in both examples. Here is the output in both cases:

    A:Lorem, B:Ipsum, Amount:200
    A:Dolor, B:Sit, Amount:300
    A:Consectetuer, B:Adipiscing, Amount:100
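    Applied to the question's own columns, the same GroupBy pattern also handles duplicates that are scattered anywhere in the sheet, not just consecutive ones, because grouping looks at the whole table at once. Here is a sketch; the sample rows and the helper names (`BuildSample`, `SumDuplicates`) are mine, and I'm assuming AMT is stored as a formatted string, as in the question's DataTable:

```csharp
using System;
using System.Data;
using System.Globalization;
using System.Linq;

public static class DedupDemo
{
    // Builds a small table with the question's columns; the row values are made up.
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        foreach (var col in new[] { "BP_NO", "BP_DAT", "NAME", "BL_NO", "BL_DATE", "AMT" })
            dt.Columns.Add(col);
        dt.Rows.Add("1", "01-01-2023", "Alice", "B1", "02-01-2023", "100.00");
        dt.Rows.Add("2", "03-01-2023", "Bob", "B2", "04-01-2023", "75.00");
        dt.Rows.Add("1", "01-01-2023", "Alice", "B1", "02-01-2023", "50.00"); // scattered duplicate
        return dt;
    }

    // Collapses rows sharing BP_NO/NAME/BL_NO into one row whose AMT is the group sum.
    public static DataTable SumDuplicates(DataTable input)
    {
        var result = input.Clone();
        input.AsEnumerable()
             .GroupBy(r => new
             {
                 BP_NO = r.Field<string>("BP_NO"),
                 NAME = r.Field<string>("NAME"),
                 BL_NO = r.Field<string>("BL_NO")
             })
             .ToList()
             .ForEach(g =>
             {
                 var first = g.First();
                 var sum = g.Sum(r => decimal.Parse(r.Field<string>("AMT"), CultureInfo.InvariantCulture));
                 result.Rows.Add(g.Key.BP_NO, first.Field<string>("BP_DAT"), g.Key.NAME,
                     g.Key.BL_NO, first.Field<string>("BL_DATE"),
                     sum.ToString("0.00", CultureInfo.InvariantCulture));
             });
        return result;
    }

    public static void Main()
    {
        foreach (DataRow r in SumDuplicates(BuildSample()).Rows)
            Console.WriteLine($"{r["BP_NO"]} {r["NAME"]} {r["BL_NO"]} {r["AMT"]}");
    }
}
```

    The non-key columns (BP_DAT, BL_DATE) are taken from the first row of each group, which matches the question's assumption that duplicates agree on those values.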
    

1 additional answer

  1. Karen Payne MVP 35,586 Reputation points Volunteer Moderator
    2023-01-25T10:06:01.74+00:00

    You need to use a GroupBy specifying all or some of the columns, e.g.:

    DataTable dt = new();
    dt.Columns.Add("id", typeof(int));
    dt.Columns.Add("Name", typeof(string));
    dt.Columns.Add("Dept", typeof(string));
    
    dt.Rows.Add(1, "Test1", "Sample1");
    dt.Rows.Add(2, "Test2", "Sample2");
    dt.Rows.Add(3, "Test3", "Sample3");
    dt.Rows.Add(4, "Test4", "Sample4");  
    dt.Rows.Add(5, "Test4", "Sample4"); 
    dt.Rows.Add(6, "Test4", "Sample4");  
    dt.Rows.Add(7, "Test4", "Sample5");
    
    DataTable dt2 = dt.AsEnumerable()
        .OrderBy(row => row.Field<int>("id"))
        .GroupBy(row => new
        {
            name = row.Field<string>("Name"), 
            dept = row.Field<string>("Dept")
        })
        .Select(x => x.First())
        .CopyToDataTable();
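
    The snippet above keeps only the first row of each group; if the goal is also to sum a numeric column (like AMT in the question), the group can aggregate before copying. A sketch, where the `Amount` column, its values, and the `SumByNameDept` helper are assumptions added for illustration:

```csharp
using System;
using System.Data;
using System.Linq;

public static class GroupSumDemo
{
    // Groups on Name/Dept, keeps the first row of each group,
    // and overwrites its Amount with the group total.
    public static DataTable SumByNameDept(DataTable dt)
    {
        return dt.AsEnumerable()
            .OrderBy(row => row.Field<int>("id"))
            .GroupBy(row => new
            {
                name = row.Field<string>("Name"),
                dept = row.Field<string>("Dept")
            })
            .Select(g =>
            {
                DataRow first = g.First();
                first.SetField("Amount", g.Sum(r => r.Field<int>("Amount")));
                return first;
            })
            .CopyToDataTable();
    }

    public static void Main()
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("id", typeof(int));
        dt.Columns.Add("Name", typeof(string));
        dt.Columns.Add("Dept", typeof(string));
        dt.Columns.Add("Amount", typeof(int));
        dt.Rows.Add(1, "Test4", "Sample4", 10);
        dt.Rows.Add(2, "Test4", "Sample4", 20);
        dt.Rows.Add(3, "Test4", "Sample5", 30);

        foreach (DataRow r in SumByNameDept(dt).Rows)
            Console.WriteLine($"{r["Name"]} {r["Dept"]} {r["Amount"]}");
    }
}
```

    Note that `SetField` mutates the first row of each group in the source table before `CopyToDataTable` copies it; if the source table must stay untouched, work on a copy.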
    
