How to sum a specific column's values for duplicate rows using SpreadsheetLight?

Boom Ba 166 Reputation points
2023-01-25T00:38:52.2066667+00:00

I've got some Excel worksheets like this:

[worksheet screenshot]

I'm trying to export the data to a SQLite database using the SpreadsheetLight library. These sheets may contain duplicate rows (i.e. rows where the BP_NO, NAME and BL_NO columns have the same values as the previous row). In that case only one record should be added to the database, and its AMT column should hold the sum of the duplicate rows' AMT values.

I've tried

public static void CreatePurchaseTableDUP(string worksheetName)
{
    DataTable dt = new DataTable();
    dt.Columns.Add("BP_NO");
    dt.Columns.Add("BP_DAT");
    dt.Columns.Add("NAME");
    dt.Columns.Add("BL_NO");
    dt.Columns.Add("BL_DATE");
    dt.Columns.Add("AMT");

    string filePath = @"G:\Downloads\BPRdtls.xlsx";

    using (SLDocument sl1 = new SLDocument(filePath, worksheetName))
    {
        for (int i = 6; i <= LastRow(filePath, worksheetName); i++)
        {
            string t = "";
            if (sl1.GetCellValueAsString(i, 1).Trim() == sl1.GetCellValueAsString(i + 1, 1).Trim()
                && sl1.GetCellValueAsString(i, 3).Trim() == sl1.GetCellValueAsString(i + 1, 3).Trim())
            {
                t = (sl1.GetCellValueAsDecimal(i, 8) + sl1.GetCellValueAsDecimal(i + 1, 8)).ToString("0.00").Trim();
                dt.Rows.Add(
                    sl1.GetCellValueAsString(i, 1).Trim(),
                    sl1.GetCellValueAsDateTime(i, 2).ToString("dd-MM-yyyy").Trim(),
                    sl1.GetCellValueAsString(i, 3).Trim(),
                    sl1.GetCellValueAsString(i, 5).Trim(),
                    sl1.GetCellValueAsDateTime(i, 6).ToString("dd-MM-yyyy").Trim(),
                    t
                );
                dt.AcceptChanges();
            }
            else
            {
                t = sl1.GetCellValueAsDecimal(i, 8).ToString("0.00").Trim();
                dt.Rows.Add(
                    sl1.GetCellValueAsString(i, 1).Trim(),
                    sl1.GetCellValueAsDateTime(i, 2).ToString("dd-MM-yyyy").Trim(),
                    sl1.GetCellValueAsString(i, 3).Trim(),
                    sl1.GetCellValueAsString(i, 5).Trim(),
                    sl1.GetCellValueAsDateTime(i, 6).ToString("dd-MM-yyyy").Trim(),
                    t
                );
                dt.AcceptChanges();
            }
        }
    }

    using (SQLiteConnection conn = new SQLiteConnection(@"Data Source=" + Path.GetFullPath("./Purchase.db")))
    {
        conn.Open();
        using (var cmd = new SQLiteCommand(conn))
        {
            cmd.CommandText = "DROP TABLE IF EXISTS ledgerTable";
            cmd.ExecuteNonQuery();
            cmd.CommandText = @"CREATE TABLE ledgerTable(ID INTEGER PRIMARY KEY, BP_NO TEXT, BP_DAT TEXT, NAME TEXT, BL_NO TEXT, BL_DATE TEXT, AMT TEXT, UNIQUE(NAME, BL_NO, BL_DATE))";
            cmd.ExecuteNonQuery();

            foreach (DataRow row in dt.Rows)
            {
                string str = "INSERT OR IGNORE INTO ledgerTable(BP_NO, BP_DAT, NAME, BL_NO, BL_DATE, AMT) VALUES (@BP_NO, @BP_DAT, @NAME, @BL_NO, @BL_DATE, @AMT)";
                using (var com = new SQLiteCommand(str, conn))
                {
                    com.Parameters.AddWithValue("@BP_NO", row.Field<string>(0));
                    com.Parameters.AddWithValue("@BP_DAT", row.Field<string>(1));
                    com.Parameters.AddWithValue("@NAME", row.Field<string>(2));
                    com.Parameters.AddWithValue("@BL_NO", row.Field<string>(3));
                    com.Parameters.AddWithValue("@BL_DATE", row.Field<string>(4));
                    com.Parameters.AddWithValue("@AMT", row.Field<string>(5));
                    com.ExecuteNonQuery();
                }
            }
        }
        conn.Close();
    }
}

To get the last filled row, I've used the method below:

public static int LastRow(string fpath, string sheetName)
{
    using (SLDocument sl1 = new SLDocument(fpath, sheetName))
    {
        SLWorksheetStatistics st = sl1.GetWorksheetStatistics();
        int lastRow = 0;
        for (int i = 6; i <= st.EndRowIndex; i++)
        {
            if (!string.IsNullOrEmpty(sl1.GetCellValueAsString(i, 1).Trim()))
            {
                lastRow = i;
            }
        }
        return lastRow;
    }
}

But this only works for 2 consecutive duplicate rows; if there are more than 2 consecutive duplicates, it doesn't work.

How can I achieve this? Can LINQ somehow be used to do this?

In addition, if the duplicate rows are not one after the other but are scattered throughout the sheet, how can I do the exact same thing?

Developer technologies Windows Presentation Foundation
Developer technologies C#

Accepted answer
  1. Reza Aghaei 4,986 Reputation points MVP Volunteer Moderator
    2023-01-25T01:07:56.62+00:00

    Your solution consists of 3 parts:

    1. Read data from source
    2. Process data (normalize, aggregate, remove duplicates, etc.)
    3. Transfer processed data to destination

    SpreadsheetLight will be used in the first phase. Entity Framework or SQLiteCommand will be used for the third phase. The second phase is the problem you are trying to solve in this question.

    I assume reading the sheet or writing to the db is not the main concern of the question, so I'll focus on aggregating duplicates. No matter where the data comes from, define a model for it, which could be a C# class or a DataTable, then parse the input data into your model. Then you can simply use a LINQ GroupBy to remove the duplicates and replace them with their sum.

    Example 1 - Using a C# model to process data

    Here is a very simple example to demonstrate how to use a simple C# class for modeling and processing data to remove duplicates and aggregate results using GroupBy:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    
    namespace SampleConsoleApp
    {
        internal class Program
        {
            static void Main(string[] args)
            {
                var input = new List<DataItem> {
                    new DataItem(){ A= "Lorem", B = "Ipsum", Amount = 100 },
                    new DataItem(){ A= "Lorem", B = "Ipsum", Amount = 100 },
                    new DataItem(){ A = "Dolor", B = "Sit", Amount = 100 },
                    new DataItem(){ A = "Dolor", B = "Sit", Amount = 100 },
                    new DataItem(){ A = "Dolor", B = "Sit", Amount = 100 },
                    new DataItem(){ A = "Consectetuer", B = "Adipiscing", Amount = 100 },
                };
    
                var result = input.GroupBy(
                    x => new { A = x.A, B = x.B },
                    x => new { Amount = x.Amount })
                    .Select(g => new DataItem()
                    {
                        A = g.Key.A,
                        B = g.Key.B,
                        Amount = g.Sum(x => x.Amount)
                    })
                    .ToList();
    
                foreach (var item in result)
                {
                    Console.WriteLine($"A:{item.A}, B:{item.B}, Amount:{item.Amount}");
                }
                Console.ReadLine();
            }
        }
        public class DataItem
        {
            public string A { get; set; }
            public string B { get; set; }
            public int Amount { get; set; }
        }
    }
    

    Example 2 - Using a DataTable and model to process data

    In this example, I've used a DataTable to model the data. The basics are the same as in the other example; it shows how to use a simple DataTable to model and process the data, removing duplicates and aggregating results using GroupBy:

    using System;
    using System.Data;
    using System.Linq;
    
    namespace SampleConsoleApp
    {
        internal class Program
        {
            static void Main(string[] args)
            {
                var input = new DataTable();
                input.Columns.Add("A", typeof(string));
                input.Columns.Add("B", typeof(string));
                input.Columns.Add("Amount", typeof(int));
                input.Rows.Add("Lorem", "Ipsum", 100);
                input.Rows.Add("Lorem", "Ipsum", 100);
                input.Rows.Add("Dolor", "Sit", 100);
                input.Rows.Add("Dolor", "Sit", 100);
                input.Rows.Add("Dolor", "Sit", 100);
                input.Rows.Add("Consectetuer", "Adipiscing", 100);
    
                var result = input.Clone();
                input.Rows.Cast<DataRow>().GroupBy(
                    x => new { A = x.Field<string>("A"), B = x.Field<string>("B") },
                    x => new { Amount = x.Field<int>("Amount") })
                    .Select(g => new
                    {
                        A = g.Key.A,
                        B = g.Key.B,
                        Amount = g.Sum(x => x.Amount)
                    })
                    .ToList()
                    .ForEach(x =>
                    {
                        result.Rows.Add(x.A, x.B, x.Amount);
                    });
    
                foreach (DataRow item in result.Rows)
                {
                    Console.WriteLine($"A:{item["A"]}, B:{item["B"]}, Amount:{item["Amount"]}");
                }
                Console.ReadLine();
            }
        }
    }
    

    Note the very similar approach used in both examples. Here is the output in both cases:

    A:Lorem, B:Ipsum, Amount:200
    A:Dolor, B:Sit, Amount:300
    A:Consectetuer, B:Adipiscing, Amount:100
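    Applied to the question's own columns, the same GroupBy pattern also handles duplicates that are scattered anywhere in the sheet, not just consecutive ones, because grouping looks at the whole table at once. Here is a sketch; the sample rows and the helper names (`BuildSample`, `SumDuplicates`) are mine, and I'm assuming AMT is stored as a formatted string, as in the question's DataTable:

```csharp
using System;
using System.Data;
using System.Globalization;
using System.Linq;

public static class DedupDemo
{
    // Builds a small table with the question's columns; the row values are made up.
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        foreach (var col in new[] { "BP_NO", "BP_DAT", "NAME", "BL_NO", "BL_DATE", "AMT" })
            dt.Columns.Add(col);
        dt.Rows.Add("1", "01-01-2023", "Alice", "B1", "02-01-2023", "100.00");
        dt.Rows.Add("2", "03-01-2023", "Bob", "B2", "04-01-2023", "75.00");
        dt.Rows.Add("1", "01-01-2023", "Alice", "B1", "02-01-2023", "50.00"); // scattered duplicate
        return dt;
    }

    // Collapses rows sharing BP_NO/NAME/BL_NO into one row whose AMT is the group sum.
    public static DataTable SumDuplicates(DataTable input)
    {
        var result = input.Clone();
        input.AsEnumerable()
             .GroupBy(r => new
             {
                 BP_NO = r.Field<string>("BP_NO"),
                 NAME = r.Field<string>("NAME"),
                 BL_NO = r.Field<string>("BL_NO")
             })
             .ToList()
             .ForEach(g =>
             {
                 var first = g.First();
                 var sum = g.Sum(r => decimal.Parse(r.Field<string>("AMT"), CultureInfo.InvariantCulture));
                 result.Rows.Add(g.Key.BP_NO, first.Field<string>("BP_DAT"), g.Key.NAME,
                     g.Key.BL_NO, first.Field<string>("BL_DATE"),
                     sum.ToString("0.00", CultureInfo.InvariantCulture));
             });
        return result;
    }

    public static void Main()
    {
        foreach (DataRow r in SumDuplicates(BuildSample()).Rows)
            Console.WriteLine($"{r["BP_NO"]} {r["NAME"]} {r["BL_NO"]} {r["AMT"]}");
    }
}
```

    The non-key columns (BP_DAT, BL_DATE) are taken from the first row of each group, which matches the question's assumption that duplicates agree on those values.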
    

1 additional answer

  1. Karen Payne MVP 35,586 Reputation points Volunteer Moderator
    2023-01-25T10:06:01.74+00:00

    You need to use a GroupBy specifying all or some of the columns, e.g.:

    DataTable dt = new();
    dt.Columns.Add("id", typeof(int));
    dt.Columns.Add("Name", typeof(string));
    dt.Columns.Add("Dept", typeof(string));
    
    dt.Rows.Add(1, "Test1", "Sample1");
    dt.Rows.Add(2, "Test2", "Sample2");
    dt.Rows.Add(3, "Test3", "Sample3");
    dt.Rows.Add(4, "Test4", "Sample4");  
    dt.Rows.Add(5, "Test4", "Sample4"); 
    dt.Rows.Add(6, "Test4", "Sample4");  
    dt.Rows.Add(7, "Test4", "Sample5");
    
    DataTable dt2 = dt.AsEnumerable()
        .OrderBy(row => row.Field<int>("id"))
        .GroupBy(row => new
        {
            name = row.Field<string>("Name"), 
            dept = row.Field<string>("Dept")
        })
        .Select(x => x.First())
        .CopyToDataTable();
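
    The snippet above keeps only the first row of each group; if the goal is also to sum a numeric column (like AMT in the question), the group can aggregate before copying. A sketch, where the `Amount` column, its values, and the `SumByNameDept` helper are assumptions added for illustration:

```csharp
using System;
using System.Data;
using System.Linq;

public static class GroupSumDemo
{
    // Groups on Name/Dept, keeps the first row of each group,
    // and overwrites its Amount with the group total.
    public static DataTable SumByNameDept(DataTable dt)
    {
        return dt.AsEnumerable()
            .OrderBy(row => row.Field<int>("id"))
            .GroupBy(row => new
            {
                name = row.Field<string>("Name"),
                dept = row.Field<string>("Dept")
            })
            .Select(g =>
            {
                DataRow first = g.First();
                first.SetField("Amount", g.Sum(r => r.Field<int>("Amount")));
                return first;
            })
            .CopyToDataTable();
    }

    public static void Main()
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("id", typeof(int));
        dt.Columns.Add("Name", typeof(string));
        dt.Columns.Add("Dept", typeof(string));
        dt.Columns.Add("Amount", typeof(int));
        dt.Rows.Add(1, "Test4", "Sample4", 10);
        dt.Rows.Add(2, "Test4", "Sample4", 20);
        dt.Rows.Add(3, "Test4", "Sample5", 30);

        foreach (DataRow r in SumByNameDept(dt).Rows)
            Console.WriteLine($"{r["Name"]} {r["Dept"]} {r["Amount"]}");
    }
}
```

    Note that `SetField` mutates the first row of each group in the source table before `CopyToDataTable` copies it; if the source table must stay untouched, work on a copy.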
    
