OpenXML is very slow (more than 2 hours) to read .xlsx file with 2 lakh rows and 8 columns

R, Divya 0 Reputation points
2023-05-06T09:44:25.05+00:00

I generated an excel file (.xlsx) using OpenXML in .Net 6 application. The file has 2 lakh rows and 8 columns.

I modified the file by updating few columns and I am trying to read the .xlsx file using OpenXML in .Net 6 application. It takes more than 2 hours to read 2 lakh rows.

But, if I try to read the unmodified version of OpenXML generated .xlsx file, it takes only 2 mins to read the 2 lakh rows. why is it very slow if I modify the data in the file? could you please help immediately?

Note: I tried with both DOM and SAX approaches, it does not help.

Office
Office
A suite of Microsoft productivity software that supports common business tasks, including word processing, email, presentations, and data management and analysis.
1,658 questions
ASP.NET Core
ASP.NET Core
A set of technologies in the .NET Framework for building web applications and XML web services.
4,568 questions
Office Development
Office Development
Office: A suite of Microsoft productivity software that supports common business tasks, including word processing, email, presentations, and data management and analysis.Development: The process of researching, productizing, and refining new or existing technologies.
3,934 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Zhi Lv - MSFT 32,336 Reputation points Microsoft Vendor
    2023-05-09T07:20:18.02+00:00

    Hi @R, Divya

    When we open the input excel file in Microsoft Excel Application and modify any cell values and then try to read the same file in C# Console App using OpenXML library, the time to read 4 lakh records is growing even more than few hours.

    After modifying the excel file, whether the excel file size changed to much? And when read the excel file, whether the memory usage is very high?

    As far as I know, when using SpreadsheetDocument and SheetData classes, which load the entire file into memory and can cause performance issues for large files. So, if the issue relates the memory, you can employ an OpenXMLReader to read the XML in the file one element at a time, without having to load the entire file into memory.

    Code like this:

        // The SAX approach.
        static void ReadExcelFileSAX(string fileName)
        {
            using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
            {
                WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
                WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    
                OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
                string text;
                while (reader.Read())
                {
                    if (reader.ElementType == typeof(CellValue))
                    {
                        text = reader.GetText();
                        Console.Write(text + " ");
                    }
                }
                Console.WriteLine();
                Console.ReadKey();
            }
        }
    

    More detail information, see Parse and read a large spreadsheet document (Open XML SDK) (#Approaches to Parsing Open XML Files).


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    Best regards,

    Dillion


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.