

DataTable to Excel or CSV without a For loop - need the fastest way.

Question

Sunday, November 8, 2015 7:13 PM

DataTable to Excel or CSV without a For loop - need the fastest way.

I don't want to use a For loop because the data is huge.

  Dim dt1 As DataTable = result.CopyToDataTable()

Just export dt1 to Excel or CSV.

All replies (13)

Monday, November 9, 2015 12:44 PM ✅Answered

Frank: Spire.XLS for .NET is working as per the requirement. I need an alternative that is a freeware or open-source API to export to CSV or MS Access.

Exporting to Excel causes one problem, i.e. the limitation on rows, so I need CSV or MS Access, using something open source or freeware.

Reed is probably right - I thought about a solution using PLINQ and another using Parallel.For, but DB pointed out that there's no promise that the order will be maintained, so I didn't pursue it further.

Well, it sounded like a good idea at the time. ;-)

***** EDIT *****

As an addendum here, I tried it both ways and Reed is right: the parallelization took LONGER than a standard For Each iteration.

If I had eight hours to chop down a tree, I'd spend six sharpening my axe. -- Abraham Lincoln


Monday, November 9, 2015 4:32 PM ✅Answered

....

Reed, thank you. This is actually working absolutely fine for me as of now, but when the CSV file is 40 GB I think this might be slow. What do you suggest?

There isn't much to suggest. 40GB is a very large file.  I would suggest a fast i7 processor, lots of RAM and a solid state drive.

Be sure to check out the suggestion by KevinInstructor; I'm not familiar with the method but it may have some internal optimizations which improve the output time.

What do you intend to do with a 40GB output file once you have it?  What program would consume this file?  Do you have an option of chunking the data into, say, ten 4GB files?  It seems odd to generate a single file of such size... I can't imagine how it would be used.
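Purely as an illustration of that chunking option (the method name, the rowsPerFile parameter, and the export_N.csv file-name pattern are made up for the example, not something from this thread), the same kind of row loop as the DataTableToCSV routine elsewhere in this thread could simply start a new CSV file every so many rows rather than targeting an exact byte size:

Public Sub DataTableToChunkedCSV(table As DataTable, folder As String, rowsPerFile As Integer)
    ' Writes export_1.csv, export_2.csv, ... with at most rowsPerFile rows each,
    ' so no single output file grows to tens of gigabytes.
    Dim lastColIndex As Integer = table.Columns.Count - 1
    Dim fileIndex As Integer = 0
    Dim rowInFile As Integer = rowsPerFile   ' forces a new file on the first row
    Dim writer As IO.StreamWriter = Nothing
    Try
        For Each row As DataRow In table.Rows
            If rowInFile >= rowsPerFile Then
                If writer IsNot Nothing Then writer.Dispose()
                fileIndex += 1
                rowInFile = 0
                writer = New IO.StreamWriter(IO.Path.Combine(folder, String.Format("export_{0}.csv", fileIndex)))
            End If
            For i = 0 To lastColIndex
                If i > 0 Then writer.Write(",")
                If Not row.IsNull(i) Then
                    Dim cellvalue As String = row.Item(i).ToString()
                    If cellvalue.Contains(",") Then
                        ' Quote fields that contain the delimiter.
                        writer.Write(ControlChars.Quote)
                        writer.Write(cellvalue)
                        writer.Write(ControlChars.Quote)
                    Else
                        writer.Write(cellvalue)
                    End If
                End If
            Next
            writer.WriteLine()
            rowInFile += 1
        Next
    Finally
        If writer IsNot Nothing Then writer.Dispose()
    End Try
End Sub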

Reed Kimble - "When you do things right, people won't be sure you've done anything at all"


Sunday, November 8, 2015 7:23 PM

Regardless, a loop is going to be used. You may not do it yourself, but one is going to be used.


Sunday, November 8, 2015 7:32 PM

DataTable to Excel or CSV without a For loop - need the fastest way.

I don't want to use a For loop because the data is huge.

  Dim dt1 As DataTable = result.CopyToDataTable()

Just export dt1 to Excel or CSV.

Consider parallel processing, whether that's using PLINQ or the parallel equivalent of a For-Next loop.

I know you said you didn't want to use iteration, but with parallel processing you might be amazed at how quickly it can go (depending on your hardware, of course).

If I had eight hours to chop down a tree, I'd spend six sharpening my axe. -- Abraham Lincoln


Sunday, November 8, 2015 8:43 PM

DataTable to Excel or CSV without a For loop - need the fastest way.

I don't want to use a For loop because the data is huge.

  Dim dt1 As DataTable = result.CopyToDataTable()

Just export dt1 to Excel or CSV.

Consider parallel processing, whether that's using PLINQ or the parallel equivalent of a For-Next loop.

I know you said you didn't want to use iteration, but with parallel processing you might be amazed at how quickly it can go (depending on your hardware, of course).

If I had eight hours to chop down a tree, I'd spend six sharpening my axe. -- Abraham Lincoln

Can you please give an example of using Parallel.ForEach based on the above example, and save the table to Excel starting from Dim dt1 As DataTable = result.CopyToDataTable()?


Sunday, November 8, 2015 8:51 PM | 1 vote

Hello,

The fastest method I know of is via embedded expressions and XML literals, which requires zero explicit loops of our own over the DataTable. The Excel format I am speaking of produces an XML version of workbooks and worksheets, and by default Excel will open an XML file formatted for Excel from Windows Explorer. If this interests you and you are using Framework 3.5 or higher, I can upload a sample project. A native format can be done with the Open Office API, which requires more code yet is fast; having never worked with the Open Office API, though, I would not suggest it, only mention it.
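As a rough sketch only (the sample project is not posted in this thread; the element names below follow the XML Spreadsheet 2003 format, and the module and method names are made up for illustration), the embedded-expression / XML-literal idea looks roughly like this:

Imports System.Data
Imports <xmlns="urn:schemas-microsoft-com:office:spreadsheet">
Imports <xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">

Module SpreadsheetMlExport
    ' Rough sketch: build an "XML Spreadsheet 2003" document from a DataTable
    ' using XML literals and embedded expressions instead of writing a loop ourselves.
    Public Sub DataTableToXmlSpreadsheet(table As DataTable, filePath As String)
        Dim doc =
            <?xml version="1.0"?>
            <Workbook>
                <Worksheet ss:Name="Sheet1">
                    <Table>
                        <Row>
                            <%= From col As DataColumn In table.Columns
                                Select <Cell><Data ss:Type="String"><%= col.ColumnName %></Data></Cell> %>
                        </Row>
                        <%= From row As DataRow In table.Rows
                            Select <Row>
                                       <%= From col As DataColumn In table.Columns
                                           Select <Cell><Data ss:Type="String"><%= If(row.IsNull(col), "", row(col).ToString()) %></Data></Cell> %>
                                   </Row> %>
                    </Table>
                </Worksheet>
            </Workbook>
        doc.Save(filePath)   ' save with an .xml extension; Excel opens it as a workbook
    End Sub
End Module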

Please remember to mark the replies as answers if they help and unmark them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my webpage under my profile but do not reply to forum questions.
Microsoft Developer tools
Developer’s Guide to Windows 10 video series


Sunday, November 8, 2015 9:26 PM

Can you please give an example of using Parallel.ForEach based on the above example, and save the table to Excel starting from Dim dt1 As DataTable = result.CopyToDataTable()?

I have no idea how to save a DataTable to Excel.

The theory is sound though; look at the link I showed (and that's a puny amount - worth using parallelization but it's not really that many at all).

If I had eight hours to chop down a tree, I'd spend six sharpening my axe. -- Abraham Lincoln


Sunday, November 8, 2015 11:32 PM | 1 vote

A hundred thousand rows doesn't sound like much.  Unless you have a ton of huge fields in each row it shouldn't take any time at all to write a CSV file with a simple loop.  Even with a lot of fields it still seems like you would have to be talking about more data than could reasonably be manipulated in a data table before you'd see a really long execution time when generating a CSV file.

How long does this code take to run?

' Writes the DataTable to a CSV file, quoting any field that contains a comma.
Public Sub DataTableToCSV(table As DataTable, filePath As String)
    Using writer As New IO.StreamWriter(filePath)
        Dim lastColIndex As Integer = table.Columns.Count - 1
        Dim cellvalue As String
        Dim firstField, firstRow As Boolean
        firstRow = True
        For Each row As DataRow In table.Rows
            ' Each row after the first starts on a new line.
            If Not firstRow Then writer.WriteLine()
            firstField = True
            For i = 0 To lastColIndex
                If Not firstField Then writer.Write(",")
                If Not row.IsNull(i) Then
                    cellvalue = row.Item(i).ToString
                    ' Wrap the value in quotes if it contains the delimiter.
                    If cellvalue.Contains(",") Then
                        writer.Write(ControlChars.Quote)
                        writer.Write(cellvalue)
                        writer.Write(ControlChars.Quote)
                    Else
                        writer.Write(cellvalue)
                    End If
                End If
                firstField = False
            Next
            firstRow = False
        Next
    End Using
End Sub
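For example (purely illustrative; dt1 comes from the original question and the output path is made up), a quick Stopwatch harness would answer that:

Dim sw = System.Diagnostics.Stopwatch.StartNew()
DataTableToCSV(dt1, "C:\temp\export.csv")   ' output path is just an example
sw.Stop()
Console.WriteLine("Export took {0:n1} seconds", sw.Elapsed.TotalSeconds)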

Reed Kimble - "When you do things right, people won't be sure you've done anything at all"


Sunday, November 8, 2015 11:37 PM | 1 vote

Frank,

The problem with parallelizing this kind of operation is that the output is sequential.  So while you could potentially perform serialization of the individual rows in a parallel fashion, you would still have to wait for all parallel loops to finish and then order the output.  The ordering process combined with the intrinsic overhead of the parallelization may outweigh any benefit gained from utilizing the multiple threads.

So while you are right that the theory is sound, the application may not be viable in this kind of situation.
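As a sketch only (not a recommendation, and not code from this thread), that "serialize the rows in parallel, then write them in order" idea could look like the following using PLINQ's AsOrdered. The comma-quoting from the earlier example is omitted for brevity, and in practice the per-row formatting work is so small that the parallel overhead may well make this slower than the plain loop:

Public Sub DataTableToCSVParallel(table As DataTable, filePath As String)
    ' Format each row's text in parallel; AsOrdered keeps the original row order.
    Dim lines = table.Rows.Cast(Of DataRow)().
        AsParallel().
        AsOrdered().
        Select(Function(row) String.Join(",",
            table.Columns.Cast(Of DataColumn)().
                Select(Function(col) If(row.IsNull(col), "", row(col).ToString()))))
    ' The file itself still has to be written sequentially.
    Using writer As New IO.StreamWriter(filePath)
        For Each line As String In lines
            writer.WriteLine(line)
        Next
    End Using
End Sub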

Reed Kimble - "When you do things right, people won't be sure you've done anything at all"


Sunday, November 8, 2015 11:40 PM

Frank,

The problem with parallelizing this kind of operation is that the output is sequential.  So while you could potentially perform serialization of the individual rows in a parallel fashion, you would still have to wait for all parallel loops to finish and then order the output.  The ordering process combined with the intrinsic overhead of the parallelization may outweigh any benefit gained from utilizing the multiple threads.

So while you are right that the theory is sound, the application may not be viable in this kind of situation.

Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

You may be right - but in trying to set up a test (I know PLINQ, not Parallel.For), I can't even get it to work. This isn't the first time I've tried, and I can't get past the first thing that always catches me: an ArgumentOutOfRange exception being thrown.

I've posted my own question about it because I really do want to get my head around that - even if it's not applicable here.

If I had eight hours to chop down a tree, I'd spend six sharpening my axe. -- Abraham Lincoln


Monday, November 9, 2015 7:26 AM

Frank: Spire.XLS for .NET is working as per the requirement. I need an alternative that is a freeware or open-source API to export to CSV or MS Access.

Exporting to Excel causes one problem, i.e. the limitation on rows, so I need CSV or MS Access, using something open source or freeware.


Monday, November 9, 2015 7:36 AM

A hundred thousand rows doesn't sound like much.  Unless you have a ton of huge fields in each row it shouldn't take any time at all to write a CSV file with a simple loop.  Even with a lot of fields it still seems like you would have to be talking about more data than could reasonably be manipulated in a data table before you'd see a really long execution time when generating a CSV file.

How long does this code take to run?

Public Sub DataTableToCSV(table As DataTable, filePath As String)
    Using writer As New IO.StreamWriter(filePath)
        Dim lastColIndex As Integer = table.Columns.Count - 1
        Dim cellvalue As String
        Dim firstField, firstRow As Boolean
        firstRow = True
        For Each row As DataRow In table.Rows
            If Not firstRow Then writer.WriteLine()
            firstField = True
            For i = 0 To lastColIndex
                If Not firstField Then writer.Write(",")
                If Not row.IsNull(i) Then
                    cellvalue = row.Item(i).ToString
                    If cellvalue.Contains(",") Then
                        writer.Write(ControlChars.Quote)
                        writer.Write(cellvalue)
                        writer.Write(ControlChars.Quote)
                    Else
                        writer.Write(cellvalue)
                    End If
                End If
                firstField = False
            Next
            firstRow = False
        Next
    End Using
End Sub

Reed Kimble - "When you do things right, people won't be sure you've done anything at all"

Reed, thank you. This is actually working absolutely fine for me as of now, but when the CSV file is 40 GB I think this might be slow. What do you suggest?


Monday, November 9, 2015 7:43 AM

Hello,

The fastest method I know of is via embedded expressions and XML literals, which requires zero explicit loops of our own over the DataTable. The Excel format I am speaking of produces an XML version of workbooks and worksheets, and by default Excel will open an XML file formatted for Excel from Windows Explorer. If this interests you and you are using Framework 3.5 or higher, I can upload a sample project. A native format can be done with the Open Office API, which requires more code yet is fast; having never worked with the Open Office API, though, I would not suggest it, only mention it.

Please remember to mark the replies as answers if they help and unmark them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my webpage under my profile but do not reply to forum questions.
Microsoft Developer tools
Developer’s Guide to Windows 10 video series

KevinInstructor: please provide me the link. If it's freeware, let me try the Open Office API. The above was just an example... I have a 40 GB CSV; I need to filter the data, and the DataTable should be exported to CSV, Excel, or Access. I am interested in trying your solution.