VB.NET - LINQ is too slow to handle a large amount of data stored in List(Of T)

Steeve 1 Reputation point
2022-10-28T09:42:32.713+00:00

Hello,

I have a List(Of T) containing more than 20,000 items that I want to group into slices of hours, minutes, etc. to build a candlestick chart.
For each slice I need the maximum, the minimum, the opening value and the closing value of the slice.
Processing the 20,000 items takes 1.7 seconds. There is one data point every second.

In the example below, I group them by minutes.

Here is the class containing the data type.

Public Class CPrice  
  
    Sub New()  
    End Sub  
  
    Private _Date As DateTime = Now           
    Public Property [Date] As DateTime  
        Get  
            Return _Date  
        End Get  
        Set(ByVal value As DateTime)  
            _Date = value  
        End Set  
    End Property  
  
    Private _Prix As Decimal = 0
    Public Property Prix As Decimal  
        Get  
            Return _Prix  
        End Get  
        Set(ByVal value As Decimal)  
            _Prix = value  
        End Set  
    End Property  
  
End Class  

The regrouping routine (here with 60-second intervals) takes 1.7 seconds:

Function CandelCreate(ByVal Pr As List(Of CPrice)) As DataPointCollection

    Dim IntervalSec As Long = 60
    Dim DateMin As DateTime = Pr(0).Date
    Dim DateCpt As DateTime = DateMin

    Dim RetSerie As New Series("Ret")
    RetSerie.YValuesPerPoint = 4

    Dim DateDepart As DateTime = New Date(Year(DateMin), Month(DateMin), DateAndTime.Day(DateMin), Hour(DateMin), Minute(DateMin), 0)
    DateCpt = DateDepart

    Do
        Dim Res0 = Pr.Where(Function(d) d.Date >= DateCpt And d.Date < DateCpt.AddSeconds(IntervalSec)).Select(Function(x) x.Prix)

        Dim MaxInInterVal As Decimal = Res0.Max
        Dim MinInInterVal As Decimal = Res0.Min
        Dim OpenInInterval As Decimal = Res0.First
        Dim CloseInInterval As Decimal = Res0.Last

        RetSerie.Points.AddXY(DateCpt, MaxInInterVal, MinInInterVal, OpenInInterval, CloseInInterval)

        DateCpt = DateCpt.AddSeconds(IntervalSec)
    Loop While DateCpt > DateDepart And DateCpt <= Pr(Pr.Count - 1).Date

    Return RetSerie.Points

End Function
  

Is there a way to speed up the processing?

Thanks for your help.

Steeve


2 answers

  1. Petteys, Kevin 91 Reputation points
    2022-10-28T14:40:21.583+00:00

    In my opinion, these types of queries should be done in the database. I use LINQ for small datasets that are already in memory. Databases were designed specifically for querying data efficiently.
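
    For illustration, a rough sketch of what that could look like, assuming (hypothetically) that the prices are stored in a SQL Server table named Prices with [Date] and Prix columns: the per-minute high/low can be aggregated server-side with a GROUP BY (open/close would additionally need FIRST_VALUE/LAST_VALUE window functions).

    ' Hypothetical sketch: aggregate per-minute high/low in SQL Server instead of in memory.
    ' Table and column names (Prices, [Date], Prix) are assumptions, not from the question.
    Imports System.Data.SqlClient

    Module DbCandleSketch

        Sub LoadCandles(ByVal connectionString As String)

            Dim sql As String =
                "SELECT DATEADD(minute, DATEDIFF(minute, 0, [Date]), 0) AS Bucket, " &
                "       MAX(Prix) AS High, MIN(Prix) AS Low " &
                "FROM Prices " &
                "GROUP BY DATEADD(minute, DATEDIFF(minute, 0, [Date]), 0) " &
                "ORDER BY Bucket"

            Using cn As New SqlConnection(connectionString), cmd As New SqlCommand(sql, cn)
                cn.Open()
                Using rdr = cmd.ExecuteReader()
                    While rdr.Read()
                        ' Bucket start, high and low for each one-minute slice.
                        Dim bucket As DateTime = rdr.GetDateTime(0)
                        Dim high As Decimal = rdr.GetDecimal(1)
                        Dim low As Decimal = rdr.GetDecimal(2)
                        Console.WriteLine($"{bucket:t}  H={high}  L={low}")
                    End While
                End Using
            End Using

        End Sub

    End Module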

    1 person found this answer helpful.

  2. Bruce (SqlWork.com) 55,366 Reputation points
    2022-10-28T15:13:44.627+00:00

    You are iterating through the 20k list for every interval. I'd try one pass through the list, using a keyed hash (dictionary) to calculate the intervals, or a LINQ group-by on the interval. I tried a sample group-by and it ran 200K rows in 32 milliseconds.

    (C#, I don't know VB anymore):

    public List<DataPoint> CandelCreate(List<CPrice> list, int interval) => list.GroupBy(
        r => Math.Floor((r.Date - DateTime.MinValue).TotalSeconds / (double)interval),
        r => r.Prix,
        (k, p) => new DataPoint
        {
            IntervalStart = DateTime.MinValue.AddSeconds(k * interval),
            Max = p.Max(),
            Min = p.Min(),
            First = p.First(),
            Last = p.Last()
        }).ToList();
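
    For reference, a rough VB.NET sketch of the same group-by idea, feeding the chart Series directly. It is only a sketch: the CandelCreateGrouped name is made up, it assumes the CPrice class from the question and System.Windows.Forms.DataVisualization.Charting, and DataPoint above is the answer's own result type, not a chart type. Note also that the original loop re-scans the whole list for every interval, and because Res0 is a deferred query the Where filter runs four times per interval (once each for Max, Min, First and Last); a single group-by visits each price only once.

    ' Sketch only: same group-by idea as the C# above, written in VB.NET.
    ' Assumes the CPrice class from the question and the Charting Series type.
    Imports System.Collections.Generic
    Imports System.Linq
    Imports System.Windows.Forms.DataVisualization.Charting

    Public Module CandleSketch

        Public Function CandelCreateGrouped(ByVal Pr As List(Of CPrice),
                                            ByVal intervalSec As Integer) As DataPointCollection

            Dim retSerie As New Series("Ret")
            retSerie.YValuesPerPoint = 4

            ' Key each price by the number of whole intervals since DateTime.MinValue.
            Dim buckets = Pr.
                GroupBy(Function(p) CLng(Math.Floor((p.Date - DateTime.MinValue).TotalSeconds / intervalSec))).
                OrderBy(Function(g) g.Key)

            For Each g In buckets
                Dim bucketStart As DateTime = DateTime.MinValue.AddSeconds(CDbl(g.Key) * intervalSec)
                Dim prices = g.Select(Function(p) p.Prix).ToList()

                ' High, low, open, close for this time slice (same Y order as the question).
                retSerie.Points.AddXY(bucketStart, prices.Max(), prices.Min(), prices.First(), prices.Last())
            Next

            Return retSerie.Points

        End Function

    End Module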
    
    1 person found this answer helpful.