Packet loss on UDP multicast over IPv4

Chris Taker 0 Reputation points
2023-01-30T02:44:13.3666667+00:00

Hi,

I am using Windows 10 Enterprise x64 build 21H2 and suffering from UDP multicast packet loss.

To explain better: I am using AES67 streams and experiencing random packet loss.

This happens on two different PCs with the same Windows build. I have also changed the NIC, cables, etc., but the problem is still the same.

Any help would be appreciated.

Regards,

Chris


6 answers

  1. Chris Taker 0 Reputation points
    2023-01-30T16:16:19.8533333+00:00

    Hi,

    I have checked both ends with Wireshark and the loss does not seem to happen there. It appears to occur after the NIC driver, at least that is how it looks.

    Please let me know the steps to capture and share the logs so you can take a look.

    P.S. I am also noticing that when the CPU load is slightly high, packets get lost as well.

    Best regards,

    Chris


  2. Chris Taker 0 Reputation points
    2023-02-03T01:56:49.06+00:00

    Hi Gary,

    I am uploading the receiver .etl file so you can check it.

    https://we.tl/t-rD3lYtCv4v

    Regards,

    Chris


  3. Gary Nebbett 6,216 Reputation points
    2023-02-03T18:55:14.3733333+00:00

    Hello Chris,

    There are various points at which a message could be lost and various ways of detecting such a loss.

    The RTP sequence numbers provide a mechanism to detect losses between sender and receiver; the trace shows that one packet was lost and that more than 240 thousand packets were sent - no problem there.
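
    As a rough illustration of what such a sequence-number check amounts to (my own sketch, not the tool used for this analysis, and assuming plain RTP over UDP), something like the following counts gaps as packets arrive:

    ```cpp
    // Sketch only: count missing RTP packets from gaps in the 16-bit sequence number.
    // "packet" is assumed to be the raw UDP payload of one RTP packet.
    #include <cstdint>
    #include <cstddef>
    #include <optional>

    struct RtpLossCounter {
        std::optional<uint16_t> lastSeq;  // last sequence number seen
        uint64_t received = 0;            // packets that arrived
        uint64_t lost = 0;                // packets missing according to sequence gaps

        void onPacket(const uint8_t* packet, size_t len) {
            if (len < 12) return;  // shorter than a minimal RTP header
            uint16_t seq = static_cast<uint16_t>((packet[2] << 8) | packet[3]);  // bytes 2-3, big-endian
            if (lastSeq) {
                uint16_t expected = static_cast<uint16_t>(*lastSeq + 1);  // wraps at 65535
                lost += static_cast<uint16_t>(seq - expected);            // gap size, modulo 2^16
            }
            lastSeq = seq;
            ++received;
        }
    };
    ```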

    The WFP provider shows how many packets were rejected by WFP filtering (which includes firewall filtering) - no packets were rejected.

    The TCPIP provider would show how many packets were rejected by problems discovered in the TCP/IP stack, but I could not identify any RTP packets that were rejected by the intended recipient.

    The Winsock-AFD provider would show how many packets were rejected by overflowing application socket buffering limits, but no such events were present in the trace.
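
    For reference, the limit in question is the per-socket receive buffer; a receiving application can enlarge it with setsockopt so that short bursts are less likely to overflow it. A minimal sketch (my illustration only, not something this trace indicates is needed), assuming a Winsock UDP socket:

    ```cpp
    // Sketch only: enlarge the receive buffer of a Winsock UDP socket.
    #include <winsock2.h>
    #pragma comment(lib, "ws2_32.lib")

    bool enlargeReceiveBuffer(SOCKET s, int bytes = 4 * 1024 * 1024) {
        // SO_RCVBUF controls how much received data is queued for the socket
        // before further datagrams are dropped.
        return setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                          reinterpret_cast<const char*>(&bytes),
                          static_cast<int>(sizeof(bytes))) == 0;
    }
    ```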

    The receiving application could reject packets because the RTP timestamp was outside of acceptable limits - this is an application decision that is not detectable via tracing of Microsoft components.

    Normally I would say that we need to shift focus to the receiving application but there are events in the trace data that I can't understand/explain - they are contrary to my understanding of what might be happening (e.g. data receive events at the Winsock-AFD level without TCPIP or PktMon receive events that could plausibly account for them).

    I currently have two requests:

    1. Can you quantify how many packets your application believes are being lost per second?
    2. You wrote "To add this also. When I use a hardware device to send the Aes67 stream to the receiver, which is a PC, it does not have packet loss." - can you generate and share a trace (created on the receiver) of such an experience?

    Gary


  4. Gary Nebbett 6,216 Reputation points
    2023-02-04T10:54:48.0666667+00:00

    Hello Chris,

    The trace (on the receiver) when the hardware sender is sending will be very helpful - my current guess is that the behaviour that you are seeing is due to an unavoidable characteristic of Windows.

    If we look at an analysis of the RTP packets in the first trace that you made (on the Windows sender), we see this:

    [Screenshot: RTP packet interval analysis from the sender-side trace]

    The sender is sending an RTP packet every millisecond (on average). The "average" behaviour is pretty good (539248 packets sent in 539230 milliseconds), but the interval between packets is not uniformly one millisecond.

    This variance in the interval between RTP packets might be the cause of the behaviour that you observe. My expectation is that a trace of the hardware sender will show a much more regular interval between the RTP packets.
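
    If you want to quantify that variance yourself from a capture, a small sketch (my own illustration, assuming you have extracted the packet timestamps in seconds) could report the average and worst-case interval:

    ```cpp
    // Sketch only: average and worst-case interval between consecutive packets.
    #include <vector>
    #include <algorithm>
    #include <cstdio>

    void reportIntervals(const std::vector<double>& timestamps) {
        if (timestamps.size() < 2) return;
        double sum = 0.0, worst = 0.0;
        for (size_t i = 1; i < timestamps.size(); ++i) {
            double interval = timestamps[i] - timestamps[i - 1];  // seconds between packets
            sum += interval;
            worst = std::max(worst, interval);
        }
        double avg = sum / (timestamps.size() - 1);
        std::printf("average interval: %.3f ms, worst interval: %.3f ms\n",
                    avg * 1000.0, worst * 1000.0);
    }
    ```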

    Normally, the Windows "clock" ticks once every 15.625 milliseconds (i.e. 64 times per second). There is a Windows API (https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod) that can reduce the interval between clock ticks and the minimal interval that can be specified is one millisecond. The documentation for the API says:

    Setting a higher resolution can improve the accuracy of time-out intervals in wait functions.

    The sending application needs to send an RTP packet and wait for one millisecond before sending the next - more accuracy in this wait beyond what is observed in the trace data is probably not possible.
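
    A minimal sketch of how a sending application typically uses that API (my illustration; sendOneRtpPacket is a hypothetical placeholder for the application's own send routine):

    ```cpp
    // Sketch only: request 1 ms timer resolution for the duration of the send loop.
    #include <windows.h>
    #include <timeapi.h>
    #pragma comment(lib, "winmm.lib")

    void sendLoop(bool& keepRunning, void (*sendOneRtpPacket)()) {
        timeBeginPeriod(1);           // ask for a ~1 ms clock tick while sending
        while (keepRunning) {
            sendOneRtpPacket();       // application-specific: emit one RTP packet
            Sleep(1);                 // wait ~1 ms; still subject to scheduler jitter
        }
        timeEndPeriod(1);             // restore the previous timer resolution
    }
    ```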

    Let's see what the trace of the hardware sender reveals before thinking further ahead...

    Gary


  5. Chris Taker 0 Reputation points
    2023-02-07T18:58:15.3366667+00:00

    Hi Gary,

    I think I have found the cause of all this. When the SSD is active, either reading or writing, it creates the packet loss. What did not seem right at first is that when the packet loss happens, the app's log file shows "error in sequence of stream", yet when I immediately checked the SSD performance it was excellent. However, I still have not managed to resolve this, even after changing the SSD drive and the controller. The most interesting part is that the same thing happens on the second PC.

    Maybe when the SSD is active it creates latency, and the packets are lost there in the kernel?

    This is a bit frustrating, but is there any change I can make so that the disk does not somehow get full priority?

    For the hardware sender, I will create the log files in the next few days.

    Regards,

    Chris

