Packet loss on UDP multicast over IPv4

Chris Taker 0 Reputation points
2023-01-30T02:44:13.3666667+00:00

Hi,

I am using Windows 10 Enterprise x64 build 21H2 and suffering from UDP multicast packet loss.

To explain better: I am using AES67 streams and experiencing random packet loss.

This happens on two different PCs with the same Windows build. I have also changed the NIC, cables, etc., but the problem is still the same.

Any help would be appreciated.

Regards,

Chris


6 answers

  1. Chris Taker 0 Reputation points
    2023-01-30T16:16:19.8533333+00:00

    Hi,

    I have checked both ends with Wireshark and the loss does not seem to happen there. It appears to occur after the NIC driver, at least that is how it looks.

    Please let me know the steps to capture and share the logs so you can take a look.

    P.S. I am also noticing that when the CPU load is slightly high, packets get lost as well.

    Best regards,

    Chris


  2. Chris Taker 0 Reputation points
    2023-02-03T01:56:49.06+00:00

    Hi Gary,

    I am uploading the receiver .etl file so you can check it.

    https://we.tl/t-rD3lYtCv4v

    Regards,

    Chris


  3. Gary Nebbett 6,216 Reputation points
    2023-02-03T18:55:14.3733333+00:00

    Hello Chris,

    There are various points at which a message could be lost and various ways of detecting such a loss.

    The RTP sequence numbers provide a mechanism to detect losses between sender and receiver; the trace shows that one packet was lost and that more than 240 thousand packets were sent - no problem there.
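
    As a rough illustration of what such a sequence-number check amounts to (my own sketch, not the tool used for this analysis, and assuming plain RTP over UDP), something like the following counts gaps as packets arrive:

    ```cpp
    // Sketch only: count missing RTP packets from gaps in the 16-bit sequence number.
    // "packet" is assumed to be the raw UDP payload of one RTP packet.
    #include <cstdint>
    #include <cstddef>
    #include <optional>

    struct RtpLossCounter {
        std::optional<uint16_t> lastSeq;  // last sequence number seen
        uint64_t received = 0;            // packets that arrived
        uint64_t lost = 0;                // packets missing according to sequence gaps

        void onPacket(const uint8_t* packet, size_t len) {
            if (len < 12) return;  // shorter than a minimal RTP header
            uint16_t seq = static_cast<uint16_t>((packet[2] << 8) | packet[3]);  // bytes 2-3, big-endian
            if (lastSeq) {
                uint16_t expected = static_cast<uint16_t>(*lastSeq + 1);  // wraps at 65535
                lost += static_cast<uint16_t>(seq - expected);            // gap size, modulo 2^16
            }
            lastSeq = seq;
            ++received;
        }
    };
    ```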

    The WFP provider shows how many packets were rejected by WFP filtering (which includes firewall filtering) - no packets were rejected.

    The TCPIP provider would show how many packets were rejected by problems discovered in the TCP/IP stack, but I could not identify any RTP packets that were rejected by the intended recipient.

    The Winsock-AFD provider would show how many packets were rejected by overflowing application socket buffering limits, but no such events were present in the trace.
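
    For reference, the limit in question is the per-socket receive buffer; a receiving application can enlarge it with setsockopt so that short bursts are less likely to overflow it. A minimal sketch (my illustration only, not something this trace indicates is needed), assuming a Winsock UDP socket:

    ```cpp
    // Sketch only: enlarge the receive buffer of a Winsock UDP socket.
    #include <winsock2.h>
    #pragma comment(lib, "ws2_32.lib")

    bool enlargeReceiveBuffer(SOCKET s, int bytes = 4 * 1024 * 1024) {
        // SO_RCVBUF controls how much received data is queued for the socket
        // before further datagrams are dropped.
        return setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                          reinterpret_cast<const char*>(&bytes),
                          static_cast<int>(sizeof(bytes))) == 0;
    }
    ```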

    The receiving application could reject packets because the RTP timestamp was outside of acceptable limits - this is an application decision that is not detectable via tracing of Microsoft components.

    Normally I would say that we need to shift focus to the receiving application but there are events in the trace data that I can't understand/explain - they are contrary to my understanding of what might be happening (e.g. data receive events at the Winsock-AFD level without TCPIP or PktMon receive events that could plausibly account for them).

    I currently have two requests:

    1. Can you quantify how many packets your application believes are being lost per second?
    2. You wrote "To add this also. When I use a hardware device to send the Aes67 stream to the receiver, which is a PC, it does not have packet loss." - can you generate and share a trace (created on the receiver) of such an experience?

    Gary


  4. Gary Nebbett 6,216 Reputation points
    2023-02-04T10:54:48.0666667+00:00

    Hello Chris,

    The trace (on the receiver) when the hardware sender is sending will be very helpful - my current guess is that the behaviour that you are seeing is due to an unavoidable characteristic of Windows.

    If we look at an analysis of the RTP packets in the first trace that you made (on the Windows sender), we see this:

    [Screenshot: RTP packet interval analysis from the sender-side trace]

    The sender is sending an RTP packet every millisecond (on average). The "average" behaviour is pretty good (539248 packets sent in 539230 milliseconds), but the interval between packets is not uniformly one millisecond.

    This variance in the interval between RTP packets might be the cause of the behaviour that you observe. My expectation is that a trace of the hardware sender will show a much more regular interval between the RTP packets.
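
    If you want to quantify that variance yourself from a capture, a small sketch (my own illustration, assuming you have extracted the packet timestamps in seconds) could report the average and worst-case interval:

    ```cpp
    // Sketch only: average and worst-case interval between consecutive packets.
    #include <vector>
    #include <algorithm>
    #include <cstdio>

    void reportIntervals(const std::vector<double>& timestamps) {
        if (timestamps.size() < 2) return;
        double sum = 0.0, worst = 0.0;
        for (size_t i = 1; i < timestamps.size(); ++i) {
            double interval = timestamps[i] - timestamps[i - 1];  // seconds between packets
            sum += interval;
            worst = std::max(worst, interval);
        }
        double avg = sum / (timestamps.size() - 1);
        std::printf("average interval: %.3f ms, worst interval: %.3f ms\n",
                    avg * 1000.0, worst * 1000.0);
    }
    ```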

    Normally, the Windows "clock" ticks once every 15.625 milliseconds (i.e. 64 times per second). There is a Windows API (https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod) that can reduce the interval between clock ticks and the minimal interval that can be specified is one millisecond. The documentation for the API says:

    Setting a higher resolution can improve the accuracy of time-out intervals in wait functions.

    The sending application needs to send an RTP packet and wait for one millisecond before sending the next - more accuracy in this wait beyond what is observed in the trace data is probably not possible.
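
    A minimal sketch of how a sending application typically uses that API (my illustration; sendOneRtpPacket is a hypothetical placeholder for the application's own send routine):

    ```cpp
    // Sketch only: request 1 ms timer resolution for the duration of the send loop.
    #include <windows.h>
    #include <timeapi.h>
    #pragma comment(lib, "winmm.lib")

    void sendLoop(bool& keepRunning, void (*sendOneRtpPacket)()) {
        timeBeginPeriod(1);           // ask for a ~1 ms clock tick while sending
        while (keepRunning) {
            sendOneRtpPacket();       // application-specific: emit one RTP packet
            Sleep(1);                 // wait ~1 ms; still subject to scheduler jitter
        }
        timeEndPeriod(1);             // restore the previous timer resolution
    }
    ```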

    Let's see what the trace of the hardware sender reveals before thinking further ahead...

    Gary


  5. Chris Taker 0 Reputation points
    2023-02-07T18:58:15.3366667+00:00

    Hi Gary,

    I think I have found the cause of all this. When the SSD is active, either reading or writing, it creates the packet loss. What did not seem right at first is that when the packet loss happens, the app's log file shows "error in sequence of stream", yet when I immediately checked the SSD performance it was excellent. However, I still have not managed to resolve this, even after changing the SSD drive and the controller. The most interesting part is that the same thing happens on the second PC.

    Maybe when the SSD is active it creates latency, and the packets are lost there in the kernel?

    This is a bit frustrating, but is there any change I can make so that the disk does not somehow get full priority?

    For the hardware sender, I will create the log files in the next few days.

    Regards,

    Chris

