Making Sense of Traceroute
If you’ve poked around the PNRP netsh contexts (and you probably have if you read this blog), I bet you’ve run into the traceroute command. I’m talking about netsh p2p pnrp peer traceroute. The help text will tell you that the command “resolves a peer name with path tracing” but it won’t explain the format of the output, which makes it very hard to use the command for anything constructive. In this article, I’ll explain the output of the command. Next time, I’ll walk you through some real debugging situations where I helped customers diagnose problems with traceroute.
My Setup
On my desktop I register the name 0.tylersReg:
C:\>netsh
netsh>p2p pnrp peer add reg 0.tylersReg Global_
Ok.
On my laptop I resolve 0.tylersReg using traceroute:
C:\>netsh p2p pnrp peer traceroute 0.tylersReg Global_
Resolve started...
Found:
Addresses: [2001:0000:4136:e38e:3493:3b5a:bc55:afc2]%0:0 tcp
Extended payload (string):
Extended payload (binary):
Resolve Path:
[2001:0000:4136:e38e:38ca:1cbc:e7ec:157a]:3540, (0), (0)
Accepted
[2001:0000:4136:e388:0819:1d1b:b88f:def0]:3540, (4), (47)
Accepted
[2002:ce7e:3b87:0000:0000:0000:ce7e:3b87]:3540, (6), (2000)
Rejected (Unreachable)
[2001:0000:4136:e388:0819:1d1b:b88f:def0]:3540, (4), (31)
Accepted
[2001:0000:4136:e390:3cca:079d:e768:2771]:3540, (5), (1094)
Accepted
[2001:0000:4136:e38c:0851:f227:7c94:6422]:3540, (9), (15)
Accepted
[2001:0000:4136:e38e:3493:3b5a:bc55:afc2]:3540, (129), (47)
Rejected (Dead end)
[2001:0000:4136:e38e:3493:3b5a:bc55:afc2]:3540, (0), (47)
Success! I found the name.
The resolve path lists all the nodes PNRP spoke to in finding my target. Each entry in the list represents one node. You can count 8 entries in the list.
I’ll explain what the columns in the list mean by breaking down an example:
[2001:0000:4136:e390:3cca:079d:e768:2771]:3540, (5), (1094)
Accepted
Do you see this entry in the resolve path? It’s the fifth one from the top.
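If you want to pick entries like this apart in a script, here is a minimal Python sketch. The regular expression, the Hop class and parse_hop are names I made up for illustration; they are not part of netsh or PNRP, and the layout they expect is assumed from the sample output above rather than from any documented format.

```python
import re
from dataclasses import dataclass

# Assumed layout of one entry, based only on the sample output:
#   [ipv6 address]:port, (bits matched), (round trip ms)
HOP_RE = re.compile(r"\[([0-9a-fA-F:]+)\]:(\d+)\s*,\s*\((\d+)\)\s*,\s*\((\d+)\)")


@dataclass
class Hop:
    address: str       # IPv6 address of the node
    port: int          # port its PNRP service listens on
    bits_matched: int  # bits in common with the ID being resolved
    rtt_ms: int        # round trip time in milliseconds
    status: str        # e.g. "Accepted"; empty if no status line was printed


def parse_hop(entry_line: str, status_line: str = "") -> Hop:
    """Parse one resolve path entry plus the status line that follows it."""
    match = HOP_RE.search(entry_line)
    if match is None:
        raise ValueError(f"not a resolve path entry: {entry_line!r}")
    address, port, bits, rtt = match.groups()
    return Hop(address, int(port), int(bits), int(rtt), status_line.strip())


hop = parse_hop(
    "[2001:0000:4136:e390:3cca:079d:e768:2771]:3540, (5), (1094)",
    "Accepted",
)
print(hop.address, hop.port, hop.bits_matched, hop.rtt_ms, hop.status)
```

The status is passed in separately because it appears on its own line after the entry, and the last entry in my trace has no status at all.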
The Fields
1. Service endpoint of the node I’m talking to
[2001:0000:4136:e390:3cca:079d:e768:2771]:3540, (5), (1094)
Accepted
This is the IPv6 address of the node I’m talking to and the port on which its PNRP service is listening. The standard PNRP service port is 3540.
2. Number of bits in common between this node’s ID and the ID I’m searching for
[2001:0000:4136:e390:3cca:079d:e768:2771]:3540, (5), (1094)
Accepted
PNRP IDs are 256-bit numbers, remember? When I search for a name, I first convert that name into a PNRP ID. When I talk to other computers in the cloud, I compare their PNRP IDs with the one I’m looking for. As I get farther along in my search, I expect to hit nodes whose IDs are numerically close to my target. They should have more and more significant bits in common.
In the example above, my target (0.tylersReg on my desktop) and the current hop share the same 5 most significant bits.
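To make “bits in common” concrete, here is a small Python sketch that counts how many most significant bits two 256-bit numbers share. The function name and the example values are made up for illustration, and this is only my reading of the number in the second column, not the code PNRP itself runs.

```python
ID_BITS = 256  # PNRP IDs are 256-bit numbers


def common_prefix_bits(a: int, b: int, width: int = ID_BITS) -> int:
    """Count the most significant bits that a and b have in common."""
    diff = a ^ b  # bits set wherever the two IDs differ
    # bit_length() gives the position of the highest differing bit (0 if equal),
    # so everything above that position is shared.
    return width - diff.bit_length()


# Two made-up IDs whose top six bits are 101100 and 101101:
target = 0b101100 << 250
hop = 0b101101 << 250
print(common_prefix_bits(target, hop))  # 5
```

Identical IDs share all 256 bits, and IDs that differ in the very first bit share none.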
Here’s another example taken from the traceroute.
[2001:0000:4136:e38e:3493:3b5a:bc55:afc2]:3540, (129), (47)
Rejected (Dead end)
This time I matched 129 bits. That’s a lot! In fact, this hop is my target. Remember that a 256-bit PNRP ID has two parts. The most significant 128 bits (called the p2p ID) are a hash of the name. The bottom 128 bits (the service location) help us do some locality optimization (among other things). You can think of the service location bits as being random, so it’s only important that we match the top 128 bits.
If a hop in the traceroute has 128 bits or more in common with my target, it is my target.
You’re probably wondering why this hop isn’t the very last one in the list. Go check. It’s the second-to-last hop in the list that matches 129 bits. What gives?
The last two hops in the traceroute are actually the same node. First PNRP finds its target. Then it asks that target to formally prove that it is, in fact, publishing the name it was looking for. This produces an extra hop in the traceroute.
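Here is a short Python sketch of the 128-bit rule, applied to the bits-matched column of my trace. The helper names are hypothetical, and the split into two 128-bit halves simply follows the description above.

```python
from typing import List, Optional, Tuple

P2P_ID_BITS = 128            # top half: hash of the name
SERVICE_LOCATION_BITS = 128  # bottom half: service location


def split_pnrp_id(pnrp_id: int) -> Tuple[int, int]:
    """Split a 256-bit PNRP ID into (p2p ID, service location)."""
    p2p_id = pnrp_id >> SERVICE_LOCATION_BITS
    service_location = pnrp_id & ((1 << SERVICE_LOCATION_BITS) - 1)
    return p2p_id, service_location


def is_target(bits_matched: int) -> bool:
    """A hop matching 128 bits or more shares the whole p2p ID with the target."""
    return bits_matched >= P2P_ID_BITS


def first_target_hop(bits_per_hop: List[int]) -> Optional[int]:
    """Index of the first hop that is the target, or None if there is none."""
    for index, bits in enumerate(bits_per_hop):
        if is_target(bits):
            return index
    return None


# Bits-matched column from the trace above, top to bottom.
sample_bits = [0, 4, 6, 4, 5, 9, 129, 0]
print(first_target_hop(sample_bits))  # 6, i.e. the seventh entry
```

The entry after that index is the extra hop to the same node.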
3. Round trip time
[2001:0000:4136:e390:3cca:079d:e768:2771]:3540, (5), (1094)
Accepted
This is the number of milliseconds that it took for me to receive a response to my lookup message. In this example, the round trip time was 1094ms.
PNRP moves on if a hop takes more than 2 seconds, so a hop showing a round trip time of 2000 or more likely timed out. More on this in a moment …
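If you are scanning a long trace for slow nodes, a check like the following Python sketch can flag the suspects. The constant and function names are mine, and I compare with >= because the unreachable hop in my trace shows exactly 2000; treat that threshold as an assumption drawn from this one trace.

```python
from typing import List

TIMEOUT_MS = 2000  # PNRP moves on after about 2 seconds


def likely_timed_out(rtt_ms: int) -> bool:
    """True if the reported round trip time has reached the timeout."""
    return rtt_ms >= TIMEOUT_MS


def timed_out_hops(rtts_ms: List[int]) -> List[int]:
    """Indexes of the hops whose round trip time suggests a timeout."""
    return [i for i, rtt in enumerate(rtts_ms) if likely_timed_out(rtt)]


# Round trip column from the trace above, top to bottom.
sample_rtts = [0, 47, 2000, 31, 1094, 15, 47, 47]
print(timed_out_hops(sample_rtts))  # [2], the third entry
```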
4. Hop status
[2001:0000:4136:e390:3cca:079d:e768:2771]:3540, (5), (1094)
Accepted
This is how PNRP interprets the hop. There are a number of possibilities.
Accepted: This hop helped me get closer to my target
Rejected: This hop didn’t help and PNRP will ignore it
Rejected (Unreachable): This node didn’t respond in time. It might have powered down, or there might be connectivity problems preventing us from talking.
There are more possibilities, but we’ll discuss these in another blog post.
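For a quick tally of how a resolve went, you can split each status into its verdict and the optional reason in parentheses. This Python sketch uses made-up helper names and only assumes the status strings look like the ones in my trace.

```python
from collections import Counter
from typing import Tuple


def split_status(status: str) -> Tuple[str, str]:
    """Split 'Rejected (Unreachable)' into ('Rejected', 'Unreachable')."""
    verdict, _, rest = status.strip().partition(" (")
    return verdict, rest.rstrip(")")


# Status lines from the trace above, top to bottom
# (the last entry has no status line).
sample_statuses = [
    "Accepted",
    "Accepted",
    "Rejected (Unreachable)",
    "Accepted",
    "Accepted",
    "Accepted",
    "Rejected (Dead end)",
]

verdicts = Counter(split_status(s)[0] for s in sample_statuses)
print(verdicts["Accepted"], verdicts["Rejected"])  # 5 2
```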
Let’s take a look at one more hop before we go:
[2002:ce7e:3b87:0000:0000:0000:ce7e:3b87]:3540, (6), (2000)
Rejected (Unreachable)
This hop timed out. See, the round trip time hit the 2000ms limit without the node getting back to me, so PNRP rejected the hop and moved on.
We’ll dig even deeper next time.
Have fun!
-Tyler
Sorry I’ve been gone for a while. I’ve been busy working on some neat new stuff. I can’t wait to tell you about it!