My child likes Minecraft. They play heavily modded versions via Prism Launcher, which normally just works great. But not today. Today they asked me to take a look at their installation because Minecraft wasn’t starting anymore and always failed to contact api.minecraftservices.com. As I had rolled out IPv6 to my clients just the day before, I immediately had a bad gut feeling that IPv6 was the culprit. So I quickly tested my hypothesis with curl by retrieving just the HTTP headers (-I) from the server, first specifically over IPv4
curl -4I https://api.minecraftservices.com
and IPv6
curl -6I https://api.minecraftservices.com
which immediately yielded strange results: while the IPv4 test smoothly delivered the headers, the IPv6 one just hung. So I tested basic reachability with nc
nc -6v api.minecraftservices.com 443 </dev/null
Connection to api.minecraftservices.com port 443 [tcp/https] succeeded!
which left me puzzled. Let me explain what that means: for some reason the HTTPS request fails, but a mere TCP handshake succeeds, proving that there absolutely is at least a bare minimum of connectivity. I went back to curl, this time in verbose mode, in order to track the issue with the HTTP call further down
curl -6vI https://api.minecraftservices.com
which showed that everything was working fine up until a point in the TLS handshake where the connection seemingly just paused. So I went on to have a peek with tcpdump on my firewall’s WAN interface, which showed the following packet sequence of a nicely working TCP session:
2001:db8::cafe.59435 > 2603:1061:14:133::1.443: SWE 3197130911:3197130911(0) win 65535 <mss 1440,nop,wscale 6,nop,nop,timestamp 2758558645 0,sackOK,eol> [flowlabel 0x50d00]
2603:1061:14:133::1.443 > 2001:db8::cafe.59435: SE 130303006:130303006(0) ack 3197130912 win 65535 <mss 1400,sackOK,timestamp 2486300142 2758558645,nop,wscale 9> [flowlabel 0xd7bb0]
2001:db8::cafe.59435 > 2603:1061:14:133::1.443: . ack 1 win 2061 <nop,nop,timestamp 2758558689 2486300142> [flowlabel 0x50d00]
2001:db8::cafe.59435 > 2603:1061:14:133::1.443: P 1:336(335) ack 1 win 2061 <nop,nop,timestamp 2758558689 2486300142> [class 0x2] [flowlabel 0x50d00]
2603:1061:14:133::1.443 > 2001:db8::cafe.59435: . ack 336 win 131 <nop,nop,timestamp 2486300186 2758558689> [flowlabel 0xd7bb0]
2603:1061:14:133::1.443 > 2001:db8::cafe.59435: P 1:100(99) ack 336 win 131 <nop,nop,timestamp 2486300187 2758558689> [class 0x2] [flowlabel 0xd7bb0]
2001:db8::cafe.59435 > 2603:1061:14:133::1.443: . ack 100 win 2060 <nop,nop,timestamp 2758558735 2486300187> [flowlabel 0x50d00]
2001:db8::cafe.59435 > 2603:1061:14:133::1.443: P 336:342(6) ack 100 win 2060 <nop,nop,timestamp 2758558735 2486300187> [class 0x2] [flowlabel 0x50d00]
2001:db8::cafe.59435 > 2603:1061:14:133::1.443: P 342:710(368) ack 100 win 2060 <nop,nop,timestamp 2758558736 2486300187> [class 0x2] [flowlabel 0x50d00]
2603:1061:14:133::1.443 > 2001:db8::cafe.59435: . ack 710 win 133 <nop,nop,timestamp 2486300233 2758558735> [flowlabel 0xd7bb0]
but then the trouble started:
2603:1061:14:133::1.443 > 2001:db8::cafe.59435: P 2956:4196(1240) ack 710 win 133 <nop,nop,timestamp 2486300234 2758558735> [class 0x2] [flowlabel 0xd7bb0]
2603:1061:14:133::1.443 > 2001:db8::cafe.59435: P 7052:7414(362) ack 710 win 133 <nop,nop,timestamp 2486300236 2758558735> [class 0x2] [flowlabel 0xd7bb0]
after seeing bytes 1:100, all of a sudden bytes 2956:4196 and 7052:7414 arrived, without 100:2956 ever showing up. My client duly signaled this to the sender by repeating the ACK for 1:100 and sending SACKs for 2956:4196 and 7052:7414
2001:db8::cafe.59435 > 2603:1061:14:133::1.443: . ack 100 win 2060 <nop,nop,timestamp 2758558783 2486300233,nop,nop,sack 1 {2956:4196} > [flowlabel 0x50d00]
2001:db8::cafe.59435 > 2603:1061:14:133::1.443: . ack 100 win 2060 <nop,nop,timestamp 2758558783 2486300233,nop,nop,sack 2 {7052:7414} {2956:4196} > [flowlabel 0x50d00]
but api.minecraftservices.com seemingly never resent the missing segments:
2603:1061:14:133::1.443 > 2001:db8::cafe.59435: F 7414:7414(0) ack 710 win 133 <nop,nop,timestamp 2486305186 2758558783> [flowlabel 0xfda74]
2001:db8::cafe.59435 > 2603:1061:14:133::1.443: . ack 100 win 2060 <nop,nop,timestamp 2758563735 2486300233,nop,nop,sack 2 {7052:7415} {2956:4196} > [flowlabel 0x50d00]
2001:db8::cafe.59435 > 2603:1061:14:133::1.443: . ack 100 win 2060 [flowlabel 0x50d00]
2603:1061:14:133::1.443 > 2001:db8::cafe.59435: . ack 710 win 133 <nop,nop,timestamp 2486365231 2758563735> [flowlabel 0xc3403]
2001:db8::cafe.59435 > 2603:1061:14:133::1.443: . ack 100 win 2060 [flowlabel 0x50d00]
2603:1061:14:133::1.443 > 2001:db8::cafe.59435: . ack 710 win 133 <nop,nop,timestamp 2486425278 2758563735> [flowlabel 0xb2c71]
2001:db8::cafe.59435 > 2603:1061:14:133::1.443: R 710:710(0) ack 100 win 2060 [flowlabel 0x50d00]
So there were huge gaps. Unable to wrap my head around the packet trace, I consulted Claude with a short description of the issue and the packet trace. Claude soon suspected that either PMTUD was broken or my router was dropping the ICMPv6 TOOBIG messages. So I let it check my pf(4) config, and it agreed that PMTUD should be working, as I had the necessary rule in my ruleset to pass the messages around
pass inet6 proto icmp6 icmp6-type { unreach, toobig, echoreq, echorep }
It quickly came up with the following hypothesis: if something along the path drops the ICMPv6 messages, or they have never been generated to begin with, my signaled MSS of 1440 bytes would mean trouble, as the resulting packets could outgrow what I’d be able to receive (1500 bytes): 1440 (MSS) + 20 (TCP header) + 40 (IPv6 header) + 8 (PPPoE) = 1508 > 1500. As mentioned before, we never saw the bytes 100:2956, a total of 2856 bytes, which most likely had been split into two segments with payloads of 1440 (my advertised MSS) and 1416 bytes. Packets exceeding the allowed size would also perfectly explain why the TCP handshake with nc(1) succeeded while curl(1) failed: the handshake packets are tiny, while the TLS certificate exchange produces full-sized segments.
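A quick sanity check of that arithmetic (the sizes come from the trace above; the 1416-byte second segment is an inference, not something visible in the capture):

```shell
# On-wire size of a full-sized segment on the PPPoE leg:
echo $((1440 + 20 + 40 + 8))   # MSS + TCP header + IPv6 header + PPPoE
# Size of the missing byte range 100:2956:
echo $((2956 - 100))
# which fits the assumed split into two segments:
echo $((1440 + 1416))
```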
Now that’s a lot of information to process and on a topic which I often find to be not well understood even among fellow IT colleagues, so let me explain a thing or two.
IPv6 relies heavily on PMTUD because IPv6 prohibits any routing party from changing the size of a packet in transit. If a system encounters a packet that’s too large, it’s supposed to signal the problem to the sender with an ICMPv6 message of type 2: Packet Too Big (PTB). The sender then reduces the size of subsequent packets it sends to that destination: for TCP by lowering the effective segment size, and for protocols that permit it, by fragmenting at the IP layer. As it is impossible to know the maximum transmission unit (MTU, typically 1500 on your LAN, or 9000 if you are using jumbo frames) of every link along a path in advance, somebody invented a protocol for exactly that: path MTU discovery (PMTUD), which is (as of the time of this writing) standardized in RFC 8201.
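If you want to see whether any PTB messages actually arrive at your firewall, a capture along these lines can help (pppoe0 is just an example interface name; the ip6[40] byte offset points at the ICMPv6 type field and is only valid when no IPv6 extension headers precede it):

```shell
# Watch for ICMPv6 type 2 (Packet Too Big) on the WAN interface.
# ip6[40] is the first byte after the fixed 40-byte IPv6 header,
# i.e. the ICMPv6 type, assuming no extension headers in between.
tcpdump -n -i pppoe0 'icmp6 and ip6[40] == 2'
```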
Amongst other things, the MTU effectively limits the size of UDP datagrams. But, as you might have noticed, Prism Launcher tried to reach api.minecraftservices.com via TCP. TCP doesn’t fragment at the IP layer; instead, each side advertises a maximum segment size (MSS) at connection setup, derived from its interface MTU. In the first packet of the above trace you can see that I was signaling an MSS of 1440 while the other party advertised 1400. So api.minecraftservices.com was working under the assumption that I was able to receive a payload of 1440 bytes. But as my calculation above shows, this exceeds what I can actually receive: at the very minimum, the last leg of the flow needs an additional 8 bytes for the PPPoE encapsulation, resulting in a packet too large for the link.
The solution to broken PMTUD is a mechanism known as MSS clamping, which reduces the MSS advertised in a SYN to the specified length. In my case I clamped the MSS to 1432 bytes (1500 (MTU) - 8 (PPPoE) - 40 (IPv6 header) - 20 (TCP header) = 1432), which can be done via the following configuration in your pf.conf(5):
match on egress inet6 scrub (max-mss 1432)
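The clamp value is just the WAN MTU minus the encapsulation overhead, so recomputing it for other link types is straightforward (the numbers below assume a 1500-byte Ethernet MTU behind PPPoE):

```shell
# MSS clamp for IPv6 over PPPoE: MTU - PPPoE - IPv6 header - TCP header
echo $((1500 - 8 - 40 - 20))   # 1432
# For plain Ethernet without PPPoE the same calculation gives:
echo $((1500 - 40 - 20))       # 1440
```

Note that the second value, 1440, is exactly the MSS my client was advertising before the clamp, i.e. what you get when PPPoE is not accounted for.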
Sure enough, after applying the adjusted ruleset everything was working as expected. But who’s to blame? Me? Microsoft? Somebody else? I can’t say for sure, but I can tell that PMTUD is not completely broken on my end, as I am able to successfully discover the PMTU between myself and www.heise.de, which I measured with ttl:
sudo ./ttl --pmtud --json -c 50 www.heise.de | jq .pmtud
{
  "min_size": 1280,
  "max_size": 1492,
  "current_size": 1500,
  "successes": 0,
  "failures": 0,
  "discovered_mtu": 1492,
  "phase": "Complete"
}
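Note that the discovered MTU of 1492 fits the PPPoE theory nicely; it is exactly the Ethernet MTU minus the 8-byte PPPoE header:

```shell
echo $((1500 - 8))   # 1492, the PMTU discovered towards www.heise.de
```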
But no such luck with api.minecraftservices.com:
sudo ./ttl --pmtud --json -c 50 api.minecraftservices.com | jq .pmtud
{
  "min_size": 1280,
  "max_size": 1500,
  "current_size": 1500,
  "successes": 0,
  "failures": 0,
  "discovered_mtu": null,
  "phase": "WaitingForDestination"
}
which leads me to believe that either Microsoft or some party between the two of us, one that is not on the path between www.heise.de and me, drops the ICMPv6 messages; but this is but a guess. Anyway, my child can enjoy their Minecraft again, and I got to have a fun little troubleshooting session.
Hope this helps someone.