I run collectd stats on many of my servers, and one thing I enable on some (but not all) of them is the protocols plugin. This plugin pulls the counters from /proc/net/snmp and submits them to the stats server.
You can view the data in a human-friendly way on Linux by running netstat -s, which should give you something like this:
ben@metropolis:~$ netstat -s
Ip:
151680 total packets received
0 forwarded
0 incoming packets discarded
150066 incoming packets delivered
163892 requests sent out
8 outgoing packets dropped
Icmp:
10710 ICMP messages received
19 input ICMP message failed.
ICMP input histogram:
destination unreachable: 381
timeout in transit: 9388
echo replies: 941
11668 ICMP messages sent
0 ICMP messages failed
{etc etc}
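The same counters can be read straight from /proc/net/snmp, where each group appears as a header line of names followed by a line of values with the same prefix. Here is a minimal C sketch of looking up one counter from such a pair of lines. Note that snmp_lookup is a hypothetical helper for illustration, not how collectd or netstat actually do it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in a /proc/net/snmp style pair of lines:
 *   header: "IcmpMsg: InType0 InType3 OutType3"
 *   values: "IcmpMsg: 941 364 384"
 * Returns 0 and stores the value on success, -1 if the name is absent.
 * snmp_lookup is a hypothetical helper, not a collectd or kernel API.
 */
int snmp_lookup(const char *header, const char *values,
                const char *name, unsigned long long *out)
{
    char hbuf[1024], vbuf[1024];
    char *hsave = NULL, *vsave = NULL;

    assert(header && values && name && out);

    /* strtok_r modifies its input, so work on private copies. */
    snprintf(hbuf, sizeof hbuf, "%s", header);
    snprintf(vbuf, sizeof vbuf, "%s", values);

    /* The first token on each line is the group prefix, e.g. "IcmpMsg:". */
    char *h = strtok_r(hbuf, " \n", &hsave);
    char *v = strtok_r(vbuf, " \n", &vsave);
    if (!h || !v || strcmp(h, v) != 0)
        return -1;

    /* Walk both lines in lockstep: the Nth name labels the Nth value. */
    while ((h = strtok_r(NULL, " \n", &hsave)) &&
           (v = strtok_r(NULL, " \n", &vsave))) {
        if (strcmp(h, name) == 0)
            return sscanf(v, "%llu", out) == 1 ? 0 : -1;
    }
    return -1;
}
```

Feeding it the two IcmpMsg: lines from /proc/net/snmp and a name such as InType0 gives back the same number that netstat -s prints.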
However I always thought it was strange when I saw this:
IcmpMsg:
InType0: 941
InType3: 364
InType11: 9388
OutType3: 384
OutType69: 11265
The point of interest here is OutType69, since 69 is not a valid ICMP type - it’s in the reserved range - and I am apparently sending a lot of them but not getting any back. I never really looked into this before, since I always had more important things to do when I noticed it on my servers, especially since it never seemed to increment.
However, this weekend I decided to take a stab at the issue to see what was happening.
RFC792 (the main RFC for ICMP) lists the ICMP types as follows (with addenda for clarity):
ICMP TYPE NUMBERS
The Internet Control Message Protocol (ICMP) has many messages that
are identified by a "type" field.
Type    Name                                  Reference
----    -------------------------             ---------
0       Echo Reply                            [RFC792]
1       Unassigned                            [JBP]
2       Unassigned                            [JBP]
3       Destination Unreachable               [RFC792]
4       Source Quench                         [RFC792]
5       Redirect                              [RFC792]
6       Alternate Host Address                [JBP]
7       Unassigned                            [JBP]
8       Echo                                  [RFC792]
9       Router Advertisement                  [RFC1256]
10      Router Selection                      [RFC1256]
11      Time Exceeded                         [RFC792]
12      Parameter Problem                     [RFC792]
13      Timestamp                             [RFC792]
14      Timestamp Reply                       [RFC792]
15      Information Request                   [RFC792]
16      Information Reply                     [RFC792]
17      Address Mask Request                  [RFC950]
18      Address Mask Reply                    [RFC950]
19      Reserved (for Security)               [Solo]
20-29   Reserved (for Robustness Experiment)  [ZSu]
30      Traceroute                            [RFC1393]
31      Datagram Conversion Error             [RFC1475]
32      Mobile Host Redirect                  [David Johnson]
33      IPv6 Where-Are-You                    [Bill Simpson]
34      IPv6 I-Am-Here                        [Bill Simpson]
35      Mobile Registration Request           [Bill Simpson]
36      Mobile Registration Reply             [Bill Simpson]
37      Domain Name Request                   [Simpson]
38      Domain Name Reply                     [Simpson]
39      SKIP                                  [Markson]
40      Photuris                              [Simpson]
41-255  Reserved                              [JBP]
Since type 69 is in the “Reserved” section, this means that something on my end is spitting out ICMP packets that are wrong.
Looking into things that could possibly be doing it (and anything that could trigger it that infrequently, at seemingly random times), I was stumped, so I took my concerns to the kernel code to see how it even gets this information.
The first thing I did was grep for anything that could be using 0x45 or 69 in the kernel code. Unfortunately for this approach, there are many hex-encoded data tables in the kernel, making the results a lot of effort to look through.
I aborted that quickly and went looking for the code in more precise ways. At that time, I had no idea how the kernel code handles ICMP packets and therefore didn’t really know what I was looking for.
Arriving at /net/ipv4/icmp.c and doing a grep for “stats” brought me to these results:
/*
 * Maintain the counters used in the SNMP statistics for outgoing ICMP
 */
void icmp_out_count(struct net *net, unsigned char type)
{
    ICMPMSGOUT_INC_STATS(net, type);
    ICMP_INC_STATS(net, ICMP_MIB_OUTMSGS);
}
ICMPMSGOUT_INC_STATS is interesting. However, it is fed its type from upstream, so the bug couldn’t be here. icmp_out_count is used in 3 files:
linux/net/ipv4/ip_output.c
linux/net/ipv4/ping.c
linux/net/ipv4/raw.c
In ip_output.c and raw.c it is called in the same way; however, ping.c calls it differently:
if (!err) {
    icmp_out_count(sock_net(sk), user_icmph.type);
    return len;
}
In this case, the type is passed in by the caller, which would mean that the program calling it would have to be supplying crap. This is certainly possible, but since I couldn’t see any type 69 packets in Wireshark I thought it unlikely.
Let’s take a look at ip_output.c and raw.c now (both files make the call in the same way):
if (iph->protocol == IPPROTO_ICMP)
    icmp_out_count(net, ((struct icmphdr *)
        skb_transport_header(skb))->type);
Okay, so this one is a little more prone to failure since it depends on a few things being right.
First of all, since both of these deal with general output rather than ICMP packets only, this code is going to see more mileage (raw sockets and general IP output).
I will assume iph->protocol has the correct value at this point and isn’t triggered randomly; otherwise I would see this much more often, and you would generally see more posts online about why the SNMP counters are so wildly full of strange values.
However, if we start looking at how packets are formed, the cast ((struct icmphdr *)skb_transport_header(skb)) begins to look more suspicious:
In the case of ICMP, the value that is fed into the SNMP counters is the first 8 bits of the ICMP header. In the case of IPv4, the first two nibbles of the header are Version (a static value, 4) and Internet Header Length (IHL), i.e. how long the IP header is. This is as good as static, unless you use Router Alert in IGMP or anything else that appends IP options to packets, which basically never happens in the grand scheme of things.
IHLs are encoded rather strangely, but basically the field says how many 32-bit words long the IP header is. Since IP headers are (other than as above) practically always 20 bytes, that is five 32-bit words.
Now we have 0x40 plus 0x05, meaning that the resulting byte on the wire is 0x45, or decimal 69!
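To make the arithmetic concrete, here is a small C sketch of how the version and IHL nibbles share the first byte of an IPv4 header. The function names ipv4_first_byte and ipv4_header_bytes are made up for this illustration and are not kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack the IPv4 version (high nibble) and IHL (low nibble, counted in
 * 32-bit words) into the first byte of the header.
 * Hypothetical helper for illustration only.
 */
uint8_t ipv4_first_byte(unsigned version, unsigned ihl_words)
{
    assert(version <= 0xF && ihl_words <= 0xF);
    return (uint8_t)((version << 4) | ihl_words);
}

/* Recover the header length in bytes from that first byte. */
unsigned ipv4_header_bytes(uint8_t first_byte)
{
    return (first_byte & 0x0Fu) * 4u;
}
```

With version 4 and an IHL of 5 this gives 0x45, which is 69 when read as a plain unsigned byte, and ipv4_header_bytes(0x45) gives back the usual 20 bytes.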
So, what is clearly happening is that when skb_transport_header(skb) is given the packet, it gives back the packet beginning at the IP header. That means that when it is cast to an icmphdr, the IPv4 version and IHL are read as the ICMP type, which is then given to icmp_out_count, and that is why I keep seeing 69 in my stats.
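The misread is easy to reproduce in userspace. The sketch below is only a simulation under my own assumptions: struct icmp_hdr_min is a cut-down stand-in for the start of the kernel's struct icmphdr, the packet is a hand-built buffer rather than an skb, and memcpy is used in place of the kernel's pointer cast to stay clear of alignment problems.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-in for the first fields of the kernel's struct icmphdr. */
struct icmp_hdr_min {
    uint8_t  type;
    uint8_t  code;
    uint16_t checksum;
};

/* Read the "ICMP type" from wherever the caller claims the header starts. */
uint8_t icmp_type_at(const unsigned char *hdr)
{
    struct icmp_hdr_min h;

    memcpy(&h, hdr, sizeof h);  /* userspace-safe equivalent of the cast */
    return h.type;
}

/* Build a 20-byte IPv4 header followed by an 8-byte ICMP echo request. */
size_t build_echo_packet(unsigned char *buf, size_t len)
{
    assert(len >= 28);
    memset(buf, 0, 28);
    buf[0]  = 0x45;  /* version 4, IHL 5 */
    buf[9]  = 1;     /* protocol field: ICMP */
    buf[20] = 8;     /* ICMP type 8: echo request */
    return 28;
}
```

Calling icmp_type_at on the start of the buffer yields 69, while calling it 20 bytes further in yields the real type, 8. That is the same difference as between the counter I was seeing and the packets actually leaving the machine.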
After looking into it, it turns out that I was running MTR whenever I was seeing all the type 69 packets. MTR uses raw sockets to send out its ICMP packets, which ran into this bug, so the counter that got incremented was 69 rather than the actual ICMP type it was sending!
To prove that this is the case, I hacked on a sample program that sends an IPv4 ICMP packet using raw sockets so that it sends a slightly altered IHL, to see if my netstat -s output changed.
(link to program)
ben@metropolis:~/Downloads$ gcc icmp4.c -o icmpt.o
icmp4.c: In function ‘main’:
icmp4.c:129:3: warning: large integer implicitly truncated to unsigned type [-Woverflow]
iphdr.ip_hl = /* IP4_HDRLEN / sizeof (uint32_t); */ 0x46;
^
ben@metropolis:~/Downloads$ sudo ./icmpt.o
Index for interface eth0 is 2
ben@metropolis:~/Downloads$ netstat -s | grep -A 6 IcmpMsg:
IcmpMsg:
InType0: 941
InType3: 475
InType11: 9388
OutType3: 487
OutType69: 11265
OutType70: 1
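Bingo! OutType70! The ip_hl field is only 4 bits wide, so the 0x46 I assigned gets truncated to 6 (hence the compiler warning), and together with version 4 the first byte on the wire becomes 0x46, or decimal 70. That confirms that the IPv4 version and IHL fields are being used as the ICMP type in the SNMP counter code.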
The skb_transport_header function is documented to have this quirk. A few guides, like “How to filter network packets…”, explain that this function won’t always return what someone might be expecting:
When a packet goes in from wire, it travels from physical layer, data link layer, network layer upwards, therefore it might not go through the functions defined in netfilter for skb_transport_header to work. The code above doesn’t work in this case, but there is a simple hack:
udp_header = (struct udphdr *)(skb_transport_header(skb)+20);
In the case of linux/net/ipv4/raw.c (the path I’m pretty sure I am going through in my case) it is not that simple: in the rare case that someone does send extra IP header options, we need to be ready for that. Luckily for us, just above this code is iphlen = iph->ihl * 4; meaning we can reuse that, so that this statement becomes:
  if (iph->protocol == IPPROTO_ICMP)
      icmp_out_count(net, ((struct icmphdr *)
-         skb_transport_header(skb))->type);
+         (skb_transport_header(skb) + iphlen))->type);
Or at least I think so - I don’t program kernels daily, but I am pretty sure this will fix the problem here.
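The idea behind the fix can be checked in userspace too. This is a sketch under my own assumptions, not the kernel code: icmp_type_of_packet is a hypothetical helper that works on a plain byte buffer. It derives the header length from the IHL nibble instead of hard-coding 20, so packets carrying IP options are handled as well. One detail matters here: the offset has to be added while the pointer is still a byte pointer, because adding it after a cast to struct icmphdr * would step in units of the struct size rather than bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the ICMP type of a raw IPv4 packet, or -1 if the buffer is not
 * a usable IPv4/ICMP packet. Hypothetical helper for illustration only.
 */
int icmp_type_of_packet(const unsigned char *pkt, size_t len)
{
    assert(pkt != NULL);

    if (len < 20 || (pkt[0] >> 4) != 4)
        return -1;  /* too short, or not IPv4 */

    /* Same idea as iphlen = iph->ihl * 4 in raw.c. */
    size_t iphlen = (size_t)(pkt[0] & 0x0F) * 4;
    if (iphlen < 20 || len < iphlen + 1)
        return -1;  /* malformed IHL or truncated packet */

    if (pkt[9] != 1)
        return -1;  /* protocol field is not ICMP */

    /* Byte-wise offset first; only then treat the data as an ICMP header. */
    return pkt[iphlen];
}
```

For a packet whose first byte is 0x46 (one 32-bit word of IP options), this reads the type at byte 24 rather than byte 20, which a fixed +20 offset would get wrong.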
I’ve sent a patch upstream to see if this is the right way to fix it, and if it is, hey I just got code in the Kernel!