Someone -- I thought it was Kristof, but he claims not to have this problem
so it must be someone else -- told me a while ago that Telenet's DHCP server
"exhibits weird behaviour". That sort of mystery certainly gets the
hyperactive mind interested.
For totally unrelated reasons, I found myself looking at a packet capture of
DHCP traffic on a Telenet connection. Indeed, there was something very
strange in there. The DHCP client would get a perfectly fine lease with
perfectly reasonable renewal and rebinding times. When the renewal timer
(T1) expired, the client would unicast a DHCPREQUEST to the server and
expect a unicast DHCPACK back. Only the DHCPACK would never arrive,
and the client would retransmit the unicast DHCPREQUEST messages until the
rebinding timer (T2) expired. At that time, the client would broadcast a
DHCPREQUEST after which the DHCPACK would arrive.
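For anyone who doesn't have the DHCP state machine memorised, here is a minimal sketch of the timer logic described above. It assumes the RFC 2131 defaults (T1 at half the lease, T2 at seven eighths of it), which apply when the server sends no explicit T1/T2 options; the values in my capture are not reproduced here, and all the names (`lease_timers`, `default_timers`, `state_at`) are made up for illustration, not taken from dhclient.

```c
#include <stdint.h>

/* Client states relevant to lease renewal (names follow RFC 2131). */
enum dhcp_state { DHCP_BOUND, DHCP_RENEWING, DHCP_REBINDING, DHCP_EXPIRED };

/* Hypothetical container for the three deadlines, in seconds since the ACK. */
struct lease_timers {
    uint32_t t1;     /* start unicasting DHCPREQUEST to the server */
    uint32_t t2;     /* give up on unicast, start broadcasting */
    uint32_t expiry; /* lease is gone */
};

/* RFC 2131 defaults: T1 = 0.5 * lease, T2 = 0.875 * lease. */
static struct lease_timers default_timers(uint32_t lease_secs)
{
    struct lease_timers t;

    t.t1 = lease_secs / 2;
    t.t2 = (uint32_t)(((uint64_t)lease_secs * 7) / 8);
    t.expiry = lease_secs;
    return t;
}

/* Which state is the client in after 'elapsed' seconds? */
static enum dhcp_state state_at(const struct lease_timers *t, uint32_t elapsed)
{
    if (elapsed >= t->expiry)
        return DHCP_EXPIRED;
    if (elapsed >= t->t2)
        return DHCP_REBINDING;  /* broadcast DHCPREQUEST */
    if (elapsed >= t->t1)
        return DHCP_RENEWING;   /* unicast DHCPREQUEST */
    return DHCP_BOUND;
}
```

With a one hour lease, `default_timers(3600)` gives T1 = 1800 and T2 = 3150, so a client whose unicast requests are silently dropped spends the stretch between those two deadlines retransmitting before the broadcast finally gets through.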
The fact that the DHCPACK messages came through the DHCP relay server put
me on a side-track briefly. I discovered that the DHCP server (mentioned in
option 54) would not respond to my DHCP requests. While it makes perfect
sense to protect a DHCP server from clients, you do want your clients to be
able to get packets to it somehow.
I sent some packet captures to a contact inside Telenet (thanks ;-) I couldn't
imagine trying to explain this to a helldesk!) wondering if they'd put too
sharp an access control list between me and the DHCP server (recently --
because I hadn't seen the problem before). After some digging, they found
that I was sending my unicast DHCPREQUEST messages with a random source
port number. From my reading of the RFC, this is "allowed", but no one else
does it. It turns out that Telenet does some sanity checking (sensible
precaution) on DHCP messages before allowing them to go to the DHCP server.
This sanity checking does not like (or recognize, presumably) DHCP messages
with a source port other than bootpc (68).
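I have no idea what Telenet's sanity check actually looks like, so the following is purely a guess at the kind of test that would produce this behaviour: look at the UDP header and only pass client traffic that goes from port 68 to port 67. The function name and the exact rule are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define BOOTPS_PORT 67  /* DHCP/BOOTP server port */
#define BOOTPC_PORT 68  /* DHCP/BOOTP client port */

/* Read a 16-bit big-endian (network byte order) value from a buffer. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical filter rule. 'udp' points at the 8-byte UDP header:
 * source port, destination port, length, checksum (2 bytes each).
 * Returns 1 if the datagram looks like client-to-server DHCP, else 0.
 */
static int looks_like_dhcp_client_packet(const uint8_t *udp, size_t len)
{
    if (len < 8)
        return 0;
    return read_be16(udp) == BOOTPC_PORT &&
        read_be16(udp + 2) == BOOTPS_PORT;
}
```

A unicast DHCPREQUEST sent from an ephemeral source port fails a check like this even though its payload is a perfectly valid DHCP message.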
FreeBSD's dhclient is a rather old version of ISC's reference
implementation, simplified by OpenBSD. I found that OpenBSD has had a patch
for a couple of years that purported to fix this behaviour. When I ported
this patch to FreeBSD however, I found that sendmsg would return
EINVAL, which was not documented to ever happen.
Again I wondered: how do people without source code to their operating systems
get through the day? Do they resort to alcohol and panic at this stage? I used
DDB to set a breakpoint on sendmsg and stepped through briefly,
expecting it to blow up somewhere quickly when copying in the iovec or so. No
such luck however, and I found myself in sosend_generic, which is not so
much fun to step through without symbol information, so I set up remote
debugging so I could use ddd.
Eventually, I found my way to rip_output and found that my EINVAL came
from here:
    if (((ip->ip_hl != (sizeof (*ip) >> 2)) && inp->inp_options)
        || (ip->ip_len > m->m_pkthdr.len)
        || (ip->ip_len < (ip->ip_hl << 2))) {
            INP_RUNLOCK(inp);
            m_freem(m);
            return (EINVAL);
    }
Oh dear...:
    (gdb) p m->M_dat.MH.MH_pkthdr.len
    $6 = 328
    (gdb) p ip->ip_len
    $7 = 18433
Obviously (to the trained -- or strained -- eye which sees this kind of thing
often), 18433 and 328 are strikingly similar. Indeed - it helps if you put
the bytes in the right order!
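For the untrained eye, the arithmetic: 328 is 0x0148 and 18433 is 0x4801, the same two bytes the other way around. A tiny sketch that swaps the bytes by hand (`swap16` is just an illustrative helper; on a little-endian machine `htons` does the same thing):

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value: 0x0148 <-> 0x4801. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}
```

So `swap16(328)` is 18433, and the second clause of the check in rip_output (ip_len larger than the packet length) fires.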
For hysterical raisins, the raw socket interface on BSD-derived network stacks
expects the ip_len field of the IP header supplied when IP_HDRINCL is
set to be in host byte order. dhclient used to only send packets with
headers through the BPF, which will put the packet on the wire exactly as
given (i.e. the ip_len needs to be in network byte order). For reasons which
don't seem to be explained in CVS history, OpenBSD decided to change this
behaviour in their network stack (making it differ from every other network
stack and many books written about sockets), so the ported patch handed
FreeBSD's raw socket an ip_len in network byte order.
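To spell out the difference between the two send paths, here is a rough sketch of how the ip_len value would be chosen under the convention described above. The enum and function are hypothetical, not code from dhclient or from the actual commit, and the raw socket branch reflects only the host-byte-order behaviour discussed in this post.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical label for how the packet will leave the machine. */
enum send_path { SEND_VIA_BPF, SEND_VIA_RAW_SOCKET };

/*
 * Return the value to store in ip_len for a packet of 'total_len' bytes.
 * Raw socket with IP_HDRINCL (as described above): host byte order,
 * the stack converts it on the way out.
 * BPF: the bytes go on the wire as given, so network byte order.
 */
static uint16_t ip_len_for_path(uint16_t total_len, enum send_path path)
{
    if (path == SEND_VIA_RAW_SOCKET)
        return total_len;
    return htons(total_len);
}
```

On a little-endian machine the two branches give 328 and 18433 for the same packet, which is exactly the pair of numbers gdb showed.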
To make a very long story short: I committed revision 198352 to make
dhclient on FreeBSD work in networks which put sharp teeth between DHCP
clients and servers. Debugging the problem also kept me out of trouble for a
couple of hours.
I'm told that finding the cause of weird errors in the protocol stack is now
significantly easier with DTrace. I will have to find some time to play with
that. While ddd "works", it's not exactly the most pleasant tool to work
with.
Entirely aside: I'm still not convinced that "sharp teeth" should care about
the source port of unicast DHCPREQUEST messages, but I'm happy to accept
that if everyone uses port 68, there's no reason to gratuitously differ from
that. Thanks to the Telenetists for helping me look into this.