[ipxe-devel] iPXE DHCP bug
Laurent Apollis
laurent.apollis at iguanesolutions.com
Tue Dec 1 11:02:48 UTC 2015
Hello,
We encountered a strange DHCP bug with a recent build of iPXE.
Here are the symptoms:
The BIOS PXE makes its DHCP request. So far so good:
"""
Nov 21 16:17:11 ig-dhcpws-02 dhcpd: DHCPDISCOVER from
90:b1:1c:4d:ed:32 via 10.5.1.1
Nov 21 16:17:11 ig-dhcpws-02 dhcpd: DHCPOFFER on 10.5.1.20 to
90:b1:1c:4d:ed:32 via 10.5.1.1
Nov 21 16:17:15 ig-dhcpws-02 dhcpd: DHCPREQUEST for 10.5.1.20
(10.5.0.10) from 90:b1:1c:4d:ed:32 via 10.5.1.1
Nov 21 16:17:15 ig-dhcpws-02 dhcpd: DHCPACK on 10.5.1.20 to
90:b1:1c:4d:ed:32 via 10.5.1.1
"""
Then the BIOS loads the iPXE firmware from our TFTP server, and iPXE
launches its own DHCP request:
"""
Oct 21 16:17:20 ig-dhcpws-02 dhcpd: DHCPDISCOVER from
90:b1:1c:4d:ed:32 via 10.5.1.1
Oct 21 16:17:20 ig-dhcpws-02 dhcpd: DHCPOFFER on 10.5.1.20 to
90:b1:1c:4d:ed:32 via 10.5.1.1
"""
And that's it. iPXE gets stuck and we never see the DHCPREQUEST. We were
not sure whether the firmware never received the DHCPOFFER or never
sent the DHCPREQUEST in response. So we ran tcpdump on the
switch port and saw pretty much the same thing:
"""
16:17:11.516875 64:64:9b:a5:06:81 > 00:50:56:93:df:85, ethertype
IPv4 (0x0800), length 590: 10.5.0.2.67 > 10.5.0.10.67: BOOTP/DHCP,
Request from 90:b1:1c:4d:ed:32, length 548
16:17:11.517216 00:50:56:93:df:85 > 00:00:5e:00:01:01, ethertype
IPv4 (0x0800), length 343: 10.5.0.10.67 > 10.5.1.1.67: BOOTP/DHCP,
Reply, length 301
16:17:15.561568 64:64:9b:a5:06:81 > 00:50:56:93:df:85, ethertype
IPv4 (0x0800), length 590: 10.5.0.2.67 > 10.5.0.10.67: BOOTP/DHCP,
Request from 90:b1:1c:4d:ed:32, length 548
16:17:15.561744 00:50:56:93:df:85 > 00:00:5e:00:01:01, ethertype
IPv4 (0x0800), length 343: 10.5.0.10.67 > 10.5.1.1.67: BOOTP/DHCP,
Reply, length 301
16:17:20.549277 64:64:9b:a5:06:81 > 00:50:56:93:df:85, ethertype
IPv4 (0x0800), length 438: 10.5.0.2.67 > 10.5.0.10.67: BOOTP/DHCP,
Request from 90:b1:1c:4d:ed:32, length 396
16:17:20.549459 00:50:56:93:df:85 > 00:00:5e:00:01:01, ethertype
IPv4 (0x0800), length 343: 10.5.0.10.67 > 10.5.1.1.67: BOOTP/DHCP,
Reply, length 301
"""
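The stalled exchange in the dhcpd log excerpts above (a DHCPOFFER that is never followed by a DHCPREQUEST) can be spotted mechanically. A minimal sketch in Python, assuming ISC dhcpd syslog lines in the format quoted above, with each wrapped entry re-joined onto one line; the function names `last_exchange` and `stalled_clients` are hypothetical:

```python
import re

# Matches the dhcpd message type and the client MAC address on one log line.
LINE_RE = re.compile(
    r"dhcpd: (DHCP[A-Z]+) .*?((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)

def last_exchange(lines):
    """Return, per MAC, the message types seen since its last DHCPDISCOVER."""
    seen = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a dhcpd message line we recognise
        msg, mac = m.group(1).upper(), m.group(2).lower()
        if msg == "DHCPDISCOVER":
            seen[mac] = []  # a new DISCOVER starts a fresh exchange
        seen.setdefault(mac, []).append(msg)
    return seen

def stalled_clients(lines):
    """MACs whose latest exchange got an offer but never sent a request."""
    return [
        mac
        for mac, msgs in last_exchange(lines).items()
        if "DHCPOFFER" in msgs and "DHCPREQUEST" not in msgs
    ]
```

Fed the two log excerpts in order, this should flag 90:b1:1c:4d:ed:32, since its second exchange stops at the DHCPOFFER.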
We started to think that the issue was in the iPXE firmware itself (it
never sends the DHCPREQUEST). So we rolled back commit by commit,
building a new iPXE firmware each time. We finally found the
problematic commit, which is the following one:
d73982f098db9fdedb28a3826eb97a6832eac1e4 - [dhcp] Defer discovery if
link is blocked
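The commit-by-commit search described above can also be run as a binary search, which needs far fewer firmware builds. A minimal sketch in Python, assuming the history is ordered oldest to newest, that the oldest commit works and the newest fails, and that every commit after the first failing one also fails; `first_bad` and the `is_good` callback (which would build iPXE at a commit and boot an affected server) are hypothetical names:

```python
def first_bad(commits, is_good):
    """Return the first commit for which is_good() is False.

    commits: list ordered oldest to newest; commits[0] is assumed good
    and commits[-1] is assumed bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # failure was introduced after mid
        else:
            hi = mid  # failure was introduced at or before mid
    return commits[hi]
```

`git bisect` automates the same search directly on a repository.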
What is weird is that we saw the DHCPDISCOVER in our logs, but it's
the DHCPREQUEST that was never sent. It appears to be related to the
new "blocked link" concept introduced a few commits earlier in:
f3812395a261b80fe77d19ebb9045e790c434773 - [netdevice] Add a generic
concept of a "blocked link"
We only had this bug on some servers, not all, for what it is worth.
Each time it is a server with a particular LOM implementation: Dell
blades with an integrated switch, and a Dell R520 with the BMC set to
shared mode on LOM1. Maybe this is affecting the detection of the
"blocked link"?
Best Regards,
Laurent