[ipxe-devel] iPXE DHCP bug

Laurent Apollis laurent.apollis at iguanesolutions.com
Tue Dec 1 11:02:48 UTC 2015


Hello,

We encountered a weird DHCP bug with a recent build of iPXE.
Here are the symptoms:
The BIOS PXE makes its DHCP request. So far so good:

"""
    Nov 21 16:17:11 ig-dhcpws-02 dhcpd: DHCPDISCOVER from
90:b1:1c:4d:ed:32 via 10.5.1.1
    Nov 21 16:17:11 ig-dhcpws-02 dhcpd: DHCPOFFER on 10.5.1.20 to
90:b1:1c:4d:ed:32 via 10.5.1.1
    Nov 21 16:17:15 ig-dhcpws-02 dhcpd: DHCPREQUEST for 10.5.1.20
(10.5.0.10) from 90:b1:1c:4d:ed:32 via 10.5.1.1
    Nov 21 16:17:15 ig-dhcpws-02 dhcpd: DHCPACK on 10.5.1.20 to
90:b1:1c:4d:ed:32 via 10.5.1.1
"""

Then the BIOS loads the iPXE firmware from our TFTP server, and iPXE
launches its own DHCP request:
"""
    Oct 21 16:17:20 ig-dhcpws-02 dhcpd: DHCPDISCOVER from
90:b1:1c:4d:ed:32 via 10.5.1.1
    Oct 21 16:17:20 ig-dhcpws-02 dhcpd: DHCPOFFER on 10.5.1.20 to
90:b1:1c:4d:ed:32 via 10.5.1.1
"""

And that's it. iPXE is stuck and we never see the DHCPREQUEST. We were
not sure whether the firmware never received the DHCPOFFER or never
sent the DHCPREQUEST in response. So we ran tcpdump on the switch port
and saw pretty much the same thing:

"""
    16:17:11.516875 64:64:9b:a5:06:81 > 00:50:56:93:df:85, ethertype
IPv4 (0x0800), length 590: 10.5.0.2.67 > 10.5.0.10.67: BOOTP/DHCP,
Request from 90:b1:1c:4d:ed:32, length 548
    16:17:11.517216 00:50:56:93:df:85 > 00:00:5e:00:01:01, ethertype
IPv4 (0x0800), length 343: 10.5.0.10.67 > 10.5.1.1.67: BOOTP/DHCP,
Reply, length 301
    16:17:15.561568 64:64:9b:a5:06:81 > 00:50:56:93:df:85, ethertype
IPv4 (0x0800), length 590: 10.5.0.2.67 > 10.5.0.10.67: BOOTP/DHCP,
Request from 90:b1:1c:4d:ed:32, length 548
    16:17:15.561744 00:50:56:93:df:85 > 00:00:5e:00:01:01, ethertype
IPv4 (0x0800), length 343: 10.5.0.10.67 > 10.5.1.1.67: BOOTP/DHCP,
Reply, length 301
    16:17:20.549277 64:64:9b:a5:06:81 > 00:50:56:93:df:85, ethertype
IPv4 (0x0800), length 438: 10.5.0.2.67 > 10.5.0.10.67: BOOTP/DHCP,
Request from 90:b1:1c:4d:ed:32, length 396
    16:17:20.549459 00:50:56:93:df:85 > 00:00:5e:00:01:01, ethertype
IPv4 (0x0800), length 343: 10.5.0.10.67 > 10.5.1.1.67: BOOTP/DHCP,
Reply, length 301
"""
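
As an illustration only, the stalled handshake in the dhcpd log can be
spotted mechanically. The sketch below is not part of our tooling; the
function name stalled_clients is hypothetical, and it assumes each log
entry sits on a single line (the entries quoted earlier were wrapped by
the mail client). It reports every client whose last logged message is a
DHCPOFFER, i.e. an offer that was never followed by a DHCPREQUEST.

```python
import re

# Capture the dhcpd message type and the client MAC address.
# The non-greedy ".*?" skips the offered IP address, which uses dots,
# so the first colon-separated match after "dhcpd:" is the MAC.
PATTERN = re.compile(
    r"dhcpd: (DHCPDISCOVER|DHCPOFFER|DHCPREQUEST|DHCPACK)\b.*?"
    r"((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def stalled_clients(lines):
    """Return the MACs whose most recent logged message is a DHCPOFFER."""
    last_message = {}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            kind, mac = match.group(1), match.group(2).lower()
            last_message[mac] = kind
    return sorted(mac for mac, kind in last_message.items()
                  if kind == "DHCPOFFER")


# The six dhcpd entries from this report, unwrapped to one line each.
log = [
    "Nov 21 16:17:11 ig-dhcpws-02 dhcpd: DHCPDISCOVER from 90:b1:1c:4d:ed:32 via 10.5.1.1",
    "Nov 21 16:17:11 ig-dhcpws-02 dhcpd: DHCPOFFER on 10.5.1.20 to 90:b1:1c:4d:ed:32 via 10.5.1.1",
    "Nov 21 16:17:15 ig-dhcpws-02 dhcpd: DHCPREQUEST for 10.5.1.20 (10.5.0.10) from 90:b1:1c:4d:ed:32 via 10.5.1.1",
    "Nov 21 16:17:15 ig-dhcpws-02 dhcpd: DHCPACK on 10.5.1.20 to 90:b1:1c:4d:ed:32 via 10.5.1.1",
    "Oct 21 16:17:20 ig-dhcpws-02 dhcpd: DHCPDISCOVER from 90:b1:1c:4d:ed:32 via 10.5.1.1",
    "Oct 21 16:17:20 ig-dhcpws-02 dhcpd: DHCPOFFER on 10.5.1.20 to 90:b1:1c:4d:ed:32 via 10.5.1.1",
]

print(stalled_clients(log))
```

Run on the first four entries alone (the BIOS PXE exchange), the same
function returns an empty list, since that handshake ends with a DHCPACK.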

We started to think that the issue was in the iPXE firmware itself (it
never sends the DHCPREQUEST). So we rolled back commit by commit,
building a new iPXE firmware each time. We finally found the
problematic commit, which is the following one:

  d73982f098db9fdedb28a3826eb97a6832eac1e4 - [dhcp] Defer discovery if
link is blocked
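
For anyone repeating this search: we went back one commit at a time, but
a binary search finds the same commit with far fewer firmware builds,
which is what "git bisect" automates. A minimal sketch of the idea
follows. The commit labels and the is_bad check are hypothetical; in
practice is_bad would mean "build iPXE at this commit and boot an
affected server", and the search assumes the history switches from good
to bad exactly once.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is true.

    commits is ordered oldest to newest. The newest commit must be bad,
    and every commit after the first bad one must also be bad.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # the first bad commit is at mid or earlier
        else:
            lo = mid + 1    # the first bad commit is after mid
    return commits[lo]


# Hypothetical history of eight commits where the bug appears at "c5".
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
tested = []


def is_bad(commit):
    tested.append(commit)   # record each simulated build-and-boot test
    return history.index(commit) >= history.index("c5")


print(first_bad(history, is_bad), "found after", len(tested), "tests")
```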

What is weird is that we saw the DHCPDISCOVER in our logs; it is the
DHCPREQUEST that was never sent. It appears to be related to the new
"blocked link" concept introduced a few commits earlier in:

  f3812395a261b80fe77d19ebb9045e790c434773 - [netdevice] Add a generic
concept of a "blocked link"


We only see this bug on some servers, not all, for what it is worth.
Each time it is a server with a particular LOM implementation: Dell
blades with an integrated switch, and a Dell R520 with the BMC set to
shared mode on LOM1. Maybe this affects the detection of the
"blocked link"?

Best Regards,
Laurent


