[ipxe-devel] iPXE DHCP bug
Wissam Shoukair
wissams at mellanox.com
Tue Dec 1 11:32:31 UTC 2015
Hi Laurent,
Are the failing clients connected to a switch with RSTP/MSTP configured?
If so, is the client's port on the switch configured as 'port edge' (port fast)?
If not, try configuring the ports that are connected to end points as 'port edge', or try disabling RSTP/MSTP on the switch (for the sake of the test).
Wissam
-----Original Message-----
From: ipxe-devel-bounces at lists.ipxe.org [mailto:ipxe-devel-bounces at lists.ipxe.org] On Behalf Of Laurent Apollis
Sent: Tuesday, December 01, 2015 13:03
To: ipxe-devel at lists.ipxe.org
Subject: [ipxe-devel] iPXE DHCP bug
Hello,
We encountered a weird DHCP bug with a recent build of iPXE.
Here are the symptoms:
The BIOS PXE makes its DHCP request. So far so good:
"""
Nov 21 16:17:11 ig-dhcpws-02 dhcpd: DHCPDISCOVER from
90:b1:1c:4d:ed:32 via 10.5.1.1
Nov 21 16:17:11 ig-dhcpws-02 dhcpd: DHCPOFFER on 10.5.1.20 to
90:b1:1c:4d:ed:32 via 10.5.1.1
Nov 21 16:17:15 ig-dhcpws-02 dhcpd: DHCPREQUEST for 10.5.1.20
(10.5.0.10) from 90:b1:1c:4d:ed:32 via 10.5.1.1
Nov 21 16:17:15 ig-dhcpws-02 dhcpd: DHCPACK on 10.5.1.20 to
90:b1:1c:4d:ed:32 via 10.5.1.1
"""
Then the BIOS loads the iPXE firmware from our TFTP server, which then launches its own DHCP request:
"""
Oct 21 16:17:20 ig-dhcpws-02 dhcpd: DHCPDISCOVER from
90:b1:1c:4d:ed:32 via 10.5.1.1
Oct 21 16:17:20 ig-dhcpws-02 dhcpd: DHCPOFFER on 10.5.1.20 to
90:b1:1c:4d:ed:32 via 10.5.1.1
"""
And that's it. iPXE gets stuck and we never see the DHCPREQUEST. We were not sure whether the firmware never received the DHCPOFFER or never answered with the DHCPREQUEST. So we ran tcpdump on the switch port and saw pretty much the same thing:
"""
16:17:11.516875 64:64:9b:a5:06:81 > 00:50:56:93:df:85, ethertype
IPv4 (0x0800), length 590: 10.5.0.2.67 > 10.5.0.10.67: BOOTP/DHCP, Request from 90:b1:1c:4d:ed:32, length 548
16:17:11.517216 00:50:56:93:df:85 > 00:00:5e:00:01:01, ethertype
IPv4 (0x0800), length 343: 10.5.0.10.67 > 10.5.1.1.67: BOOTP/DHCP, Reply, length 301
16:17:15.561568 64:64:9b:a5:06:81 > 00:50:56:93:df:85, ethertype
IPv4 (0x0800), length 590: 10.5.0.2.67 > 10.5.0.10.67: BOOTP/DHCP, Request from 90:b1:1c:4d:ed:32, length 548
16:17:15.561744 00:50:56:93:df:85 > 00:00:5e:00:01:01, ethertype
IPv4 (0x0800), length 343: 10.5.0.10.67 > 10.5.1.1.67: BOOTP/DHCP, Reply, length 301
16:17:20.549277 64:64:9b:a5:06:81 > 00:50:56:93:df:85, ethertype
IPv4 (0x0800), length 438: 10.5.0.2.67 > 10.5.0.10.67: BOOTP/DHCP, Request from 90:b1:1c:4d:ed:32, length 396
16:17:20.549459 00:50:56:93:df:85 > 00:00:5e:00:01:01, ethertype
IPv4 (0x0800), length 343: 10.5.0.10.67 > 10.5.1.1.67: BOOTP/DHCP, Reply, length 301
"""
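The pattern in the dhcpd logs above (a DHCPOFFER that is never followed by a DHCPREQUEST) can be checked for mechanically. Below is a minimal Python sketch under stated assumptions: the function name `stalled_after_offer` and the regular expression are illustrative only, and the sample input is the second log excerpt with its wrapped lines rejoined.

```python
import re

# Assumed line shape: "... dhcpd: DHCP<TYPE> ... <client MAC> ...".
# The pattern is an illustration, not a complete dhcpd log grammar.
LINE_RE = re.compile(
    r"dhcpd: (DHCP[A-Z]+) .*?((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def stalled_after_offer(log_lines):
    """Return the MACs whose last logged DHCP message is a DHCPOFFER."""
    last_seen = {}
    for line in log_lines:
        match = LINE_RE.search(line)
        if match:
            message = match.group(1).upper()
            mac = match.group(2).lower()
            last_seen[mac] = message  # keep only the most recent message
    return sorted(mac for mac, msg in last_seen.items() if msg == "DHCPOFFER")


# Second log excerpt from above, with the wrapped lines rejoined.
sample = [
    "Oct 21 16:17:20 ig-dhcpws-02 dhcpd: DHCPDISCOVER from 90:b1:1c:4d:ed:32 via 10.5.1.1",
    "Oct 21 16:17:20 ig-dhcpws-02 dhcpd: DHCPOFFER on 10.5.1.20 to 90:b1:1c:4d:ed:32 via 10.5.1.1",
]
print(stalled_after_offer(sample))
```

A client that completes the exchange ends on DHCPACK and is not reported, so only the stuck iPXE stage would show up.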
We started to think that the issue was in the iPXE firmware itself (it never sends the DHCPREQUEST). So we tried rolling back commit by commit, building a new iPXE firmware each time. We finally found the problematic commit, which is the following one:
d73982f098db9fdedb28a3826eb97a6832eac1e4 - [dhcp] Defer discovery if link is blocked
What is weird is that we saw the DHCPDISCOVER in our logs; it is the DHCPREQUEST that was never sent. It appears to be related to the new "blocked link" concept introduced a few commits earlier in:
f3812395a261b80fe77d19ebb9045e790c434773 - [netdevice] Add a generic concept of a "blocked link"
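Rolling back one commit at a time works, but the same search can be done in fewer builds by bisecting the history, which is what `git bisect` automates. Here is a minimal Python sketch of the idea only: the commit names and the `is_bad` check are placeholders, and it assumes that once a commit shows the bug, every later commit shows it too.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is true.

    Assumes commits are ordered oldest to newest, that the newest one
    is bad, and that every commit after the first bad one is also bad.
    """
    low, high = 0, len(commits) - 1
    while low < high:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid       # first bad commit is at mid or earlier
        else:
            low = mid + 1    # first bad commit is after mid
    return commits[low]


# Placeholder history; in practice is_bad would mean "build this commit,
# boot a failing server, and see whether the DHCPREQUEST is ever sent".
history = ["c1", "c2", "c3", "c4", "c5", "c6"]
print(first_bad(history, lambda commit: history.index(commit) >= 3))
```

With six commits this needs at most three builds instead of up to six.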
For what it is worth, we only had this bug on some servers, not all.
Each time it is a server with a particular LOM implementation: Dell blades with an integrated switch, and a Dell R520 with the BMC set to shared on LOM1. Maybe this affects the detection of the "blocked link"?
Best Regards,
Laurent