[ipxe-devel] UEFI PXE problems with amd systems

Christian Nilsson nikize at gmail.com
Fri May 1 11:46:02 UTC 2020

On Fri, 1 May 2020 at 12:18, Thomas Walker <Thomas.Walker at twosigma.com> wrote:

> Hi,
> I've run into some very peculiar behavior from some new AMD Rome-based
> systems (Dell R7515 and R6525) that I've already been back and forth with
> Dell over for some time and am pretty well convinced at this point that it
> isn't their problem.
> We've used ipxe for years, and embed a script that makes an http call to a
> server that, based on the serial number of the booting system, fetches boot
> instructions.  Normally this is to just boot from localdisk but sometimes
> (rebuilds, etc) it is to fetch a boot script from a
> second http server, which contains boot params and locations from which
> to fetch (HTTP again) kernel and initrd.  This has all worked well and good
> for quite some time, including across our transition from legacy BIOS boot
> to EFI a year or two back.
> These two systems both seem to have problems with the HTTP requests
> though.  Upon setting up some port spans and captures we saw a couple of
> interesting things:
> - Initial BOOTROM dhcp/dns/tftp was completely normal
> - As soon as ipxe took over, we started seeing trailing, repeating (i.e.
> the same sequence, but starting in different places on different packets)
> "garbage" tacked onto the end of outgoing packets, both UDP (DNS) and TCP
> (HTTP)
> - Nearby (within our lab setup) switches seemed to pass these odd packets
> ok but as soon as we hit the WAN routers, they were dropped.
> - Failures consistently happened when we went to a "far" (across the WAN)
> http server to retrieve boot instructions.  The SYN succeeds, but the
> SYN/ACK + GET with trailing garbage in the packet never makes it through
> the WAN router.  The http server keeps trying to resend its part
> of the handshake, which arrives, and ipxe dutifully responds, but that too
> gets dropped.
> - Booting these same systems against the same servers and same ipxe
> revision but in legacy BIOS mode results in no trailing garbage and a
> successful boot
> - Booting these same systems in UEFI mode but using Fedora/Centos' patched
> grub as the 'PXE' image (which uses the UEFI IP stack) works fine, with no
> trailing garbage
> - Whether ipxe attempts to use the onboard BCM5720 gige ports or an add-on
> Mellanox CX5 25Gb adapter, the results are the same
> We build our own ipxe, but only to embed certificates and boot scripts.
> There are no code modifications whatsoever.  Currently using v1.20.1
> although I've tried many older versions and master HEAD.
> I've been playing around with ipxe debug options, and the transport layer
> sizes look right, IP layer csum is correct.  Is there some way to get ipxe
> to dump the full Ethernet frame that it believes it is sending?  I had hoped
> the iobuf debug option would do the trick, but apparently not.
> I do have some packet captures which I've gotten approval to share
> off-list with a few developers provided any on-list discussion avoids
> mention of / obfuscates hostnames, IPs (yes, even RFC1918), etc.
> Any help with this issue would be greatly appreciated,
> (Apologies if this shows up twice.  I gave up waiting on moderation after
> a few days, joined the list, and resent it).

Are you using ipxe.efi, snponly.efi, or some other build?
What does ifstat show you in the scenario where it doesn't work?
In particular, which driver is it using?

If you are seeing snponly, then the driver and part of the stack are in
(in this case) Dell's firmware; the Broadcom driver itself probably comes from
If this is the case, you might want to try ipxe.efi instead, to get
it to use an iPXE driver. The bcm ones are a bit buggy, but the
Mellanox ones should not be.
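The "trailing garbage" symptom in the quoted report can be quantified from the port-span capture by comparing each Ethernet frame's length with the IPv4 Total Length field. The sketch below is a minimal illustration that assumes untagged Ethernet II frames carrying IPv4 and a capture that excludes the 4-byte FCS. The helper names `ipv4_header_checksum` and `trailing_bytes` are hypothetical and are not part of iPXE. The sketch also checks the header checksum, because the report notes that the IP checksum is correct while the frames still carry extra bytes.

```python
import struct

# Assumptions (hypothetical helpers, not part of iPXE):
#  - frames are untagged Ethernet II carrying IPv4
#  - the capture does not include the 4-byte FCS
ETH_HEADER_LEN = 14
ETH_MIN_FRAME_LEN = 60  # minimum frame size excluding FCS; shorter frames are padded
ETHERTYPE_IPV4 = 0x0800


def ipv4_header_checksum(header: bytes) -> int:
    """Internet checksum (RFC 1071) over an IPv4 header.

    Returns the value to store in the checksum field when that field is zero,
    and 0 when run over a header whose stored checksum is already correct.
    """
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    # Fold carries back into the low 16 bits (ones'-complement addition)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def trailing_bytes(frame: bytes) -> bytes:
    """Return bytes after the IPv4 datagram that minimum-frame padding does not explain."""
    if len(frame) < ETH_HEADER_LEN + 20:
        raise ValueError("frame too short for Ethernet + IPv4 headers")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError(f"not an untagged IPv4 frame (ethertype 0x{ethertype:04x})")
    ihl = (frame[ETH_HEADER_LEN] & 0x0F) * 4
    (total_length,) = struct.unpack_from("!H", frame, ETH_HEADER_LEN + 2)
    end_of_ip = ETH_HEADER_LEN + total_length
    if ihl < 20 or total_length < ihl or len(frame) < end_of_ip:
        raise ValueError("malformed or truncated IPv4 datagram")
    if ipv4_header_checksum(frame[ETH_HEADER_LEN:ETH_HEADER_LEN + ihl]) != 0:
        raise ValueError("IPv4 header checksum mismatch")
    # Bytes up to the 60-byte minimum may be legitimate padding;
    # anything past both the datagram and that minimum is unexplained.
    return frame[max(end_of_ip, ETH_MIN_FRAME_LEN):]
```

Pass in the raw bytes of each outgoing frame. A non-empty result flags the kind of trailing data described above, even when the IP header itself is valid. Frames from a span port that carry an 802.1Q tag would need the extra 4 tag bytes skipped first, and that case is not handled here.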

