[ipxe-devel] SuperMicro SYS-1028U-TR+ with Intel X710-DR2 nic card

Todd Stansell todd at stansell.org
Tue Mar 29 19:41:20 UTC 2016


I'm trying to use iPXE to network boot from an X710-DR2 card (pci8086:1572)
that's in a SuperMicro SYS-1028U-TR+ system.

When iPXE loads, it does see the MAC address of the NIC, but the NIC randomly
stops talking on the network at different phases of the boot.  One time we saw
it fail even the initial DHCP.  Most of the time the failure comes later in
the boot sequence (for example, during the HTTP request that pulls down the
iPXE script we're trying to chain to).

Here's a network trace captured on the server, where the client asks for the
iPXE script and then stops responding:

104   0.02618 10.64.105.49 -> 10.64.100.20 TFTP Ack  block 48
105   0.00003 10.64.100.20 -> 10.64.105.49 TFTP Data block 49 (1456 bytes)
106   0.02699 10.64.105.49 -> 10.64.100.20 TFTP Ack  block 49
107   0.00003 10.64.100.20 -> 10.64.105.49 TFTP Data block 50 (1011 bytes)
108   0.02572 10.64.105.49 -> 10.64.100.20 TFTP Ack  block 50
109   4.32797 10.64.105.49 -> 10.64.100.20 HTTP C port=27001
110   0.00004 10.64.100.20 -> 10.64.105.49 HTTP R port=27001
111   0.05008 10.64.105.49 -> 10.64.100.20 HTTP GET /boot.cgi?env=ipxe HTTP/1.1
112   0.00003 10.64.100.20 -> 10.64.105.49 HTTP R port=27001
113   0.41703 10.64.100.20 -> 10.64.105.49 HTTP HTTP/1.1 200 OK
114   1.12475 10.64.100.20 -> 10.64.105.49 HTTP HTTP/1.1 200 OK
115   2.26006 10.64.100.20 -> 10.64.105.49 HTTP HTTP/1.1 200 OK
116   4.51997 10.64.100.20 -> 10.64.105.49 HTTP HTTP/1.1 200 OK
117   8.34638 10.64.100.20 -> 10.64.105.49 HTTP R port=27001
118   0.67389 10.64.100.20 -> 10.64.105.49 HTTP HTTP/1.1 200 OK

The TFTP phase is the download of iPXE itself (undionly.kpxe), which has an
embedded script that simply requests an iPXE script from the server.  On the
console, iPXE just prints periods as if it's waiting for a response, but it
never gets one.  At one point we thought it didn't fail when it only used
TFTP, but after several kernel/initrd/imgfree loops we hit the same kind of
problem with TFTP.  It just seems to fail much less often than with HTTP.
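
To make the stall easier to see, here is a rough Python sketch that pulls the
"200 OK" packets out of the trace above and compares the delays between them.
It assumes the second column is the time in seconds since the previous packet
(my reading of the capture format, not something the tool states), and the
function and variable names are only illustrative.  Delays that roughly double
from one packet to the next would be consistent with the server retransmitting
a response that the client never acknowledged.

```python
import re

# Trace lines copied from the capture above. Assumption: column 2 is the
# delay in seconds since the previous packet.
TRACE = """\
111   0.05008 10.64.105.49 -> 10.64.100.20 HTTP GET /boot.cgi?env=ipxe HTTP/1.1
112   0.00003 10.64.100.20 -> 10.64.105.49 HTTP R port=27001
113   0.41703 10.64.100.20 -> 10.64.105.49 HTTP HTTP/1.1 200 OK
114   1.12475 10.64.100.20 -> 10.64.105.49 HTTP HTTP/1.1 200 OK
115   2.26006 10.64.100.20 -> 10.64.105.49 HTTP HTTP/1.1 200 OK
116   4.51997 10.64.100.20 -> 10.64.105.49 HTTP HTTP/1.1 200 OK
"""

# packet number, delay, source, destination, summary
LINE = re.compile(r"^\s*(\d+)\s+([\d.]+)\s+(\S+)\s+->\s+(\S+)\s+(.*)$")


def response_delays(trace, server="10.64.100.20"):
    """Return the delay preceding each '200 OK' packet sent by the server."""
    delays = []
    for line in trace.splitlines():
        m = LINE.match(line)
        if m and m.group(3) == server and "200 OK" in m.group(5):
            delays.append(float(m.group(2)))
    return delays


delays = response_delays(TRACE)
# Ratio of each delay to the one before it; values near 2 suggest backoff.
ratios = [later / earlier for earlier, later in zip(delays, delays[1:])]
print("delays:", delays)
print("ratios:", [round(r, 2) for r in ratios])
```

Nothing here tells us why the client went quiet; it only shows that the server
side kept trying.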

Even after iPXE hands off to the boot loader (when it manages to get that
far), things still fail at random.  That makes me think it's not really an
iPXE problem, but at the same time iPXE's view of the net0 device is pretty
broken.  This could be a BIOS problem, but we're running the latest BIOS
release (2.0) and the latest NIC firmware available (1.0.31).

After iPXE loads, I see the following odd values:

    iPXE> show net0/bustype:hexraw
    net0/bustype:hexraw = 908b5424048b04248902896204895a0889720c897a10896a1431c0c38b5424048b44240885c07501408b62048b5a088b720c8b7a108b6a1459ff32c3b8d23b01
    iPXE> show net0/busid:hexraw
    net0/busid:hexraw = 0000000000
    iPXE> show net0/busloc:hexraw
    net0/busloc:hexraw = 00000000

After doing a "dhcp net0", this is what ifstat shows:

    iPXE> ifstat
    net0: 3c:fd:fe:9c:a3:1c using undionly on  (open)
      [Link:up, TX:2 TXE:0 RX:1505 RXE:354]
      [RXE: 174 x "Error 0x440e6003 (http://ipxe.org/440e6003)"]
      [RXE: 130 x "The socket is not connected (http://ipxe.org/380f6001)"]
      [RXE: 36 x "Invalid argument (http://ipxe.org/1c056002)"]
      [RXE: 14 x "Operation not supported (http://ipxe.org/3c086003)"]
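
As a quick sanity check on those counters, the following Python sketch parses
the ifstat output above and confirms that the four listed error buckets
account for every RXE.  The parsing is written against this one sample only;
the regular expressions and the parse_ifstat name are my own and not part of
iPXE.

```python
import re

# ifstat output copied from the console session above.
IFSTAT = """\
net0: 3c:fd:fe:9c:a3:1c using undionly on  (open)
  [Link:up, TX:2 TXE:0 RX:1505 RXE:354]
  [RXE: 174 x "Error 0x440e6003 (http://ipxe.org/440e6003)"]
  [RXE: 130 x "The socket is not connected (http://ipxe.org/380f6001)"]
  [RXE: 36 x "Invalid argument (http://ipxe.org/1c056002)"]
  [RXE: 14 x "Operation not supported (http://ipxe.org/3c086003)"]
"""


def parse_ifstat(text):
    """Return (summary counters, per-error-code RX error counts)."""
    # Summary counters look like "RX:1505" (no space after the colon).
    counters = {
        name: int(value)
        for name, value in re.findall(r"\b(TX|TXE|RX|RXE):(\d+)", text)
    }
    # Breakdown lines look like: RXE: 174 x "... (http://ipxe.org/440e6003)"
    errors = {
        code: int(count)
        for count, code in re.findall(
            r'RXE: (\d+) x ".*?ipxe\.org/([0-9a-f]+)\)"', text
        )
    }
    return counters, errors


counters, errors = parse_ifstat(IFSTAT)
print(counters)
print(errors)
print("breakdown matches RXE total:", sum(errors.values()) == counters["RXE"])
```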

Adding tcpip debug logs, I see some of these:

    Unrecognized TCP/IP protocol 51

With http, tcpip, and netdevice debug enabled, I see the following when things
just stop talking:

    http://web01/boot.cgi?env=ipxe
    Unrecognized TCP/IP protocol 51
    NETDEV net0 failed to receive 0x0: Error 0x440e6003 (http://ipxe.org/440e6003)
    TCP/IP received UDP packet
    TCP/IP sending IPv4 packet
    TCP/IP received TCP packet
    HTTP 0x27ac4 TX GET /boot.cgi?env=ipxe HTTP/1.1
    HTTP 0x27ac4 TX Connection: keep-alive
    HTTP 0x27ac4 TX User-Agent: iPXE 1.0.0 (f8e167)
    HTTP 0x27ac4 TX Host: web01
    TCP/IP sending IPv4 packet
    TCP/IP received TCP packet
    TCP/IP received UDP packet
    NETDEV net0 failed to receive 0x0: The socket is not connected (http://ipxe.org/380f6001)
    ....................... [dots go on forever]
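
The error numbers themselves may carry a hint about where each failure is
raised.  The Python sketch below splits them into fields, assuming the layout
I understand iPXE's errno.h to describe (bits 30-24 POSIX error code, bits
23-13 file identifier, bits 12-8 per-file disambiguator, bits 7-0 platform
error code).  Treat that layout as an assumption to verify against the source
tree for this build; the decode helper is hypothetical and not an iPXE API.

```python
# Error numbers taken from the ifstat and debug output in this message.
CODES = [0x440E6003, 0x380F6001, 0x1C056002, 0x3C086003]


def decode(errno):
    """Split an iPXE error number into fields (assumed errno.h layout)."""
    return {
        "posix": (errno >> 24) & 0x7F,         # bits 30-24
        "file": (errno >> 13) & 0x7FF,         # bits 23-13
        "disambiguator": (errno >> 8) & 0x1F,  # bits 12-8
        "platform": errno & 0xFF,              # bits 7-0
    }


for code in CODES:
    fields = decode(code)
    print(f"{code:#010x}: " + ", ".join(f"{k}={v:#x}" for k, v in fields.items()))
```

If the layout assumption holds, differing file identifiers would suggest the
errors are raised in several different source files rather than in one place.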

This is using undionly.kpxe on version 1.0.0+ (f8e167).

Is there some way to determine whether the problem is within iPXE, the BIOS,
or the NIC itself?  It feels like all of these problems stem from the failed
handoff of the bus info above, so it can't load the proper network drivers,
etc.

Todd


