[ipxe-devel] virtnet_open_legacy() fails to find virtqueues on Google Compute Engine

Akihiro Suda suda.kyoto at gmail.com
Mon Dec 19 06:17:58 UTC 2016


Hello Ladi,

Thank you a lot for looking into this issue and writing patches about the
queue size. The patch set looks good to me.

>> I haven't captured DHCP packets (because I'm unsure how to do so in GCE's
>> fully managed DHCP server), but I guess DHCPACK is actually sent from the
>> DHCP server but somehow iPXE missed it.
>
> Do you see any interesting log output with DEBUG=virtio-net,virtio-pci ?

Did not see any interesting log. No error about TX is printed, and it just
fails to receive DHCPACK.

So, for ease of debugging with capturing packets, I tried to do `imgfetch
http://...` with a static IP address (`set net0/ip`) instead.

The result implies that "iPXE believes it has sent the second Ethernet
packet without an error, but it is not sent actually".

This assumption is consistent with my report about DHCP.
iPXE can successfully send the 1st packet (DHCPDISCOVER), but seems failing
to send the 2nd packet (DHCPREQUEST). It should be the reason why it cannot
receive DHCPACK.


Here is the log. In the client log, we can see "TX ... PSH ACK", but it is
not received on the server side.

=== iPXE Client 10.146.0.10 (DEBUG=virtio-net:7,virtio-pci:7,tcp:7,ipv4:7
CONFIG=cloud, 26050fd4 with Ladi's 4 great patches) ===

    iPXE> imgfetch http://10.146.0.2/a.ipxe
    http://10.146.0.2/a.ipxe...TCP 0x12eda4 allocated
    TCP 0x12eda4 transitioned from CLOSED to SYN_SENT
    TCP 0x12eda4 bound to port 46673
    TCP 0x12eda4 timer fired in SYN_SENT for 609cb55e..609cb55e 00000000
    TCP 0x12eda4 TX 46673->80 609cb55e..609cb55f           00000000    0 SYN
    IPv4 TX 10.146.0.10->10.146.0.2 len 64 proto 6 id 0100 csum 6489
    VIRTIO-NET 0xee704 enqueuing iobuf 0x12ef90 on vq 1
    VIRTIO-NET 0xee704 tx complete iobuf 0x12ef90
    VIRTIO-NET 0xee704 rx complete iobuf 0x12f5f4 len 74
    VIRTIO-NET 0xee704 enqueuing iobuf 0x1335f4 on vq 0
    IPv4 RX 10.146.0.10<-10.146.0.2 len 60 proto 6 id 0000 csum 258d
    TCP 0x12eda4 RX 46673<-80           609cb55f 5ae4dacb..5ae4dacc    0
SYN ACK
    TCP 0x12eda4 using timestamps, SACK, TX window x128, RX window x512
    TCP 0x12eda4 transitioned from SYN_SENT to ESTABLISHED
    TCP 0x12eda4 TX 46673->80 609cb55f..609cb5c1           5ae4dacc   98
PSH ACK
    IPv4 TX 10.146.0.10->10.146.0.2 len 150 proto 6 id 0201 csum 6332
    VIRTIO-NET 0xee704 enqueuing iobuf 0x12ec94 on vq 1
    VIRTIO-NET 0xee704 tx complete iobuf 0x12ec94
    VIRTIO-NET 0xee704 rx complete iobuf 0x12fdf4 len 74
    VIRTIO-NET 0xee704 enqueuing iobuf 0x12f5f4 on vq 0
    IPv4 RX 10.146.0.10<-10.146.0.2 len 60 proto 6 id 0000 csum 258d
    TCP 0x12eda4 RX 46673<-80           609cb55f 5ae4dacb..5ae4dacc    0
SYN ACK
    TCP 0x12eda4 timer fired in ESTABLISHED for 609cb55f..609cb5c1 5ae4dacc
    TCP 0x12eda4 TX 46673->80 609cb55f..609cb5c1           5ae4dacc   98
PSH ACK
    IPv4 TX 10.146.0.10->10.146.0.2 len 150 proto 6 id 0302 csum 6231
    VIRTIO-NET 0xee704 enqueuing iobuf 0x12ec94 on vq 1
    VIRTIO-NET 0xee704 tx complete iobuf 0x12ec94
    .TCP 0x12eda4 timer fired in ESTABLISHED for 609cb55f..609cb5c1 5ae4dacc
    TCP 0x12eda4 TX 46673->80 609cb55f..609cb5c1           5ae4dacc   98
PSH ACK
    IPv4 TX 10.146.0.10->10.146.0.2 len 150 proto 6 id 0402 csum 6131
    VIRTIO-NET 0xee704 enqueuing iobuf 0x12ec94 on vq 1
    VIRTIO-NET 0xee704 tx complete iobuf 0x12ec94
    VIRTIO-NET 0xee704 rx complete iobuf 0x1305f4 len 74
    VIRTIO-NET 0xee704 enqueuing iobuf 0x12fdf4 on vq 0
    IPv4 RX 10.146.0.10<-10.146.0.2 len 60 proto 6 id 0000 csum 258d
    TCP 0x12eda4 RX 46673<-80           609cb55f 5ae4dacb..5ae4dacc    0
SYN ACK
    ..TCP 0x12eda4 timer fired in ESTABLISHED for 609cb55f..609cb5c1
5ae4dacc
    TCP 0x12eda4 TX 46673->80 609cb55f..609cb5c1           5ae4dacc   98
PSH ACK
    IPv4 TX 10.146.0.10->10.146.0.2 len 150 proto 6 id 0503 csum 6030
    VIRTIO-NET 0xee704 enqueuing iobuf 0x12ec94 on vq 1
    VIRTIO-NET 0xee704 tx complete iobuf 0x12ec94
    ..VIRTIO-NET 0xee704 rx complete iobuf 0x130df4 len 74
    VIRTIO-NET 0xee704 enqueuing iobuf 0x1305f4 on vq 0
    IPv4 RX 10.146.0.10<-10.146.0.2 len 60 proto 6 id 0000 csum 258d
    TCP 0x12eda4 RX 46673<-80           609cb55f 5ae4dacb..5ae4dacc    0
SYN ACK
    .TCP 0x12eda4 timer fired in ESTABLISHED for 609cb55f..609cb5c1 5ae4dacc
    TCP 0x12eda4 TX 46673->80 609cb55f..609cb5c1           5ae4dacc   98
PSH ACK
    IPv4 TX 10.146.0.10->10.146.0.2 len 150 proto 6 id 0604 csum 5f2f
    VIRTIO-NET 0xee704 enqueuing iobuf 0x12ec94 on vq 1
    VIRTIO-NET 0xee704 tx complete iobuf 0x12ec94
    ......TCP 0x12eda4 timer expired in ESTABLISHED for 609cb55f..609cb5c1
5ae4dacc
    TCP 0x12eda4 transitioned from ESTABLISHED to CLOSED
    TCP 0x12eda4 connection deleted
     Connection timed out (http://ipxe.org/4c0a6035)
    iPXE> VIRTIO-NET 0xee704 rx complete iobuf 0x1315f4 len 74
    VIRTIO-NET 0xee704 enqueuing iobuf 0x130df4 on vq 0
    IPv4 RX 10.146.0.10<-10.146.0.2 len 60 proto 6 id 0000 csum 258d
    TCP 0x0 RX 46673<-80           609cb55f 5ae4dacb..5ae4dacc    0 SYN ACK
    IPv4 received packet rejected by stack: The socket is not connected (
http://ipxe.org/380a6001)

    iPXE> VIRTIO-NET 0xee704 rx complete iobuf 0x131df4 len 74
    VIRTIO-NET 0xee704 enqueuing iobuf 0x1315f4 on vq 0
    IPv4 RX 10.146.0.10<-10.146.0.2 len 60 proto 6 id 0000 csum 258d
    TCP 0x0 RX 46673<-80           609cb55f 5ae4dacb..5ae4dacc    0 SYN ACK
    IPv4 received packet rejected by stack: The socket is not connected (
http://ipxe.org/380a6001)
    VIRTIO-NET 0xee704 rx complete iobuf 0x1325f4 len 42
    VIRTIO-NET 0xee704 enqueuing iobuf 0x131df4 on vq 0
    VIRTIO-NET 0xee704 enqueuing iobuf 0x1325f4 on vq 1
    VIRTIO-NET 0xee704 tx complete iobuf 0x1325f4
    VIRTIO-NET 0xee704 rx complete iobuf 0x132df4 len 62
    VIRTIO-NET 0xee704 enqueuing iobuf 0x1325f4 on vq 0
    IPv4 RX 10.146.0.10<-221.204.224.56 len 48 proto 6 id 6aac csum 167b
    TCP 0x0 RX 22<-2281           0a4bb0cf 3368312b..3368312c    0 SYN
    IPv4 received packet rejected by stack: The socket is not connected (
http://ipxe.org/380a6001)
    VIRTIO-NET 0xee704 rx complete iobuf 0x1335f4 len 42
    VIRTIO-NET 0xee704 enqueuing iobuf 0x132df4 on vq 0
    VIRTIO-NET 0xee704 enqueuing iobuf 0x1335f4 on vq 1
    VIRTIO-NET 0xee704 tx complete iobuf 0x1335f4



=== HTTP server 10.146.0.2 (ws01) ===

    $ sudo tcpdump host 10.146.0.10
    tcpdump: verbose output suppressed, use -v or -vv for full protocol
decode
    listening on ens4, link-type EN10MB (Ethernet), capture size 262144
bytes
    04:55:43.715483 IP ipxe01.c.suda-main.internal.46673 >
ws01.c.suda-main.internal.http: Flags [S], seq 1620882782, win 65532,
options [nop,nop,TS val 323011 ecr 0,nop,nop,sackOK,nop,wscale 9,mss 1460],
length 0
    04:55:43.715549 IP ws01.c.suda-main.internal.http >
ipxe01.c.suda-main.internal.46673: Flags [S.], seq 1524947659, ack
1620882783, win 28160, options [mss 1420,sackOK,TS val 77061640 ecr
323011,nop,wscale 7], length 0
    04:55:44.712811 IP ws01.c.suda-main.internal.http >
ipxe01.c.suda-main.internal.46673: Flags [S.], seq 1524947659, ack
1620882783, win 28160, options [mss 1420,sackOK,TS val 77061890 ecr
323011,nop,wscale 7], length 0
    04:55:46.712829 IP ws01.c.suda-main.internal.http >
ipxe01.c.suda-main.internal.46673: Flags [S.], seq 1524947659, ack
1620882783, win 28160, options [mss 1420,sackOK,TS val 77062390 ecr
323011,nop,wscale 7], length 0
    04:55:50.712864 IP ws01.c.suda-main.internal.http >
ipxe01.c.suda-main.internal.46673: Flags [S.], seq 1524947659, ack
1620882783, win 28160, options [mss 1420,sackOK,TS val 77063390 ecr
323011,nop,wscale 7], length 0
    04:55:58.712842 IP ws01.c.suda-main.internal.http >
ipxe01.c.suda-main.internal.46673: Flags [S.], seq 1524947659, ack
1620882783, win 28160, options [mss 1420,sackOK,TS val 77065390 ecr
323011,nop,wscale 7], length 0
    04:56:14.712836 IP ws01.c.suda-main.internal.http >
ipxe01.c.suda-main.internal.46673: Flags [S.], seq 1524947659, ack
1620882783, win 28160, options [mss 1420,sackOK,TS val 77069390 ecr
323011,nop,wscale 7], length 0


2016-12-14 18:17 GMT+09:00 Ladi Prosek <lprosek at redhat.com>:

> Hi Akihiro,
>
> On Wed, Dec 14, 2016 at 8:51 AM, Akihiro Suda <suda.kyoto at gmail.com>
> wrote:
> > Hi Christian,
> >
> > Thank you for the suggestion, I'll try to report this to google.
> >
> > I found that the cause of this issue is because GCE's
> VIRTIO_PCI_QUEUE_NUM
> > is 4096, which is larger than iPXE's MAX_QUEUE_NUM (256).
> >
> > https://git.ipxe.org/ipxe.git/blob/26050fd4c87c50503d5bd573b2ec91
> 703676e211:/src/drivers/bus/virtio-pci.c#l43
> > https://git.ipxe.org/ipxe.git/blob/26050fd4c87c50503d5bd573b2ec91
> 703676e211:/src/include/ipxe/virtio-ring.h#l24
> >
> >
> > IIUC this issue will be resolved if GCE supports virtio 1.0, which would
> > allow iPXE to use custom queue num smaller than the virtio device's max
> > value.
>
> iPXE virtio 1.0 code does not cap the queue size to MAX_QUEUE_NUM. In
> fact, it does not check it against any maximum and still uses
> fixed-sized virtqueue data structures, just like legacy. This is a
> bug.
>
> > However, it would be great if iPXE can support setting MAX_QUEUE_NUM to
> 4096
> > (via `config/general.h`, probably).
>
> Even better, iPXE should try to use the host provided size in legacy
> mode, i.e. no MAX_QUEUE_NUM needed. The few extra dynamic allocations
> shouldn't hurt anything.
>
> > Actually I tried to set MAX_QUEUE_NUM to 4096 and got
> virtnet_open_legacy()
> > succeeded, but it still doesn't work:
> >
> >     iPXE> dhcp
> >     DHCP 0x120e44 entering discovery state
> >     Configuring (net0 42:01:0a:92:00:0a)...DHCP 0x120e44 DHCPDISCOVER
> >     DHCP 0x120e44 DHCPOFFER from 169.254.169.254:67 for 10.146.0.10
> >     .DHCP 0x120e44 DHCPDISCOVER
> >     ..DHCP 0x120e44 entering request state
> >     DHCP 0x120e44 DHCPREQUEST to 169.254.169.254:67 for 10.146.0.10
> >     DHCP 0x120e44 DHCPREQUEST to 169.254.169.254:67 for 10.146.0.10
> >     .DHCP 0x120e44 DHCPREQUEST to 169.254.169.254:67 for 10.146.0.10
> >     .DHCP 0x120e44 DHCPREQUEST to 169.254.169.254:67 for 10.146.0.10
> >     ...DHCP 0x120e44 DHCPREQUEST to 169.254.169.254:67 for 10.146.0.10
> >     ....... No configuration methods succeeded (http://ipxe.org/040ee119
> )
> >     iPXE>
> >
> > I haven't captured DHCP packets (because I'm unsure how to do so in GCE's
> > fully managed DHCP server), but I guess DHCPACK is actually sent from the
> > DHCP server but somehow iPXE missed it.
>
> Do you see any interesting log output with DEBUG=virtio-net,virtio-pci ?
>
> > I'd appreciate if anyone can support me to set MAX_QUEUE_NUM to 4096.
>
> I'll look into it. My proposal is to keep using 128 as the cap in
> virtio 1.0 and support any queue size in legacy, only subject to
> available memory.
>
> > 2016-12-14 16:19 GMT+09:00 Christian Nilsson <nikize at gmail.com>:
> >>
> >> You might want to report this to google as well if you haven't already.
> >>
> >> On Wed, Dec 14, 2016 at 7:16 AM, Akihiro Suda <suda.kyoto at gmail.com>
> >> wrote:
> >> > Hello,
> >> >
> >> > I tried to boot iPXE on a Google Compute Engine instance, but the
> virtio
> >> > driver doesn't work because virtnet_open_legacy() fails to find
> >> > virtqueues.
> >> >
> >> > Please let me know what I can do for providing further information to
> >> > debug
> >> > the issue.
> >> >
> >> > The serial console output with "DEBUG=virtio-net" is as follows:
> >> >
> >> >     iPXE initialising devices...Common virtio capability not found!
> >> >     VIRTIO-NET 0xdf584 busaddr=0000:00:04.0 ioaddr=0xc040 irq=11
> >> >     VIRTIO-NET 0xdf584 mac=42:01:0a:92:00:0a
> >> >     ok
> >> >
> >> >     iPXE 1.0.0+ (26050) -- Open Source Network Boot Firmware --
> >> > http://ipxe.org
> >> >     Features: DNS HTTP iSCSI TFTP SRP AoE ELF MBOOT PXE bzImage Menu
> >> > PXEXT
> >> >     iPXE> ifstat
> >> >     net0: 42:01:0a:92:00:0a using virtio-net on 0000:00:04.0 (closed)
> >> >       [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> >> >     iPXE> dhcp
> >> >     VIRTIO-NET 0xdf584 cannot register queue 0
> >> >     Could not open net0: No such file or directory
> >> > (http://ipxe.org/2d5c403b)
> >> >     iPXE>
> >> >
> >> >
> >> > This ENOENT is raised from virtnet_open_legacy():
> >> >
> >> > https://git.ipxe.org/ipxe.git/blob/26050fd4c87c50503d5bd573b2ec91
> 703676e211:/src/drivers/net/virtio-net.c#l205
> >> >
> >> >
> >> > I also tested some previous commits, but all of them failed, so the
> >> > issue is
> >> > not a regression in some recent commit:
> >> >
> >> >  * 26050fd (Dec  8, 2016): the last commit when I tested
> >> >  * 7b499f8 (Apr 11, 2016): the last commit without support for virtio
> >> > 1.0
> >> >  * e4419ff (Jul  2, 2010):  the first commit with the "native iPXE
> >> > driver"
> >> > (The commit date is older than 232c208, but actually newer than
> 232c208)
> >> >  * 232c208 (Jul 11, 2010): the last commit without the "native iPXE
> >> > driver".
> >> > The error message is "ERROR: queue size 4096 > 512", "Cannot register
> >> > queue
> >> > #0"
> >> >
> >> >
> >> > Regards,
> >> >
> >> > Akihiro Suda
> >> >
> >> >
> >> > _______________________________________________
> >> > ipxe-devel mailing list
> >> > ipxe-devel at lists.ipxe.org
> >> > https://lists.ipxe.org/mailman/listinfo.cgi/ipxe-devel
> >> >
> >
> >
> >
> > _______________________________________________
> > ipxe-devel mailing list
> > ipxe-devel at lists.ipxe.org
> > https://lists.ipxe.org/mailman/listinfo.cgi/ipxe-devel
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ipxe.org/pipermail/ipxe-devel/attachments/20161219/74a4966d/attachment.htm>


More information about the ipxe-devel mailing list