[ipxe-devel] [bug] Unbalanced UEFI TPL manipulations in iPXE with DOWNLOAD_PROTO_HTTPS=1

Vladislav K. Valtchev vladislav.valtchev at gmail.com
Thu Jun 11 10:54:52 UTC 2020


Hello,
I'd like to forward here a bug-report that's currently on
launchpad.net: https://bugs.launchpad.net/ipxe/+bug/1882671

Originally it seemed a QEMU bug, but later the investigation done by
Laszlo Ersek (Red Hat) plus some tests of mine confirmed that the
bug is in the 82540em.efi iPXE driver for QEMU.

He was even able to
bisect the problem and get d8c500b7945e
("[efi] Drop to TPL_APPLICATION when gathering entropy",
2018-03-12) as first bad commit.

I hope this is the right mailing list for reporting this bug
and that the information provided so far will be helpful.

Thanks,
Vladislav K. Valtchev

P.S.
I'm not sure whether to add 1882671 at bugs.launchpad.net
in CC or not. In doubt, I omitted it. Please, add it to CC
if you believe its the right thing to do.

-------- Forwarded Message --------
From: Laszlo Ersek (Red Hat) <1882671 at bugs.launchpad.net>
Reply-To: Bug 1882671 <1882671 at bugs.launchpad.net>
To: vladislav.valtchev at gmail.com
Subject: [Bug 1882671] Re: qemu-system-x86_64 (ver 4.2) stuck at boot 
with OVMF bios
Date: Wed, 10 Jun 2020 23:17:46 -0000

Hi Vlad,

the ipxe-qemu package in Ubuntu (1.0.0+git-20190109.133f4c4-0ubuntu3) is
built with DOWNLOAD_PROTO_HTTPS enabled (in "src/config/general.h").
According to the Ubuntu changelog, this is a new feature added in
"1.0.0+git-20190109.133f4c4-0ubuntu1".

With DOWNLOAD_PROTO_HTTPS enabled, I can reproduce the issue locally,
with iPXE built from source at git commit 133f4c4 (which you report the
issue for), and also at current iPXE master (9ee70fb95bc2).

The issue does not reproduce (with DOWNLOAD_PROTO_HTTPS enabled) at
commit fbe8c52d. This suggests the problem should be bisectable.

If I disable DOWNLOAD_PROTO_HTTPS, then the problem goes away even at
133f4c4 (i.e., the issue is masked).

I've used current edk2 master to test with (14c7ed8b51f6).

Viewed at 133f4c4:

The DOWNLOAD_PROTO_HTTPS feature test macro seems to result in iPXE
attempting to gather entropy. (Likely for setting up TLS connections.)
For entropy gathering, iPXE seems to use an EFI timer, and to measure
jitter across one timer tick. In this, iPXE plays some tricks with the
UEFI TPL (Task Priority Level).

In general, iPXE seems to want to run at TPL_CALLBACK most of the time,
to mask the timer interrupt in most code locations, and drops down to
TPL_APPLICATION only when it actively wants a timer callback (for the
jitter collection, see above).

When the iPXE driver is launched, the StartImage() UEFI boot service
takes a note of the current TPL. It is TPL_APPLICATION (value 4). Then
iPXE seems to perform the above trickery with TPL_CALLBACK & entropy
collection. Finally, after installing EfiDriverBindingProtocol and
EfiComponentName2Protocol, the iPXE driver exits (as expected from a
UEFI driver model driver -- the entry point function is only supposed to
perform some setup steps & install some protocol interfaces). At this
point, StartImage() verifies whether the TPL has been restored to the
same as it was before launching the driver.

Unfortunately, something about the TPL manipulations in iPXE is
unbalanced, because I see the following TPL changes:

- raise: APPLICATION (4) -> CALLBACK (8)
- raise: CALLBACK (8) -> NOTIFY (16)
- raise: NOTIFY (16) -> NOTIFY (16)
- restore: NOTIFY (16) -> NOTIFY (16)
- restore: NOTIFY (16) -> CALLBACK (8)

Note that the final "restore: CALLBACK (8) -> APPLICATION (4)"
transition is missing, before iPXE exits. This is what StartImage()
catches and reports with the failed ASSERT().

So, as I mentioned, the problem is bisectable. Here's the bisection log:

> git bisect start
> # bad: [9ee70fb95bc266885ff88be228b044a2bb226eeb] [efi] Attempt to
> # connect our driver directly if ConnectController fails
> git bisect bad 9ee70fb95bc266885ff88be228b044a2bb226eeb
> # bad: [133f4c47baef6002b2ccb4904a035cda2303c6e5] [build] Handle
> # R_X86_64_PLT32 from binutils 2.31
> git bisect bad 133f4c47baef6002b2ccb4904a035cda2303c6e5
> # good: [fbe8c52d0d9cdb3d6f5fe8be8edab54618becc1f] [ena] Fix spurious
> # uninitialised variable warning on older versions of gcc
> git bisect good fbe8c52d0d9cdb3d6f5fe8be8edab54618becc1f
> # bad: [bc85368cdd311fe68ffcf251e7e8e90c14f8a9dc] [librm] Ensure that
> # inline code symbols are unique
> git bisect bad bc85368cdd311fe68ffcf251e7e8e90c14f8a9dc
> # bad: [0778418e29ea16fc897fc5b6e497054f5ba86ebd] [golan] Do not
> # assume all devices are identical
> git bisect bad 0778418e29ea16fc897fc5b6e497054f5ba86ebd
> # good: [f672a27b34220865b403df519593f382859559e0] [efi] Raise TPL
> # within EFI_USB_IO_PROTOCOL entry points
> git bisect good f672a27b34220865b403df519593f382859559e0
> # bad: [d8c500b7945e57023dde5bd0be2b0e40963315d9] [efi] Drop to
> # TPL_APPLICATION when gathering entropy
> git bisect bad d8c500b7945e57023dde5bd0be2b0e40963315d9
> # good: [c84f9d67272beaed98f98bf308471df16340a3be] [iscsi] Parse IPv6
> # address in root path
> git bisect good c84f9d67272beaed98f98bf308471df16340a3be
> # first bad commit: [d8c500b7945e57023dde5bd0be2b0e40963315d9] [efi]
> # Drop to TPL_APPLICATION when gathering entropy

The bisection fingers d8c500b7945e ("[efi] Drop to TPL_APPLICATION when
 gathering entropy", 2018-03-12) as first bad commit.

Feel free to report this problem on the upstream iPXE mailing list.

Regarding Ubuntu downstream, you should be able to work around this
issue by #undef-ing DOWNLOAD_PROTO_HTTPS again, in
"src/config/general.h" -- *minimally* in the CONFIG=qemu build(s). That
is, in the ipxe-qemu subpackage.

That's because in a CONFIG=qemu build, you totally don't need (or even
*use*) the iPXE HTTPS infrastructure (the entropy gathering that trips
the ASSERT seems spurious to me, with CONFIG=qemu). With CONFIG=qemu,
iPXE provides the UEFI SNP (Simple Network Protocol) interface on top of
the e1000 NIC, and the crypto stuff (if any) is done by the platform
firmware (edk2 / OVMF).



** Project changed: qemu => ipxe

** Package changed: qemu (Ubuntu) => ipxe (Ubuntu)

-- 
You received this bug notification because you are subscribed to the bug
report.
https://bugs.launchpad.net/bugs/1882671

Title:
  qemu-system-x86_64 (ver 4.2) stuck at boot with OVMF bios

Status in iPXE:
  New
Status in ipxe package in Ubuntu:
  Confirmed

Bug description:
  The version of QEMU (4.2.0) packaged for Ubuntu 20.04 hangs
  indefinitely at boot if an OVMF bios is used. This happens ONLY with
  qemu-system-x86_64. qemu-system-i386 works fine with the latest ia32
  OVMF bios.

  NOTE[1]: the same identical OVMF bios works fine on QEMU 2.x packaged with Ubuntu 18.04.
  NOTE[2]: reproducing the fatal bug requires *no* operating system:

     qemu-system-x86_64 -bios OVMF-pure-efi.fd

  On its window QEMU gets stuck at the very first stage:
     "Guest has not initialized the display (yet)."

  NOTE[3]: QEMU gets stuck no matter if KVM is used or not.

  NOTE[4]: By adding the `-d int` option it is possible to observe that
  QEMU is, apparently, stuck in an endless loop of interrupts. For the
  first few seconds, registers' values vary quickly, but at some point
  they reach a final value, while the interrupt counter increments:

    2568: v=68 e=0000 i=0 cpl=0 IP=0038:0000000007f1d225 pc=0000000007f1d225 SP=0030:0000000007f0c8d0 env->regs[R_EAX]=0000000000000000
  RAX=0000000000000000 RBX=0000000007f0c920 RCX=0000000000000000 RDX=0000000000000001
  RSI=0000000006d18798 RDI=0000000000008664 RBP=0000000000000000 RSP=0000000007f0c8d0
  R8 =0000000000000001 R9 =0000000000000089 R10=0000000000000000 R11=0000000007f2c987
  R12=0000000000000000 R13=0000000000000000 R14=0000000007087901 R15=0000000000000000
  RIP=0000000007f1d225 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
  CS =0038 0000000000000000 ffffffff 00af9a00 DPL=0 CS64 [-R-]
  SS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
  DS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
  FS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
  GS =0030 0000000000000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
  LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
  TR =0000 0000000000000000 0000ffff 00008b00 DPL=0 TSS64-busy
  GDT=     00000000079eea98 00000047
  IDT=     000000000758f018 00000fff
  CR0=80010033 CR2=0000000000000000 CR3=0000000007c01000 CR4=00000668
  DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
  DR6=00000000ffff0ff0 DR7=0000000000000400
  CCS=0000000000000044 CCD=0000000000000000 CCO=EFLAGS  
  EFER=0000000000000d00

  
  NOTE[5]: Just to better help the investigation of the bug, I'd like to remark that the issue is NOT caused by an endless loop of triple-faults. I tried with -d cpu_reset and there is NO such loop. No triple fault whatsoever.

  NOTE[6]: The OVMF version used for the test has been downloaded from:
  https://www.kraxel.org/repos/jenkins/edk2/edk2.git-ovmf-x64-0-20200515.1398.g6ff7c838d0.noarch.rpm

  but the issue is the same with older OVMF versions as well.

  
  Please take a look at it, as the bug is NOT a corner case. QEMU 4.2.0 cannot boot with an UEFI firmware (OVMF) while virtualizing a x86_64 machine AT ALL.

  Thank you very much,
  Vladislav K. Valtchev

To manage notifications about this bug go to:
https://bugs.launchpad.net/ipxe/+bug/1882671/+subscriptions




More information about the ipxe-devel mailing list