[ipxe-devel] Windows having problems parsing iBFT from recent iPXE versions?
Floris Bos
bos at je-eigen-domein.nl
Wed Oct 29 22:15:56 UTC 2014
Hi,
On 10/29/2014 06:31 PM, Michael Brown wrote:
> On 29/10/14 17:14, Floris Bos wrote:
>> I'm not sure if it is actually the iBFT that is the problem.
>> My initial guess was that was the case because the nameserver does not
>> show up in "ipconfig", and my iSCSI disk is not there.
>> But perhaps Windows does not copy the nameserver from iBFT, but normally
>> gets that by using normal DHCP later on.
>> And the real problem is that network connectivity is just screwed up,
>> perhaps caused by iPXE leaving the network adapter in some kind of state
>> Windows is not expecting.
>> That I am seeing DHCP requests, and repeated ARP requests for the IP of
>> my SAN after Windows booted supports the theory that it does have the
>> iBFT, but that Windows is able to transmit network packets, but somehow
>> has problems receiving them.
>>
>> - Several commits I tried before "[tcp] Do not send RST for unrecognised
>> connections" all work properly
>> - Several commits I tried after, all fail
>> - It might be coincidence, but I just managed to get HEAD to work by
>> reversing both "[tcp] Do not send RST for unrecognised connections" and
>> "[tcp] Defer sending ACKs until all received packets have been
>> processed" both which do hackery in src/net/tcp.c.
>
> The problem does not seem to be related to the iBFT; I think we can
> leave that aside for now.
>
> Interesting. I wonder if it could be somehow related to the
> possibility of packets arriving between the time that Windows last
> allows iPXE control of the NIC (via an INT 13 call) and the moment
> that the Windows native driver starts up.
>
> Unfortunately there is no way to enforce a clean handover of the NIC
> when doing anything with iSCSI, since the INT 13 API simply does not
> have any "shut down device" call. The Windows driver will therefore
> always find the NIC in a slightly unexpected state in which it is
> already up and running and receiving packets. It's plausible that the
> two TCP-related changes alter the behaviour in terms of when packets
> are transmitted (and thus responses received) sufficiently to
> trigger/avoid a bug.
>
> You could try using the iPXE native driver instead of undionly.kkpxe.
> This will definitely change the state of the NIC at the time that the
> Windows driver starts up, and it may be that Windows likes this state
> better.
>
It does seem to work with the native driver.
> You could also try using wireshark to see if there are any packets
> present on the network which might arrive after iPXE last relinquishes
> control (i.e. after the last packet sent by iPXE within its TCP
> connection to the iSCSI target) but before the Windows driver has
> started up (i.e. before Windows' initial DHCP request or anything else
> which has obviously been sent by Windows).
>
undionly.kkpxe with the two patches reversed (does work):
- iPXE communication seems to end with an iSCSI read response, which iPXE
ACKs nicely.
- Then there is a long wait during Windows startup (waiting for disks?),
during which the SAN retransmits an iSCSI NOP In several times, trying to
keep the connection to the VirtualBox guest alive.
- Straight after that Windows takes over: there is some DHCP/ARP traffic
(not shown below), an LLMNR request for wpad, and a new iSCSI login.
==
No.   Time           Source          Destination      Protocol  Length  Info
315   29.079661000   192.168.178.4   192.168.178.99   iSCSI     116     SCSI: Read(10) LUN: 0x00 (LBA: 0x00000000, Len: 1)
316   29.080009000   192.168.178.99  192.168.178.4    TCP       116     [TCP segment of a reassembled PDU]
317   29.080017000   192.168.178.99  192.168.178.4    iSCSI     580     SCSI: Data In LUN: 0x00 (Read(10) Response Data) SCSI: Response LUN: 0x00 (Read(10)) (Good)
318   29.080127000   192.168.178.4   192.168.178.99   TCP       68      5624 > iscsi-target [ACK] Seq=929 Ack=4389 Win=262144 Len=0 TSval=1345915 TSecr=9180359
319   29.080229000   192.168.178.4   192.168.178.99   TCP       68      5624 > iscsi-target [ACK] Seq=929 Ack=4901 Win=262144 Len=0 TSval=1345915 TSecr=9180359
383   39.098163000   192.168.178.99  192.168.178.4    iSCSI     116     NOP In
384   39.941994000   192.168.178.99  192.168.178.4    iSCSI     116     [TCP Retransmission] NOP In
385   41.633880000   192.168.178.99  192.168.178.4    iSCSI     116     [TCP Retransmission] NOP In
388   45.017528000   192.168.178.99  192.168.178.4    iSCSI     116     [TCP Retransmission] NOP In
403   51.784762000   192.168.178.99  192.168.178.4    iSCSI     116     [TCP Retransmission] NOP In
907   172.706543000  192.168.178.4   224.0.0.252      LLMNR     66      Standard query 0xbed4 A wpad
908   172.706550000  192.168.178.4   224.0.0.252      LLMNR     66      Standard query 0xbed4 A wpad
909   172.787428000  192.168.178.4   192.168.178.99   TCP       68      49154 > iscsi-target [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
910   172.787730000  192.168.178.99  192.168.178.4    TCP       68      iscsi-target > 49154 [SYN, ACK] Seq=0 Ack=1 Win=14600 Len=0 MSS=1460 SACK_PERM=1 WS=8
911   172.787999000  192.168.178.4   192.168.178.99   TCP       56      49154 > iscsi-target [ACK] Seq=1 Ack=1 Win=65536 Len=0
912   172.789389000  192.168.178.4   192.168.178.99   iSCSI     244     Login Command
[...various iSCSI commands...]
2149  173.114694000  192.168.178.4   224.0.0.252      LLMNR     66      Standard query 0xbed4 A wpad
2150  173.114697000  192.168.178.4   224.0.0.252      LLMNR     66      Standard query 0xbed4 A wpad
2169  176.317161000  192.168.178.4   192.168.178.255  NBNS      112     Registration NB WORKGROUP<00>
2170  176.317172000  192.168.178.4   192.168.178.255  NBNS      112     Registration NB WORKGROUP<00>
2171  176.317356000  192.168.178.4   192.168.178.255  NBNS      112     Registration NB MININT-SG3NP4U<00>
==
undionly.kkpxe without patch reversion (does NOT work):
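- The final segment of the iSCSI read response is retransmitted,
apparently because iPXE never sends the last ACK (only Ack=4389 is sent,
not Ack=4901). Those retransmitted packets may arrive when Windows is
about to take over.
- Windows does not seem to do any iSCSI communication at all.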
==
293   30.869835000   192.168.178.4   192.168.178.99   iSCSI     116     SCSI: Read(10) LUN: 0x00 (LBA: 0x00000000, Len: 1)
294   30.870209000   192.168.178.99  192.168.178.4    TCP       116     [TCP segment of a reassembled PDU]
295   30.870230000   192.168.178.99  192.168.178.4    iSCSI     580     SCSI: Data In LUN: 0x00 (Read(10) Response Data) SCSI: Response LUN: 0x00 (Read(10)) (Good)
296   30.870346000   192.168.178.4   192.168.178.99   TCP       68      28509 > iscsi-target [ACK] Seq=929 Ack=4389 Win=262144 Len=0 TSval=1363248 TSecr=9418323
315   34.310476000   192.168.178.99  192.168.178.4    TCP       580     [TCP Retransmission] iscsi-target > 28509 [PSH, ACK] Seq=4389 Ack=929 Win=16624 Len=512 TSval=9419184 TSecr=1363248 [Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
353   41.189666000   192.168.178.99  192.168.178.4    TCP       580     [TCP Retransmission] iscsi-target > 28509 [PSH, ACK] Seq=4389 Ack=929 Win=16624 Len=512 TSval=9420904 TSecr=1363248 [Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
522   54.980144000   192.168.178.99  192.168.178.4    TCP       580     [TCP Retransmission] iscsi-target > 28509 [PSH, ACK] Seq=4389 Ack=929 Win=16624 Len=512 TSval=9424352 TSecr=1363248 [Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
549   60.879695000   192.168.178.99  192.168.178.4    iSCSI     164     NOP In, NOP In
1333  173.270305000  192.168.178.4   192.168.178.255  NBNS      112     Registration NB MININT-GRDEK79<00>
1334  173.270318000  192.168.178.4   192.168.178.255  NBNS      112     Registration NB MININT-GRDEK79<00>
1335  173.270536000  192.168.178.4   192.168.178.255  NBNS      112     Registration NB WORKGROUP<00>
==
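The check suggested above (look for packets that arrive after iPXE's
last transmission but before the first packet obviously sent by Windows)
could be sketched roughly as follows. This is only an illustration: the
names struct pkt and count_handover_packets are made up here and are not
part of iPXE or Wireshark, the rows are assumed to have been exported
from the capture in time order, and the takeover time has to be picked
by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One capture row (hypothetical layout mirroring the summary columns). */
struct pkt {
	double time;     /* seconds since start of capture */
	const char *src; /* source IP address */
	const char *dst; /* destination IP address */
};

/*
 * Count packets addressed to `host` that arrive after the last packet
 * `host` itself sent before `takeover`, and before `takeover`.
 * `takeover` is the time of the first packet attributed to the new OS.
 * Assumes `pkts` is sorted by time.
 */
size_t count_handover_packets(const struct pkt *pkts, size_t n,
			      const char *host, double takeover)
{
	size_t i, start = 0, count = 0;

	assert(pkts != NULL || n == 0);

	/* Index just past the last packet sent by host before takeover. */
	for (i = 0; i < n && pkts[i].time < takeover; i++) {
		if (strcmp(pkts[i].src, host) == 0)
			start = i + 1;
	}

	/* Packets sent to host from that point up to takeover. */
	for (i = start; i < n && pkts[i].time < takeover; i++) {
		if (strcmp(pkts[i].dst, host) == 0)
			count++;
	}

	return count;
}
```

Called with the client address (192.168.178.4 here) and the timestamp of
the first packet attributed to Windows, it returns how many packets were
sent to the client while neither iPXE nor Windows was visibly answering.
It does not distinguish retransmitted data from NOP In keepalives, so the
rows it counts would still need to be looked at by hand.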
>>> A problem is that the SAN-booted OS is likely to clear the screen
>>> almost immediately, meaning that the warning message would not be seen
>>> in practice.
>>
>> But if I am doing a SAN installation couldn't the warning be printed the
>> moment I do the sanhook command?
>
> The iBFT is not created until you attempt to boot from the SAN target.
I thought you already had memory reserved for it in some data segment
before filling it in.
==
/** The boot firmware table generated by iPXE */
static union xbft_table __bss16 ( xbftab ) __attribute__ (( aligned ( 16 ) ));
==
Or am I misunderstanding what that does?
(not a low level programmer)
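For what it is worth, reserving storage and creating the table need not
be the same step. A minimal sketch in plain C, assuming GCC or Clang for
the aligned attribute: the names demo_table, demo_tab, demo_table_fill
and demo_table_present are made up for illustration, the layout is not
iPXE's real xbft_table, and the iPXE-specific __bss16 placement is left
out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a firmware table; not iPXE's real layout. */
union demo_table {
	struct {
		char signature[4];
		uint32_t length;
	} header;
	uint8_t raw[768];
};

/* Storage is reserved at link time and starts out zero-filled. */
static union demo_table demo_tab __attribute__ (( aligned ( 16 ) ));

/* Nonzero only once the reserved storage holds a table signature. */
int demo_table_present(void)
{
	return memcmp(demo_tab.header.signature, "iBFT", 4) == 0;
}

/* Fill in the reserved storage; before this runs it is just zeros. */
void demo_table_fill(void)
{
	/* The attribute above should guarantee 16-byte alignment. */
	assert(((uintptr_t) &demo_tab) % 16 == 0);

	memset(&demo_tab, 0, sizeof(demo_tab));
	memcpy(demo_tab.header.signature, "iBFT", 4);
	demo_tab.header.length = sizeof(demo_tab.header);
}
```

In this sketch the buffer exists from program start, but anything
scanning memory for the signature finds nothing until demo_table_fill()
has run. If iPXE behaves similarly, the declaration above would only set
aside space, and the table contents would appear later.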
--
Yours sincerely,
Floris Bos