[ipxe-devel] Misc iPXE patches

Sat Nov 13 12:43:22 UTC 2010

Thanks for looking over everything and shortening my patch list.

On Fri, Nov 12, 2010 at 7:50 PM, Michael Brown <mbrown at fensystems.co.uk> wrote:

>
>
> Yes, that is worrisome, and I suspect it would break on some other NICs.
> Do you have any details on how this NIC handled interrupts, and why this
> didn't work with iPXE?
>
>
I'll try to get more data from them.  They may respond on-list anyway since
I think they are subscribed.

> > And finally, for lack of debug I cannot provide an ongoing assessment of
> > how required it is, but I disabled arp table population on non-arp
> > traffic, ICMP echo replies and TCP resets because with thousands of nodes
> > on a vlan, gPXE's neighbor table wasn't quite coping with the chaotic
> > traffic (tried to learn too many incidental macs and didn't manage to get
> > the boot servers in the table).  Unfortunately, debugging thousands of
> > nodes booting at once is not a frequent test case and I had no need of
> > 'proper' network behavior, so I took a hatchet to it:
> > https://xcat.svn.sourceforge.net/svnroot/xcat/xcat-dep/trunk/xnba/ipxe-droppackets.patch
> > It could be possible to reproduce the problem more synthetically at small
> > scale for a 'proper' fix, but I didn't try given a good enough solution
> > to my problem.
>
> Both ICMP and TCP should be unicast, so I'm puzzled why you would see high
> traffic volumes for these protocols.  Removing the ability to respond with
> ICMP echo replies and TCP resets would noticeably downgrade functionality,
> whereas the ARP portion of the patch should result only in an extra pair
> of ARP packets, so I'm tempted to apply only the ARP portion.  Any thoughts?
>
> Michael
>

I don't mind either way.  I consider the ability to maintain a patch
externally a good way to accept limitations that are fine for our user
base but would be unacceptable to others.  My first step was the ARP patch
too, thinking that would be all, but it wasn't enough.  To address your
puzzlement: these nodes nominally participate in a number of clustered
services, and monitoring services send them probes, particularly when
they miss heartbeats.  So the monitoring services come to probe the
running OS to evaluate its state and end up probing iPXE instead, which
consumes neighbor table entries in order to answer their queries.  It
looked like gPXE (at the time) should have been able to drop those entries
and move on in spite of that, but somehow it wasn't.  Since we're the only
ones who really see this, keeping the function intact for almost everyone
makes sense.
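
To make the failure mode above concrete, here is a rough Python sketch of a
small fixed-size neighbor table under a flood of unicast probes.  It is not
the gPXE or iPXE implementation: the class name, the slot count, the
oldest-first eviction and all addresses are assumptions made up for
illustration.  It only shows why learning MACs from incidental traffic can
push the boot server out of the table, while learning from ARP alone does
not.

```python
from collections import OrderedDict


class NeighborCache:
    """Hypothetical fixed-size IP -> MAC table; evicts the oldest entry when full."""

    def __init__(self, slots, learn_from_any_traffic):
        self.slots = slots
        self.learn_from_any_traffic = learn_from_any_traffic
        self.entries = OrderedDict()

    def receive(self, proto, src_ip, src_mac):
        # Only ARP packets populate the table unless incidental learning is on.
        if proto != "arp" and not self.learn_from_any_traffic:
            return
        if src_ip in self.entries:
            # Refresh a known sender so it becomes the newest entry.
            self.entries.pop(src_ip)
        elif len(self.entries) >= self.slots:
            # Table full: drop the oldest entry to make room (assumed policy).
            self.entries.popitem(last=False)
        self.entries[src_ip] = src_mac


def boot_server_survives(learn_from_any_traffic, probes=1000, slots=8):
    """Learn the boot server via ARP, then receive ICMP probes from many hosts."""
    cache = NeighborCache(slots, learn_from_any_traffic)
    cache.receive("arp", "10.0.0.1", "52:54:00:00:00:01")  # boot server
    for i in range(probes):
        hi, lo = divmod(i, 256)
        cache.receive("icmp", f"10.1.{hi}.{lo}", f"02:00:00:00:{hi:02x}:{lo:02x}")
    return "10.0.0.1" in cache.entries


if __name__ == "__main__":
    print("learn from any traffic:", boot_server_survives(True))
    print("learn from ARP only:   ", boot_server_survives(False))
```

With these assumed numbers, the boot server entry is evicted when every
received packet is allowed to populate the table, and it is retained when
only ARP traffic does.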