[ipxe-devel] Mellanox ConnectX4 slow transfers

John Hanks griznog at gmail.com
Tue Mar 22 16:46:02 UTC 2022


Hi,

I have some nodes with Mellanox ConnectX4 NICs in them, and when I boot
them with iPXE the links seem (based on observed transfer speed) to be
negotiating 100 Mbps when they come up. I enabled debugging
(make DEBUG=golan ...), which produces the following output:

  NBP file downloaded successfully.
iPXE initialising devices...golan_probe: start
golan_probe: Using NODNIC driver
golan_probe: rc = 0
golan_probe: start
golan_probe: Using NODNIC driver
golan_probe: rc = 0
golan_probe: start
golan_probe: Using normal driver
golan_bring_up
golan_cmd_init Command interface was initialized
golan_core_enable_hca
golan_handle_pages
golan_handle_pages pages needed: 6
golan_provide_pages
golan_provide_pages Pages handled
golan_set_hca_cap
golan_set_hca_cap caps.uar_sz = 5
golan_set_hca_cap caps.log_pg_sz = 12
golan_set_hca_cap caps.log_uar_sz = 0
golan_handle_pages
golan_handle_pages pages needed: 10024
golan_provide_pages
golan_provide_pages Pages handled
golan_hca_init
golan_alloc_uar: UAR allocated with index 0x80
UAR idx 80 (BE 80000003)
golan_create_eq: Event queue created (EQN = 0x4)
golan_alloc_pd: Protection domain created (PDN = 0x11)
golan_create_mkey: Got DMA Key for local access read/write (MKEY = 0x1000)
golan_bring_down: start
golan_destroy_mkey DMA Key (0x1000) for local access write was destroyed
golan_dealloc_pd in
golan_dealloc_pd Protection domain (0x11) was destroyed
golan_destory_eq in
golan_destory_eq Event queue (0x4) was destroyed
golan_dealloc_uar in
golan_dealloc_uar UAR (0x80) was destroyed
golan_teardown_hca in
golan_teardown_hca HCA teardown compleated
golan_handle_pages
golan_handle_pages pages needed: -10024
golan_take_pages
golan_take_pages Pages handled
golan_bring_down: end
golan_probe: rc = 0
ok

[[ normal boot stuff is here ]]

golan_remove: start
golan_remove: Using NODNIC driver remove
golan_remove: end
golan_remove: start
golan_remove: Using NODNIC driver remove
golan_remove: end
golan_remove: start
golan_remove: Using normal driver remove
golan_remove_normal

[[ node boots ]]


Is there something in that output which confirms 100 Mbps or gives a clue
as to why these are so slow? Once booted into an OS the NICs work fine; I
only see this slowdown during the initial iPXE fetch of the kernel and
initrd. I don't (yet) have easy access to watch this at the switch but will
get that soon. For now I base my guess that this is auto-negotiating 100
Mbps on watching the host that sends the images max out at 11 MB/s when one
of these nodes is fetching an image.
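
For reference, the arithmetic behind that guess can be sketched as below.
This is only a rough check: the function names and the list of candidate
link speeds are illustrative assumptions of mine, nothing is read from the
NIC, 1 MB is taken as 10^6 bytes, and protocol overhead is ignored.

```python
# Candidate Ethernet link speeds in Mbit/s.
# Illustrative assumption: this list is hand-written, not queried from hardware.
LINK_SPEEDS_MBPS = [10, 100, 1000, 10000, 25000]


def mbytes_to_mbits(mbytes_per_s):
    """Convert a payload rate in MB/s to Mbit/s (8 bits per byte)."""
    return mbytes_per_s * 8


def slowest_link_that_fits(mbits_per_s, speeds=LINK_SPEEDS_MBPS):
    """Return the slowest candidate link speed that could carry the rate."""
    for speed in sorted(speeds):
        if mbits_per_s <= speed:
            return speed
    return None  # faster than every candidate in the list


observed = mbytes_to_mbits(11)  # 11 MB/s seen on the host sending the images
link = slowest_link_that_fits(observed)
print(f"{observed} Mbit/s observed, {observed / link:.0%} of a {link} Mbit/s link")
# prints: 88 Mbit/s observed, 88% of a 100 Mbit/s link
```

A ceiling near 88 Mbit/s is what a saturated 100 Mbps link would look like,
but it is not proof of the negotiated speed; a faster link that is limited
somewhere else would produce the same number.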

Is there a way to get more debugging here or, better, to force the link to
come up at a higher speed?

Best,

griznog