Low bandwidth using IPoIB for OmniPath #33

Open
opened 2023-09-01 10:33:34 +02:00 by rarias · 2 comments
rarias commented 2023-09-01 10:33:34 +02:00 (Migrated from pm.bsc.es)

Using 2 parallel streams:

```
% iperf -c 10.0.42.40 -P 2
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  1.79 GBytes  1.54 Gbits/sec                  receiver
[  8]   0.00-10.00  sec  1.79 GBytes  1.54 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  3.59 GBytes  3.08 Gbits/sec                  receiver
```

It should be close to 100 Gbits/sec.
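Before blaming IPoIB itself, it is worth confirming what rate the Omni-Path port reports to the RDMA core. A minimal check via sysfs (the `hfi1_0` device name is an assumption; adjust to whatever appears under `/sys/class/infiniband`):

```
# Port state and rate as seen by the RDMA core; a healthy Omni-Path link
# should report ACTIVE and 100 Gb/sec here.
cat /sys/class/infiniband/hfi1_0/ports/1/state
cat /sys/class/infiniband/hfi1_0/ports/1/rate
```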

rarias commented 2023-09-01 10:34:21 +02:00 (Migrated from pm.bsc.es)

The MTU is very low:

```
ibp5s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2044
        inet 10.0.42.7  netmask 255.255.255.0  broadcast 0.0.0.0
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
        infiniband 80:81:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)
        RX packets 892981  bytes 46399345 (44.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1937891  bytes 3966409356 (3.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
```
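An MTU of 2044 is the IPoIB datagram-mode default, which suggests the interface is not in connected mode. The transport mode can be read directly from sysfs; a quick check, using the interface name from the output above:

```
# "datagram" here explains the 2044-byte MTU; connected mode allows up to 65520.
cat /sys/class/net/ibp5s0/mode
cat /sys/class/net/ibp5s0/mtu
```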
rarias commented 2023-09-01 10:52:29 +02:00 (Migrated from pm.bsc.es)

So, the device was in datagram mode instead of connected mode. From the Intel docs:

> The traditional term for sending IP traffic over the InfiniBand* fabric is IP over IB or
> IPoIB. The Intel® Omni-Path fabric does not implement InfiniBand*; instead, this
> concept is known as IP over Fabric or IPoFabric. From the software point of view, it
> behaves the same way as IPoIB, and in fact uses an ib_ipoib driver and sends IP
> traffic over the ib0 and/or ib1 ports. Thus, we will primarily refer to this traffic as
> IPoFabric, but will continue to use the terms ib_ipoib and the ib0/ib1 ports, and
> measure performance with traditional IP oriented benchmarks such as qperf and
> iperf. For IPoFabric bandwidth benchmarks, a prerequisite for good performance is
> having 8KB MTU enabled on the fabric. To enable the 8K MTU, one only needs to have
> IPoIB configured for Connected Mode. Currently, the best way to accomplish this is as
> follows:

```
modprobe ib_ipoib
ifup ib0
echo connected > /sys/class/net/ib0/mode
echo 65520 > /sys/class/net/ib0/mtu
```
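On this node the IPoIB interface is named ibp5s0 rather than ib0, and ib_ipoib is already loaded with the interface up, so the equivalent steps (a sketch adapted from the Intel commands above) would be:

```
# Adapted from the Intel docs to the local interface name; the module is
# already loaded and the interface is already up on this machine.
echo connected > /sys/class/net/ibp5s0/mode
echo 65520 > /sys/class/net/ibp5s0/mtu
```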

This allows the MTU to be increased:

```
ibp5s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65520
        inet 10.0.42.7  netmask 255.255.255.0  broadcast 0.0.0.0
        infiniband 80:81:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)
        RX packets 1538425  bytes 80197008 (76.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3272190  bytes 73713595124 (68.6 GiB)
        TX errors 0  dropped 16363 overruns 0  carrier 0  collisions 0
```
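Connected mode and the large MTU only help if both endpoints use them, so the server side is worth checking as well. A quick sketch, assuming the remote interface has the same name (an assumption, not verified in this thread):

```
# The iperf server is 10.0.42.40; confirm it also runs in connected mode
# with the 65520-byte MTU.
ssh 10.0.42.40 'cat /sys/class/net/ibp5s0/mode /sys/class/net/ibp5s0/mtu'
```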

Now we reach 16 Gbits/sec:

```
hut$ iperf -c 10.0.42.40 -l 1M
Connecting to host 10.0.42.40, port 5201
[  5] local 10.0.42.7 port 43612 connected to 10.0.42.40 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.93 GBytes  16.5 Gbits/sec    0   3.12 MBytes
[  5]   1.00-2.00   sec  1.83 GBytes  15.7 Gbits/sec    1   3.12 MBytes
[  5]   2.00-3.00   sec  1.89 GBytes  16.2 Gbits/sec    1   3.12 MBytes
[  5]   3.00-4.00   sec  1.91 GBytes  16.5 Gbits/sec    0   3.12 MBytes
[  5]   4.00-5.00   sec  1.83 GBytes  15.8 Gbits/sec    1   3.12 MBytes
[  5]   5.00-6.00   sec  1.92 GBytes  16.5 Gbits/sec    0   3.12 MBytes
[  5]   6.00-7.00   sec  1.83 GBytes  15.7 Gbits/sec    1   3.12 MBytes
[  5]   7.00-8.00   sec  1.92 GBytes  16.5 Gbits/sec    0   3.12 MBytes
[  5]   8.00-9.00   sec  1.82 GBytes  15.7 Gbits/sec    1   3.12 MBytes
[  5]   9.00-10.00  sec  1.92 GBytes  16.5 Gbits/sec    0   3.12 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  18.8 GBytes  16.1 Gbits/sec    5             sender
[  5]   0.00-10.00  sec  18.8 GBytes  16.1 Gbits/sec                  receiver
```
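16 Gbit/s is still far from the 100 Gbit/s line rate. One common next step for fast links, not something tried in this thread, is to raise the kernel socket buffer limits and retry with several parallel streams (the sysctl values below are illustrative, not tuned for this cluster):

```
# Allow larger TCP windows for a single high-bandwidth flow.
sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
sysctl -w net.ipv4.tcp_rmem='4096 87380 134217728'
sysctl -w net.ipv4.tcp_wmem='4096 65536 134217728'

# Retry with multiple streams and larger application buffers.
iperf -c 10.0.42.40 -P 4 -l 1M
```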