ovni/doc/user/emulation/mpi.md

10 KiB

MPI model

The Message Passing Interface (MPI) is a standard library interface specification for message-passing communication libraries targeting parallel computing architectures. The interface defines functions for point-to-point communication primitives, collectives, remote memory access (RMA), I/O and process management.

The Sonar library instruments the most essential MPI functions that any user application or any external library may execute. Sonar tracks the calls to these MPI functions made at each point. Both users and developers can use this information to analyze the time spent inside MPI functions. The next section explains a view that is provided to achieve this goal.

The Sonar library is compatible with the MPI standards 3.0, 3.1 and 4.0. See the MPI documentation for more information about the MPI standards and their functions.

Sonar requires an installation of the ovni library and an MPI library. Use the option --with-ovni=prefix when building Sonar to specify the ovni prefix. The building procedure will compile and install the libsonar-mpi.so. See the Sonar documentation for more details about the building steps.

An application can instrument the MPI function calls by linking with the Sonar library libsonar-mpi.so. At run-time, the Sonar library does not enable the instrumentation by default. Sonar instruments the MPI functions when the environment variable SONAR_MPI_INSTRUMENT is defined to ovni. Its default value is none.

As an example, a user can generate a trace with MPI function events of an MPI program app.c in this way:

$ mpicc -c app.c -o app.o
$ mpicc app.o -o app -L${SONAR_PREFIX}/lib -lsonar-mpi
$ export SONAR_MPI_INSTRUMENT=ovni
$ mpirun -n 2 ./app

This will generate an ovni trace in the ovni directory, which can be emulated using the ovniemu tool.

!!! Note

Notice that the order of libraries at the linking stage is important. The
Sonar library should always have precedence on the MPI library. That's the
usual behavior when using `mpicc` tools. The `mpicc` tool should link the
application with the MPI libraries as the last libraries in the list of
application's dependencies. If this order is not respected, the Sonar
library would not be able to intercept the MPI function calls and instrument
them.

!!! Note

Notice the Task-Aware MPI (TAMPI), as well as other external libraries,
intercepts the MPI functions and may call MPI functions instead. Thus, the
order in which such libraries and Sonar are linked to the application will
also alter the resulting ovni trace. Give precedence to the Sonar library to
instrument the MPI function calls made by the application. You can achieve
by linking your application with the linking options `-lsonar-mpi -ltampi`.
Otherwise, give precendence to the TAMPI library to track the real MPI
functions that are being executed (i.e., the ones that the MPI library
actually runs). In this case, use the linking options `-ltampi -lsonar-mpi`.

Function view

The function view attempts to provide a general overview of which are the MPI functions being executed at any point in time. The function view shows the MPI functions called by each thread (and for each CPU, the MPI functions executed by the running thread in that CPU).

The function states shown in this view are listed below. Each function state (in bold) includes a list of all the MPI functions that are instrumented as that particular state. Notice that only the most important functions are instrumented. Also, notice that not all functions have their own state. For instance, the large count MPI (with _c suffix) introduced in MPI 4.0, the extended functions (with v or w suffix), and Fortran functions (with lower case name and _ suffix) are instrumented as their simple C function without suffix.

  • Setup functions: The running thread is executing MPI setup functions to initialize and finalize the MPI environment. The following function states are shown:

    • MPI_Init: MPI_Init, mpi_init_

    • MPI_Init_thread: MPI_Init_thread, mpi_init_thread_

    • MPI_Finalize: MPI_Finalize, mpi_finalize_

  • Request functions: The running thread is executing MPI functions that wait or test MPI requests after being generated by non-blocking MPI operations. The following functions are instrumented:

    • MPI_Wait: MPI_Wait, mpi_wait_

    • MPI_Waitall: MPI_Waitall, mpi_waitall_

    • MPI_Waitany: MPI_Waitany, mpi_waitany_

    • MPI_Waitsome: MPI_Waitsome, mpi_waitsome_

    • MPI_Test: MPI_Test, mpi_test_

    • MPI_Testall: MPI_Testall, mpi_testall_

    • MPI_Testany: MPI_Testany, mpi_testany_

    • MPI_Testsome: MPI_Testsome, mpi_testsome_

  • Point-to-point functions: The running thread is communicating through MPI by executing point-to-point primitives. The instrumented functions are:

    • MPI_Recv: MPI_Recv, MPI_Recv_c, mpi_recv_

    • MPI_Send: MPI_Send, MPI_Send_c, mpi_send_

    • MPI_Bsend: MPI_Bsend, MPI_Bsend_c, mpi_bsend_

    • MPI_Rsend: MPI_Rsend, MPI_Rsend_c, mpi_rsend_

    • MPI_Ssend: MPI_Ssend, MPI_Ssend_c, mpi_ssend_

    • MPI_Sendrecv: MPI_Sendrecv, MPI_Sendrecv_c, mpi_sendrecv_

    • MPI_Sendrecv_replace: MPI_Sendrecv_replace, MPI_Sendrecv_replace_c, mpi_sendrecv_replace_

    • MPI_Irecv: MPI_Irecv, MPI_Irecv_c, mpi_irecv_

    • MPI_Isend: MPI_Isend, MPI_Isend_c, mpi_isend_

    • MPI_Ibsend: MPI_Ibsend, MPI_Ibsend_c, mpi_ibsend_

    • MPI_Irsend: MPI_Irsend, MPI_Irsend_c, mpi_irsend_

    • MPI_Issend: MPI_Issend, MPI_Issend_c, mpi_issend_

    • MPI_Isendrecv: MPI_Isendrecv, MPI_Isendrecv_c, mpi_isendrecv_

    • MPI_Isendrecv_replace: MPI_Isendrecv_replace, MPI_Isendrecv_replace_c, mpi_isendrecv_replace_

  • Collective functions: The running thread is communicating through MPI by executing collective functions. The instrumented functions are:

    • MPI_Gather: MPI_Gather, MPI_Gatherv, MPI_Gather_c, MPI_Gatherv_c, mpi_gather_, mpi_gatherv_

    • MPI_Allgather: MPI_Allgather, MPI_Allgatherv, MPI_Allgather_c, MPI_Allgatherv_c, mpi_allgather_, mpi_allgatherv_

    • MPI_Scatter: MPI_Scatter, MPI_Scatterv, MPI_Scatter_c, MPI_Scatterv_c, mpi_scatter_, mpi_scatterv_

    • MPI_Reduce: MPI_Reduce, MPI_Reduce_c, mpi_reduce_

    • MPI_Reduce_scatter: MPI_Reduce_scatter, MPI_Reduce_scatter_c, mpi_reduce_scatter_

    • MPI_Reduce_scatter_block: MPI_Reduce_scatter_block, MPI_Reduce_scatter_block_c, mpi_reduce_scatter_block_

    • MPI_Allreduce: MPI_Allreduce, MPI_Allreduce_c, mpi_allreduce_

    • MPI_Barrier: MPI_Barrier, MPI_Barrier_c, mpi_barrier_

    • MPI_Bcast: MPI_Bcast, MPI_Bcast_c, mpi_bcast

    • MPI_Alltoall: MPI_Alltoall, MPI_Alltoallv, MPI_Alltoallw, MPI_Alltoall_c, MPI_Alltoallv_c, MPI_Alltoallw_c, mpi_alltoall_, mpi_alltoallv_, mpi_alltoallw_

    • MPI_Scan: MPI_Scan, MPI_Scan_c, mpi_scan_

    • MPI_Exscan: MPI_Exscan, MPI_Exscan_c, mpi_exscan_

    • MPI_Igather: MPI_Igather, MPI_Igatherv, MPI_Igather_c, MPI_Igatherv_c, mpi_igather_, mpi_igatherv_

    • MPI_Iallgather: MPI_Iallgather, MPI_Iallgatherv, MPI_Iallgather_c, MPI_Iallgatherv_c, mpi_iallgather_, mpi_iallgatherv_

    • MPI_Iscatter: MPI_Iscatter, MPI_Iscatterv, MPI_Iscatter_c, MPI_Iscatterv_c, mpi_iscatter_, mpi_iscatterv_

    • MPI_Ireduce: MPI_Ireduce, MPI_Ireduce_c, mpi_ireduce_

    • MPI_Iallreduce: MPI_Iallreduce, MPI_Iallreduce_c, mpi_iallreduce_

    • MPI_Ireduce_scatter: MPI_Ireduce_scatter, MPI_Ireduce_scatter_c, mpi_ireduce_scatter_

    • MPI_Ireduce_scatter_block: MPI_Ireduce_scatter_block, MPI_Ireduce_scatter_block_c, mpi_ireduce_scatter_block_

    • MPI_Ibarrier: MPI_Ibarrier, MPI_Ibarrier_c, mpi_ibarrier_

    • MPI_Ibcast: MPI_Ibcast, MPI_Ibcast_c, mpi_ibcast_

    • MPI_Ialltoall: MPI_Ialltoall, MPI_Ialltoallv, MPI_Ialltoallw, MPI_Ialltoall_c, MPI_Ialltoallv_c, MPI_Ialltoallw_c, mpi_ialltoall_, mpi_ialltoallv_, mpi_ialltoallw_

    • MPI_Iscan: MPI_Iscan, MPI_Iscan_c, mpi_iscan_

    • MPI_Iexscan: MPI_Iexscan, MPI_Iexscan_c, mpi_iexscan_

!!! Note

The Sonar library does not support large count MPI functions for the Fortran
language yet, and thus, these functions are not instrumented.

The figure below shows an example of the MPI function view. The program executes a distributed stencil algorithm with MPI and OmpSs-2. There are several MPI processes, each running OmpSs-2 tasks on an exclusive set of CPUs. Most of these are computation tasks, while the others are concurrent tasks performing communications using the blocking mode of the TAMPI library. These use MPI_Send and MPI_Recv functions to send and receive blocks of data. The program was linked with Sonar and preceding the TAMPI library. Thus, the trace shows the blocking MPI function calls made by the application.

MPI function view example

The light green areas correspond to the MPI_Init_thread calls, the grey ones are MPI_Send calls and the dark green areas are MPI_Recv calls. There are other secondary calls like MPI_Bcast (orange), MPI_Barrier (blue) and MPI_Finalize (red) calls.

As mentioned above, the trace shows the blocking MPI functions called by the application because Sonar was placed before TAMPI in the linking order. However, these blocking calls may not be actually executed by the MPI library; TAMPI will transparently replace them with non-blocking calls (e.g., MPI_Isend and MPI_Irecv) and a polling mechanism for the generated MPI requests. If you want to explore the actual MPI functions being executed, you should link the Sonar library after TAMPI.