# MPI model The [Message Passing Interface (MPI)][mpi] is a standard library interface specification for message-passing communication libraries targeting parallel computing architectures. The interface defines functions for point-to-point communication primitives, collectives, remote memory access (RMA), I/O and process management. The [Sonar][sonar] library instruments the most essential MPI functions that any user application or any external library may execute. Sonar tracks the calls to these MPI functions made at each point. Both users and developers can use this information to analyze the time spent inside MPI functions. The next section explains a view that is provided to achieve this goal. The Sonar library is compatible with the MPI standards 3.0, 3.1 and 4.0. See the [MPI documentation][mpi docs] for more information about the MPI standards and their functions. [mpi]: https://www.mpi-forum.org [mpi docs]: https://www.mpi-forum.org/docs [sonar]: https://github.com/bsc-pm/sonar [sonar docs]: https://github.com/bsc-pm/sonar#readme Sonar requires an installation of the ovni library and an MPI library. Use the option `--with-ovni=prefix` when building Sonar to specify the ovni prefix. The building procedure will compile and install the `libsonar-mpi.so`. See the [Sonar documentation][sonar docs] for more details about the building steps. An application can instrument the MPI function calls by linking with the Sonar library `libsonar-mpi.so`. At run-time, the Sonar library does not enable the instrumentation by default. Sonar instruments the MPI functions when the environment variable `SONAR_MPI_INSTRUMENT` is defined to `ovni`. Its default value is `none`. As an example, a user can generate a trace with MPI function events of an MPI program `app.c` in this way: ``` $ mpicc -c app.c -o app.o $ mpicc app.o -o app -L${SONAR_PREFIX}/lib -lsonar-mpi $ export SONAR_MPI_INSTRUMENT=ovni $ mpirun -n 2 ./app ``` This will generate an ovni trace in the `ovni` directory, which can be emulated using the `ovniemu` tool. !!! Note Notice that the order of libraries at the linking stage is important. The Sonar library should always have precedence on the MPI library. That's the usual behavior when using `mpicc` tools. The `mpicc` tool should link the application with the MPI libraries as the last libraries in the list of application's dependencies. If this order is not respected, the Sonar library would not be able to intercept the MPI function calls and instrument them. !!! Note Notice the Task-Aware MPI (TAMPI), as well as other external libraries, intercepts the MPI functions and may call MPI functions instead. Thus, the order in which such libraries and Sonar are linked to the application will also alter the resulting ovni trace. Give precedence to the Sonar library to instrument the MPI function calls made by the application. You can achieve by linking your application with the linking options `-lsonar-mpi -ltampi`. Otherwise, give precendence to the TAMPI library to track the real MPI functions that are being executed (i.e., the ones that the MPI library actually runs). In this case, use the linking options `-ltampi -lsonar-mpi`. ## Function view The function view attempts to provide a general overview of which are the MPI functions being executed at any point in time. The function view shows the MPI functions called by each thread (and for each CPU, the MPI functions executed by the running thread in that CPU). The function states shown in this view are listed below. Each function state (in bold) includes a list of all the MPI functions that are instrumented as that particular state. Notice that only the most important functions are instrumented. Also, notice that not all functions have their own state. For instance, the large count MPI (with `_c` suffix) introduced in MPI 4.0, the extended functions (with `v` or `w` suffix), and Fortran functions (with lower case name and `_` suffix) are instrumented as their simple C function without suffix. - *Setup functions*: The running thread is executing MPI setup functions to initialize and finalize the MPI environment. The following function states are shown: - **MPI_Init**: `MPI_Init`, `mpi_init_` - **MPI_Init_thread**: `MPI_Init_thread`, `mpi_init_thread_` - **MPI_Finalize**: `MPI_Finalize`, `mpi_finalize_` - *Request functions*: The running thread is executing MPI functions that wait or test MPI requests after being generated by non-blocking MPI operations. The following functions are instrumented: - **MPI_Wait**: `MPI_Wait`, `mpi_wait_` - **MPI_Waitall**: `MPI_Waitall`, `mpi_waitall_` - **MPI_Waitany**: `MPI_Waitany`, `mpi_waitany_` - **MPI_Waitsome**: `MPI_Waitsome`, `mpi_waitsome_` - **MPI_Test**: `MPI_Test`, `mpi_test_` - **MPI_Testall**: `MPI_Testall`, `mpi_testall_` - **MPI_Testany**: `MPI_Testany`, `mpi_testany_` - **MPI_Testsome**: `MPI_Testsome`, `mpi_testsome_` - *Point-to-point functions*: The running thread is communicating through MPI by executing point-to-point primitives. The instrumented functions are: - **MPI_Recv**: `MPI_Recv`, `MPI_Recv_c`, `mpi_recv_` - **MPI_Send**: `MPI_Send`, `MPI_Send_c`, `mpi_send_` - **MPI_Bsend**: `MPI_Bsend`, `MPI_Bsend_c`, `mpi_bsend_` - **MPI_Rsend**: `MPI_Rsend`, `MPI_Rsend_c`, `mpi_rsend_` - **MPI_Ssend**: `MPI_Ssend`, `MPI_Ssend_c`, `mpi_ssend_` - **MPI_Sendrecv**: `MPI_Sendrecv`, `MPI_Sendrecv_c`, `mpi_sendrecv_` - **MPI_Sendrecv_replace**: `MPI_Sendrecv_replace`, `MPI_Sendrecv_replace_c`, `mpi_sendrecv_replace_` - **MPI_Irecv**: `MPI_Irecv`, `MPI_Irecv_c`, `mpi_irecv_` - **MPI_Isend**: `MPI_Isend`, `MPI_Isend_c`, `mpi_isend_` - **MPI_Ibsend**: `MPI_Ibsend`, `MPI_Ibsend_c`, `mpi_ibsend_` - **MPI_Irsend**: `MPI_Irsend`, `MPI_Irsend_c`, `mpi_irsend_` - **MPI_Issend**: `MPI_Issend`, `MPI_Issend_c`, `mpi_issend_` - **MPI_Isendrecv**: `MPI_Isendrecv`, `MPI_Isendrecv_c`, `mpi_isendrecv_` - **MPI_Isendrecv_replace**: `MPI_Isendrecv_replace`, `MPI_Isendrecv_replace_c`, `mpi_isendrecv_replace_` - *Collective functions*: The running thread is communicating through MPI by executing collective functions. The instrumented functions are: - **MPI_Gather**: `MPI_Gather`, `MPI_Gatherv`, `MPI_Gather_c`, `MPI_Gatherv_c`, `mpi_gather_`, `mpi_gatherv_` - **MPI_Allgather**: `MPI_Allgather`, `MPI_Allgatherv`, `MPI_Allgather_c`, `MPI_Allgatherv_c`, `mpi_allgather_`, `mpi_allgatherv_` - **MPI_Scatter**: `MPI_Scatter`, `MPI_Scatterv`, `MPI_Scatter_c`, `MPI_Scatterv_c`, `mpi_scatter_`, `mpi_scatterv_` - **MPI_Reduce**: `MPI_Reduce`, `MPI_Reduce_c`, `mpi_reduce_` - **MPI_Reduce_scatter**: `MPI_Reduce_scatter`, `MPI_Reduce_scatter_c`, `mpi_reduce_scatter_` - **MPI_Reduce_scatter_block**: `MPI_Reduce_scatter_block`, `MPI_Reduce_scatter_block_c`, `mpi_reduce_scatter_block_` - **MPI_Allreduce**: `MPI_Allreduce`, `MPI_Allreduce_c`, `mpi_allreduce_` - **MPI_Barrier**: `MPI_Barrier`, `MPI_Barrier_c`, `mpi_barrier_` - **MPI_Bcast**: `MPI_Bcast`, `MPI_Bcast_c`, `mpi_bcast` - **MPI_Alltoall**: `MPI_Alltoall`, `MPI_Alltoallv`, `MPI_Alltoallw`, `MPI_Alltoall_c`, `MPI_Alltoallv_c`, `MPI_Alltoallw_c`, `mpi_alltoall_`, `mpi_alltoallv_`, `mpi_alltoallw_` - **MPI_Scan**: `MPI_Scan`, `MPI_Scan_c`, `mpi_scan_` - **MPI_Exscan**: `MPI_Exscan`, `MPI_Exscan_c`, `mpi_exscan_` - **MPI_Igather**: `MPI_Igather`, `MPI_Igatherv`, `MPI_Igather_c`, `MPI_Igatherv_c`, `mpi_igather_`, `mpi_igatherv_` - **MPI_Iallgather**: `MPI_Iallgather`, `MPI_Iallgatherv`, `MPI_Iallgather_c`, `MPI_Iallgatherv_c`, `mpi_iallgather_`, `mpi_iallgatherv_` - **MPI_Iscatter**: `MPI_Iscatter`, `MPI_Iscatterv`, `MPI_Iscatter_c`, `MPI_Iscatterv_c`, `mpi_iscatter_`, `mpi_iscatterv_` - **MPI_Ireduce**: `MPI_Ireduce`, `MPI_Ireduce_c`, `mpi_ireduce_` - **MPI_Iallreduce**: `MPI_Iallreduce`, `MPI_Iallreduce_c`, `mpi_iallreduce_` - **MPI_Ireduce_scatter**: `MPI_Ireduce_scatter`, `MPI_Ireduce_scatter_c`, `mpi_ireduce_scatter_` - **MPI_Ireduce_scatter_block**: `MPI_Ireduce_scatter_block`, `MPI_Ireduce_scatter_block_c`, `mpi_ireduce_scatter_block_` - **MPI_Ibarrier**: `MPI_Ibarrier`, `MPI_Ibarrier_c`, `mpi_ibarrier_` - **MPI_Ibcast**: `MPI_Ibcast`, `MPI_Ibcast_c`, `mpi_ibcast_` - **MPI_Ialltoall**: `MPI_Ialltoall`, `MPI_Ialltoallv`, `MPI_Ialltoallw`, `MPI_Ialltoall_c`, `MPI_Ialltoallv_c`, `MPI_Ialltoallw_c`, `mpi_ialltoall_`, `mpi_ialltoallv_`, `mpi_ialltoallw_` - **MPI_Iscan**: `MPI_Iscan`, `MPI_Iscan_c`, `mpi_iscan_` - **MPI_Iexscan**: `MPI_Iexscan`, `MPI_Iexscan_c`, `mpi_iexscan_` !!! Note The Sonar library does not support large count MPI functions for the Fortran language yet, and thus, these functions are not instrumented. The figure below shows an example of the MPI function view. The program executes a distributed stencil algorithm with MPI and OmpSs-2. There are several MPI processes, each running OmpSs-2 tasks on an exclusive set of CPUs. Most of these are computation tasks, while the others are concurrent tasks performing communications using the blocking mode of the TAMPI library. These use `MPI_Send` and `MPI_Recv` functions to send and receive blocks of data. The program was linked with Sonar and preceding the TAMPI library. Thus, the trace shows the blocking MPI function calls made by the application. ![MPI function view example](fig/mpi-function.png) The light green areas correspond to the `MPI_Init_thread` calls, the grey ones are `MPI_Send` calls and the dark green areas are `MPI_Recv` calls. There are other secondary calls like `MPI_Bcast` (orange), `MPI_Barrier` (blue) and `MPI_Finalize` (red) calls. As mentioned above, the trace shows the blocking MPI functions called by the application because Sonar was placed before TAMPI in the linking order. However, these blocking calls may not be actually executed by the MPI library; TAMPI will transparently replace them with non-blocking calls (e.g., `MPI_Isend` and `MPI_Irecv`) and a polling mechanism for the generated MPI requests. If you want to explore the actual MPI functions being executed, you should link the Sonar library after TAMPI.