From bba46ac2003c6504fdf39a8802dae38935aa3f18 Mon Sep 17 00:00:00 2001
From: Rodrigo Arias
Date: Fri, 13 Sep 2024 16:03:15 +0200
Subject: [PATCH] Explain some concepts in the documentation

---
 doc/user/concepts.md             |  31 --
 doc/user/concepts/part-model.md  |  62 ++++
 doc/user/concepts/part-model.svg | 516 +++++++++++++++++++++++++++++++
 doc/user/concepts/trace-model.md |  72 +++++
 doc/user/runtime/index.md        | 111 +++++++
 mkdocs.yml                       |   5 +-
 6 files changed, 765 insertions(+), 32 deletions(-)
 delete mode 100644 doc/user/concepts.md
 create mode 100644 doc/user/concepts/part-model.md
 create mode 100644 doc/user/concepts/part-model.svg
 create mode 100644 doc/user/concepts/trace-model.md
 create mode 100644 doc/user/runtime/index.md

diff --git a/doc/user/concepts.md b/doc/user/concepts.md
deleted file mode 100644
index c867319..0000000
--- a/doc/user/concepts.md
+++ /dev/null
@@ -1,31 +0,0 @@
-# Overview
-
-The objective of the ovni project is to provide insight into what
-happened at execution of a program.
-
-![Instrumentation process](fig/instrumentation.svg)
-
-The key pieces of software involved are instrumented so they emit events
-during the execution which allow the reconstruction of the execution
-later on.
-
-During the execution phase, the information gathered in the events is
-kept very short and simple, so the overhead is kept at minimum to avoid
-disturbing the execution process. Here is an example of a single event
-emitted during the execution phase, informing the current thread to
-finish the execution:
-
-    00 4f 48 65 52 c0 27 b4 d3 ec 01 00
-
-During the emulation phase, the events are read and processed in the
-emulator, reconstructing the execution. State transitions are recorded
-in a Paraver trace. Here is an example of the same thread ceasing the
-execution:
-
-    2:0:1:1:1:50105669:1:0
-
-Finally, loading the trace in the Paraver program, we can generate a
-timeline visualization of the state change.
-Here is the example for the same state transition of the thread stopping the
-execution:
-
-![Visualization](fig/visualization.png)
diff --git a/doc/user/concepts/part-model.md b/doc/user/concepts/part-model.md
new file mode 100644
index 0000000..d2602d1
--- /dev/null
+++ b/doc/user/concepts/part-model.md
@@ -0,0 +1,62 @@
+# Part model
+
+Ovni has a model to represent the hardware components as well as the software
+concepts like threads or processes. Each concept is considered to be a *part*.
+Here is an example diagram depicting the part hierarchy:
+
+![Part model diagram](part-model.svg "Part model")
+
+Notice how a loom can restrict the CPUs of the node to its child processes.
+
+## Software parts
+
+These are not physical parts, but they abstract common concepts.
+
+### Thread
+
+A thread in ovni is directly mapped to a [POSIX
+thread](https://en.wikipedia.org/wiki/Pthreads) and is identified by a
+`TID` which must be unique in a [node](#node). Threads in ovni have [a model with
+an internal state](../emulation/ovni.md/#thread_model) that tries to track the
+state of the real thread.
+
+### Process
+
+A process is directly mapped to a UNIX
+[process](https://en.wikipedia.org/wiki/Process_(computing)) and is
+identified by a `PID` number which must be unique in a [node](#node).
+
+### Loom
+
+A loom has no direct mapping to a usual concept. It consists of a set of
+[CPUs](#cpu) from the same node and a set of processes that can *only run on
+those CPUs*. Each CPU must belong to one and only one loom. It is often used
+to group CPUs that belong to the same process when running workloads with
+multiple processes (like with MPI).
+
+Each loom has a virtual CPU which collects running threads that are not
+exclusively assigned to a physical CPU, so we cannot determine on which CPU they
+are running.
+
+## Hardware parts
+
+These parts have a physical object assigned.
+
+### CPU
+
+A CPU is a hardware thread that can execute at most one thread at a time.
+Each CPU must have a physical ID that is unique in a node. In ovni there is
+also a virtual CPU, which is simply used to collect threads that are not tied
+to a specific physical CPU, so it cannot be easily determined where they are
+running.
+
+### Node
+
+A *node* refers to a compute node, often a physical machine with memory and
+network which may contain one or more
+[sockets](https://en.wikipedia.org/wiki/CPU_socket), where each socket has one
+or more CPUs.
+
+### System
+
+A system represents the complete set of hardware parts and software parts that
+are known to ovni in a given trace.
diff --git a/doc/user/concepts/part-model.svg b/doc/user/concepts/part-model.svg
new file mode 100644
index 0000000..b8c5293
--- /dev/null
+++ b/doc/user/concepts/part-model.svg
@@ -0,0 +1,516 @@
+[SVG markup lost in extraction; text labels in the figure: System, Hardware parts, Software parts, Node, Loom, CPU, Process, Thread]
diff --git a/doc/user/concepts/trace-model.md b/doc/user/concepts/trace-model.md
new file mode 100644
index 0000000..318e42b
--- /dev/null
+++ b/doc/user/concepts/trace-model.md
@@ -0,0 +1,72 @@
+# Trace model
+
+An event model is composed of a group of runtime events.
+
+## Trace
+
+The information generated by a program or later processed by other ovni tools is
+known as a trace. A runtime trace stores the information from a program
+execution on disk as-is, while an emulation trace is generated from the runtime
+trace for visualization with Paraver.
+
+All the information is always stored inside the same directory, by default
+`ovni/`, which is known as the trace directory.
+
+## Event
+
+An event is a point in time that has some information associated. Events written
+at runtime by libovni have an MCV, a clock and an optional payload. The list of
+all events recognized by the emulator can be found [here](../emulation/events.md).
+
+## State
+
+A state is a discrete value that can change over time based on the events the
+emulator receives. Usually a single event causes a single state change, which is
+then written to the Paraver traces. An example is the thread state, which can
+change over time based on the events `OH*` that indicate a state transition
+of the current thread.
+
+## MCV
+
+The MCV acronym is short for Model-Class-Value, which is a three-character
+(byte) identification for events.
+
+## Clock
+
+A clock is a 64-bit counter, which counts the number of nanoseconds from an
+arbitrary point in time in the past. Each event has the value of the clock
+stored inside, to indicate when that event happened. In a given trace there can
+be multiple clocks which don't refer to the same point in the past and must be
+corrected so they all produce an ordered sequence of events. The ovnisync
+program performs this correction by measuring the difference across clocks of
+different nodes.
+
+## Event model
+
+An event model is composed of several components:
+
+- A set of [events](#event), all with the same model identifier in the
+  [MCV](#mcv).
+- The emulator code that processes those events.
+- A human-readable name, like `ovni` or `nanos6`.
+
+## Payload
+
+Events may carry additional information, the payload, which is stored in the
+stream.
+
+## Binary stream
+
+A binary stream is a file named `stream.obs` (.obs stands for Ovni Binary
+Stream) composed of a header and a concatenated array of events without padding.
+Notice that each event may have a different length.
+
+## Stream metadata
+
+The stream metadata is a JSON file named `stream.json` which holds information
+about the stream.
+
+## Stream
+
+A stream is a directory which contains a binary stream and the associated stream
+metadata file. Each stream is associated with a given part of a system. As of
+now, libovni can only generate streams associated with [threads](part-model.md#thread).
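
The relationship between an event, its MCV and its clock can be made concrete with the 12-byte example event quoted in the removed `doc/user/concepts.md` (`00 4f 48 65 52 c0 27 b4 d3 ec 01 00`). The following is only a minimal sketch, not libovni code: it assumes that the first byte is a flags byte, that the next three bytes are the MCV and that the last eight bytes are the clock in little-endian order. That layout is inferred from the example bytes (`4f 48 65` is ASCII `OHe`), not taken from the trace specification, and `struct decoded_event` and `decode_event()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded view of one event; this is not a libovni type. */
struct decoded_event {
    uint8_t flags;  /* assumption: first byte of the event */
    char mcv[4];    /* model, class and value characters plus NUL */
    uint64_t clock; /* nanoseconds from an arbitrary point in the past */
};

/* The example event bytes quoted in the removed overview page. */
const uint8_t example_event[12] = {
    0x00, 0x4f, 0x48, 0x65, 0x52, 0xc0,
    0x27, 0xb4, 0xd3, 0xec, 0x01, 0x00
};

/* Decode the first 12 bytes of an event under the assumed layout:
 * byte 0 = flags, bytes 1..3 = MCV, bytes 4..11 = little-endian clock.
 * Returns 0 on success or -1 if the buffer is too short. */
int decode_event(const uint8_t *buf, size_t len, struct decoded_event *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 12)
        return -1;

    out->flags = buf[0];

    /* Copy the three MCV characters and terminate the string. */
    memcpy(out->mcv, &buf[1], 3);
    out->mcv[3] = '\0';

    /* Assemble the 64-bit clock byte by byte, least significant first. */
    out->clock = 0;
    for (int i = 0; i < 8; i++)
        out->clock |= (uint64_t) buf[4 + i] << (8 * i);

    return 0;
}
```

Under these assumptions, decoding `example_event` yields the MCV `OHe`, which matches the event that ends the current thread. The byte-by-byte clock assembly avoids depending on the endianness of the host that reads the bytes.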
diff --git a/doc/user/runtime/index.md b/doc/user/runtime/index.md
new file mode 100644
index 0000000..b75b03a
--- /dev/null
+++ b/doc/user/runtime/index.md
@@ -0,0 +1,111 @@
+# Introduction
+
+To use *libovni* to instrument a program, follow these instructions
+carefully, or you may end up with an incomplete trace that is rejected at
+emulation.
+
+You can also generate a valid trace from your own software or hardware
+directly, but be sure to follow the [trace specification](trace_spec.md).
+
+## Initialization
+
+To initialize libovni follow these steps in all threads:
+
+1. **Check the version**. Call `ovni_version_check()` once before calling any
+   ovni function. It can be called multiple times from any thread, but only one
+   call is required.
+
+2. **Init the process**. Call `ovni_proc_init()` to initialize the process when
+   a new process begins the execution. It can only be called **once per
+   process** and it must be called before the thread is initialized.
+
+3. **Init the thread**. Call `ovni_thread_init()` when a new thread begins the
+   execution (including the main process thread after the process is
+   initialized). Multiple attempts to initialize the thread are ignored with a
+   warning.
+
+The `ovni_proc_init()` arguments are as follows:
+
+```c
+void ovni_proc_init(int app, const char *loom, int pid);
+```
+
+The `app` defines the "appid" of the program, which must be a number >0. This is
+useful when running multiple processes, some of which run the same "app", so you
+can tell which one is which. The `loom` defines the
+[loom](../concepts/part-model.md#loom) name and assigns the process to that
+loom. It must be composed of the host name, a dot and a suffix. The `pid` is the
+one returned by `getpid(2)`.
+
+The `ovni_thread_init()` function only accepts one argument, the TID as returned
+by `gettid(2)`.
+
+## Setup metadata
+
+Once the process and thread are initialized, you can begin adding metadata to
+the thread stream.
+
+1. **Require models**.
+   Call `ovni_thread_require()` with the required model
+   version before emitting events for a given model. Only required once from a
+   thread in a given trace.
+
+2. **Emit loom CPUs**. Call `ovni_add_cpu()` to register each CPU in the loom. It can
+   be done from a single thread or multiple threads; in the latter case the
+   list of CPUs is merged.
+
+3. **Set the rank**. If you use MPI, call `ovni_proc_set_rank()` to register the
+   rank and number of ranks of the current execution. Only once per process.
+
+## Start the execution
+
+The current thread must switch to the "Running" state before any event can be
+processed by the emulator. Do so by emitting an `OHx` event in the stream with
+the appropriate payload:
+
+```c
+static void thread_execute(int32_t cpu, int32_t ctid, uint64_t tag)
+{
+    struct ovni_ev ev = {0};
+    ovni_ev_set_clock(&ev, ovni_clock_now());
+    ovni_ev_set_mcv(&ev, "OHx");
+    ovni_payload_add(&ev, (uint8_t *) &cpu, sizeof(cpu));
+    ovni_payload_add(&ev, (uint8_t *) &ctid, sizeof(ctid));
+    ovni_payload_add(&ev, (uint8_t *) &tag, sizeof(tag));
+    ovni_ev_emit(&ev);
+}
+```
+
+The `cpu` is the logical index (not the physical ID) of the loom CPU at which
+this thread will begin the execution. Use -1 if it is not known. The `ctid` and
+`tag` allow you to track the exact point at which a given thread was created and
+by which thread, but they are not relevant for the first thread, so they can be
+set to -1.
+
+## Emit events
+
+After this point you can emit any other event from this thread. Use the
+`ovni_ev_*` set of functions to create and emit events. Notice that all events
+refer to the current thread that emits them.
+
+If you need to store metadata information, use the `ovni_attr_*` set of
+functions. The metadata is stored on disk by `ovni_attr_flush()` and when the
+thread is freed by `ovni_thread_free()`.
+
+Attempting to emit events or write metadata without having a thread
+initialized will cause your program to abort.
+
+## Finishing the execution
+
+To finalize the execution **every thread** must perform the following steps,
+otherwise the trace **will be rejected**.
+
+1. **End the current thread**. Emit an [`OHe` event](../emulation/events.md#OHe)
+   to indicate that the current thread ends.
+2. **Flush the buffer**. Call `ovni_flush()` to be sure all events are written
+   to disk.
+3. **Free the thread**. Call `ovni_thread_free()` to complete the stream and
+   free the memory used by the buffer.
+4. **Finish the process**. If this is the last thread, call `ovni_proc_fini()`
+   to set the process state to finished.
+
+If a thread fails to perform these steps, the complete trace will be rejected by
+the emulator as it cannot guarantee the trace to be consistent.
diff --git a/mkdocs.yml b/mkdocs.yml
index 2938067..b9411f7 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -23,9 +23,12 @@ markdown_extensions:
 nav:
   - index.md
   - 'User guide':
-    - user/concepts.md
     - user/installation.md
+    - 'Concepts':
+      - user/concepts/part-model.md
+      - user/concepts/trace-model.md
     - 'Runtime':
+      - user/runtime/index.md
       - user/runtime/tracing.md
       - user/runtime/mark.md
       - user/runtime/distributed.md