# Trace specification

!!! Important

    This document describes version 2 of the trace specification.

The ovni instrumentation library stores the information collected in a
trace following the specification of this document.

The complete trace is stored in a top-level directory named `ovni`.
Inside this directory you will find the loom directories. The name of
each loom directory is built from the `loom` parameter of
`ovni_proc_init()`, prefixed with `loom.`.

Each loom directory contains one directory per process of that loom. The
name is composed of the `proc.` prefix and the PID of the process
specified in the `pid` argument to `ovni_proc_init()`.

Each process directory contains:

- The process metadata file `metadata.json`.
- The thread streams, composed of:
    - The binary stream like `thread.123.obs`
    - The thread metadata like `thread.123.json`
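
The directory layout described above can be explored with a few lines of Python. This is only a sketch: the function name `list_streams` and the shape of the returned dictionary are illustrative choices, not part of ovni, and it assumes that thread file names always follow the `thread.<tid>.obs` pattern shown in the list.

```python
from pathlib import Path


def list_streams(trace_dir):
    """Return {loom name: {pid: [thread ids]}} for a trace directory.

    Hypothetical helper: it only relies on the naming scheme described
    in the text (loom.<name>/proc.<pid>/thread.<tid>.obs).
    """
    layout = {}
    for loom_dir in sorted(Path(trace_dir).glob("loom.*")):
        # Strip only the prefix, as the loom name itself may contain dots.
        loom = loom_dir.name[len("loom."):]
        procs = layout.setdefault(loom, {})
        for proc_dir in sorted(loom_dir.glob("proc.*")):
            pid = int(proc_dir.name[len("proc."):])
            # One binary stream per thread: thread.<tid>.obs
            procs[pid] = sorted(
                int(p.name.split(".")[1]) for p in proc_dir.glob("thread.*.obs")
            )
    return layout
```

Calling `list_streams("ovni")` on a trace returns one entry per loom, each mapping the process PIDs to the thread identifiers found in that process directory.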

## Process metadata

!!! Important

    Process metadata has version 2

The process metadata file contains important information about the trace
that is invariant during the complete execution, and generally is
required to be available prior to processing the events in the trace.

The metadata is stored in the JSON file `metadata.json` inside each
process directory and contains the following keys:

- `version`: a number specifying the version of the metadata format.
  Must have the value 2 for this version.
- `app_id`: the application ID, used to distinguish between applications
  running on the same loom.
- `rank`: the rank of the MPI process (optional).
- `nranks`: the total number of MPI processes (optional).
- `cpus`: the array of $`N_c`$ CPUs available in the loom. Only one
  process in the loom must contain this mandatory key. Each element is a
  dictionary with the keys:
    - `index`: containing the logical CPU index from 0 to $`N_c - 1`$.
    - `phyid`: the number of the CPU as given by the operating system
      (which can exceed $`N_c`$).

Here is an example of the `metadata.json` file:

```
{
    "version": 2,
    "app_id": 1,
    "rank": 0,
    "nranks": 4,
    "cpus": [
        {
            "index": 0,
            "phyid": 0
        },
        {
            "index": 1,
            "phyid": 1
        },
        {
            "index": 2,
            "phyid": 2
        },
        {
            "index": 3,
            "phyid": 3
        }
    ]
}
```
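
As an illustration of how a reader might check these keys before processing events, here is a minimal Python sketch. The helper names `check_process_metadata` and `loom_cpus` are hypothetical, and treating `app_id` as required is an assumption based on it not being marked optional above.

```python
import json


def check_process_metadata(meta):
    """Check one parsed metadata.json dictionary against the keys above."""
    if meta.get("version") != 2:
        raise ValueError("unsupported metadata version: %r" % meta.get("version"))
    if "app_id" not in meta:  # assumption: not marked as optional in the text
        raise ValueError("missing app_id")
    cpus = meta.get("cpus")
    if cpus is not None:
        # Logical indices must cover 0..N_c-1; phyid is allowed to exceed N_c.
        indices = sorted(cpu["index"] for cpu in cpus)
        if indices != list(range(len(cpus))):
            raise ValueError("cpu indices must run from 0 to N_c - 1")
    return meta


def loom_cpus(process_metas):
    """Return the `cpus` array from whichever process of the loom carries it."""
    for meta in process_metas:
        if "cpus" in meta:
            return meta["cpus"]
    raise ValueError("no process in the loom defines cpus")
```

A caller would typically load each file with `json.load()`, pass the result through `check_process_metadata()`, and then call `loom_cpus()` once per loom with all the process dictionaries of that loom.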

## Thread metadata

!!! Important

    Thread metadata has version 2

The thread metadata stores constant information per thread, like the
process metadata. The information is stored in a dictionary, where the
names of the emulation models are used as keys. In particular, the
libovni library writes information in the "ovni" key, such as the
model requirements, and other information like the version of libovni
used. Example:

```json
{
    "version": 2,
    "ovni": {
        "lib": {
            "version": "1.4.0",
            "commit": "unknown"
        },
        "require": {
            "ovni": "1.0.0"
        }
    }
}
```

The metadata is written to disk when the thread is first initialized,
and then every time the thread stream is flushed.

## Thread binary streams

!!! Important

    Thread binary stream has version 1

Streams are binary files that contain a succession of events with
monotonically increasing clock values. Each stream has a small header,
followed immediately by the variable-size events.

The header contains the 4 magic bytes "ovni" followed by a 4-byte
version number. Here is a figure of the data stored on disk:

![Stream](fig/stream.svg)
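
The stream header can be decoded with Python's `struct` module, as in the sketch below. The function name `parse_stream_header` is illustrative, and the default little endian byte order is an assumption: the trace is written with the endianness of the machine that generated it.

```python
import struct

MAGIC = b"ovni"


def parse_stream_header(data, byteorder="<"):
    """Return (version, header size in bytes) for the start of a stream.

    `byteorder` is "<" for little endian or ">" for big endian and should
    match the machine that wrote the trace (assumed little endian here).
    """
    fmt = byteorder + "4sI"  # 4 magic bytes followed by a 4-byte version
    magic, version = struct.unpack_from(fmt, data, 0)
    if magic != MAGIC:
        raise ValueError("not an ovni stream: bad magic %r" % magic)
    return version, struct.calcsize(fmt)
```

The events start right after the returned header size, so a reader can use that value as the offset of the first event.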

Similarly, events have a fixed size header followed by an optional
payload of varying size. The header has the following information:

- Event flags
- Payload size in a special format
- Model, category and value codes
- Time in nanoseconds

The event size can vary depending on the data stored in the payload. The
payload size is specified using 4 bits, with the value `0x0` for no
payload, or with value $`v`$ for $`v + 1`$ bytes of payload. This allows
us to use 16 bytes of payload with value `0xf` at the cost of
sacrificing payloads of one byte.
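
The size encoding above can be written as a pair of small Python functions. This is a sketch with hypothetical names (`payload_size`, `payload_nibble`) that only restates the rule in the text: `0x0` means no payload and any other value $`v`$ means $`v + 1`$ bytes.

```python
def payload_size(nibble):
    """Decode the 4-bit size field into a payload length in bytes."""
    if not 0 <= nibble <= 0xF:
        raise ValueError("size field must fit in 4 bits")
    return 0 if nibble == 0 else nibble + 1


def payload_nibble(size):
    """Encode a payload length in bytes into the 4-bit size field."""
    if size == 0:
        return 0
    if 2 <= size <= 16:
        return size - 1
    # A payload of exactly one byte cannot be represented in this format.
    raise ValueError("payload must be 0 or 2..16 bytes, got %d" % size)
```

Decoding every possible field value yields the sizes 0 and 2 to 16, which shows where the one-byte payload is lost.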

There are two types of events, depending on the size needed for the
payload:

- Normal events: with a payload up to 16 bytes
- Jumbo events: with a payload up to $`2^{32}`$ bytes

## Normal events

The normal events are composed of:

- 4 bits of flags
- 4 bits of payload size
- 3 bytes for the MCV
- 8 bytes for the clock
- 0 to 16 bytes of payload

Here is an example of a normal event without payload, a total of 12
bytes:

```
00 4f 48 65 01 c5 cf 1d 96 d0 12 00 |.OHe........|
```

And in the following figure you can see every field annotated:

![Normal event without payload](fig/event-normal.svg)

Another example of a normal event with 16 bytes of payload, a total of
28 bytes:

```
0f 4f 48 78 58 c1 b0 b5 95 43 11 00 00 00 00 00 |.OHxX....C......|
ff ff ff ff 00 00 00 00 00 00 00 00 |............|
```

In the following figure you can see each field annotated:

![Normal event with payload content](fig/event-normal-payload.svg)
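
The two examples above can be decoded with the following Python sketch. Two things are assumptions here: the payload size is taken from the low 4 bits of the first byte and the flags from the high 4 bits (inferred from the `00` and `0f` bytes in the examples), and the clock is read as little endian, which matches these dumps but depends on the machine that wrote the trace. The names `Event` and `parse_normal_event` are illustrative.

```python
import struct
from collections import namedtuple

Event = namedtuple("Event", "flags mcv clock payload")


def parse_normal_event(data, offset=0, byteorder="<"):
    """Decode one normal event and return (Event, offset of the next event)."""
    # 1 byte of flags and size, 3 bytes of MCV, 8 bytes of clock: 12 bytes.
    first, mcv, clock = struct.unpack_from(byteorder + "B3sQ", data, offset)
    flags = first >> 4      # assumed: high nibble holds the flags
    nibble = first & 0x0F   # assumed: low nibble holds the payload size
    size = 0 if nibble == 0 else nibble + 1
    start = offset + 12
    payload = bytes(data[start:start + size])
    if len(payload) != size:
        raise ValueError("truncated event payload")
    return Event(flags, mcv.decode("ascii"), clock, payload), start + size
```

Feeding it the 12-byte example gives the MCV `OHe` with an empty payload, and the 28-byte example gives `OHx` with 16 bytes of payload.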

## Jumbo events

The jumbo events are just like normal events but they can hold large
data. The size of the jumbo data is stored as a 32-bit integer in the
normal payload, and the jumbo data just follows the event.

The jumbo events are composed of:

- 4 bits of flags
- 4 bits of payload size (always the value `0x3`, meaning 4 bytes)
- 3 bytes for the MCV
- 8 bytes for the clock
- 4 bytes of payload with the size of the jumbo data
- 0 to $`2^{32}`$ bytes of jumbo data

Example of a jumbo event of 30 bytes in total, with 14 bytes of jumbo
data:

```
13 56 59 63 eb c1 4b 1a 96 d0 12 00 0e 00 00 00 |.VYc..K.........|
01 00 00 00 74 65 73 74 74 79 70 65 31 00 |....testtype1.|
```

In the following figure you can see each field annotated:

![Jumbo event](fig/event-jumbo.svg)
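
A jumbo event can be decoded in the same spirit. The sketch below assumes little endian data (as in the example) and assumes that the size field sits in the low 4 bits of the first byte, with the flags in the high 4 bits, which is what the `13` byte of the example suggests. The function name `parse_jumbo_event` is hypothetical.

```python
import struct


def parse_jumbo_event(data, offset=0, byteorder="<"):
    """Decode one jumbo event and return (fields dict, next offset)."""
    # 12-byte event header followed by the 4-byte payload holding the size.
    first, mcv, clock, jumbo_size = struct.unpack_from(
        byteorder + "B3sQI", data, offset
    )
    if first & 0x0F != 0x3:
        raise ValueError("jumbo events must carry a 4-byte payload (value 0x3)")
    start = offset + 16
    jumbo = bytes(data[start:start + jumbo_size])
    if len(jumbo) != jumbo_size:
        raise ValueError("truncated jumbo data")
    fields = {
        "flags": first >> 4,  # assumed: high nibble holds the flags
        "mcv": mcv.decode("ascii"),
        "clock": clock,
        "jumbo": jumbo,
    }
    return fields, start + jumbo_size
```

For the 30-byte example this yields the MCV `VYc` and 14 bytes of jumbo data ending in the NUL-terminated string `testtype1`.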

## Design considerations

The stream format has been designed to be very simple, so writing a
parser library would take no more than 2 days for a single developer.

The size of the events has been designed to be small, with 12 bytes per
event when no payload is used.

!!! Caution

    The events are stored on disk following the endianness of the
    machine where they are generated. So a stream generated on a
    little endian machine differs from one generated on a big endian
    machine. We assume the same endianness is used to write the trace
    at runtime and to read it afterwards, during emulation.

The events are designed to be easily identified when looking at the
raw stream in binary, as the MCV codes can be read as ASCII characters:

```
00000000 6f 76 6e 69 01 00 00 00 0f 4f 48 78 08 ba 2e 5c |ovni.....OHx...\|
00000010 b5 b0 00 00 00 00 00 00 ff ff ff ff 00 00 00 00 |................|
00000020 00 00 00 00 13 56 59 63 3c c2 2e 5c b5 b0 00 00 |.....VYc<..\....|
00000030 0e 00 00 00 01 00 00 00 74 65 73 74 74 79 70 65 |........testtype|
00000040 31 00 07 56 54 63 43 cc 2e 5c b5 b0 00 00 01 00 |1..VTcC..\......|
00000050 00 00 01 00 00 00 03 56 54 78 03 cd 2e 5c b5 b0 |.......VTx...\..|
00000060 00 00 01 00 00 00 03 56 54 70 2b 7d 37 5c b5 b0 |.......VTp+}7\..|
00000070 00 00 01 00 00 00 03 56 54 72 c3 4d 40 5c b5 b0 |.......VTr.M@\..|
00000080 00 00 01 00 00 00 03 56 54 65 03 36 49 5c b5 b0 |.......VTe.6I\..|
00000090 00 00 01 00 00 00 00 4f 48 65 f5 36 49 5c b5 b0 |.......OHe.6I\..|
000000a0 00 00 |..|
```

This allows a human to detect signs of corruption by visually inspecting
the streams.
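
To illustrate how little code a forward reader needs, here is a Python sketch that walks the stream dumped above. It is not the libovni parser: the name `iter_events` is hypothetical, the data is assumed to be little endian, and the layout of the first event byte (size in the low 4 bits, a jumbo flag at bit `0x10`) is inferred from the `0f`, `13`, `07`, `03` and `00` bytes in the dump.

```python
import struct

# The bytes of the hexadecimal dump shown above, 0xa2 bytes in total.
DUMP = bytes.fromhex(
    "6f 76 6e 69 01 00 00 00 0f 4f 48 78 08 ba 2e 5c "
    "b5 b0 00 00 00 00 00 00 ff ff ff ff 00 00 00 00 "
    "00 00 00 00 13 56 59 63 3c c2 2e 5c b5 b0 00 00 "
    "0e 00 00 00 01 00 00 00 74 65 73 74 74 79 70 65 "
    "31 00 07 56 54 63 43 cc 2e 5c b5 b0 00 00 01 00 "
    "00 00 01 00 00 00 03 56 54 78 03 cd 2e 5c b5 b0 "
    "00 00 01 00 00 00 03 56 54 70 2b 7d 37 5c b5 b0 "
    "00 00 01 00 00 00 03 56 54 72 c3 4d 40 5c b5 b0 "
    "00 00 01 00 00 00 03 56 54 65 03 36 49 5c b5 b0 "
    "00 00 01 00 00 00 00 4f 48 65 f5 36 49 5c b5 b0 "
    "00 00"
)


def iter_events(stream, byteorder="<"):
    """Yield (mcv, clock, payload, jumbo data) for each event, front to back."""
    magic, _version = struct.unpack_from(byteorder + "4sI", stream, 0)
    if magic != b"ovni":
        raise ValueError("bad magic")
    offset = 8  # the events start right after the 8-byte stream header
    while offset < len(stream):
        first, mcv, clock = struct.unpack_from(byteorder + "B3sQ", stream, offset)
        nibble = first & 0x0F
        size = 0 if nibble == 0 else nibble + 1
        offset += 12
        payload = stream[offset:offset + size]
        offset += size
        jumbo = b""
        if first & 0x10:  # assumed jumbo flag, inferred from the 0x13 byte
            (jumbo_size,) = struct.unpack(byteorder + "I", payload)
            jumbo = stream[offset:offset + jumbo_size]
            offset += jumbo_size
        yield mcv.decode("ascii"), clock, payload, jumbo
```

Iterating with `for mcv, clock, payload, jumbo in iter_events(DUMP)` visits the events in file order, and the MCV of each one is the same three-character code that is visible in the ASCII column of the dump.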

## Limitations

The streams are designed to be read only forward, as they only contain
the size of each event in the header.

Currently, we only support using the threads as sources of events, using
one stream per thread. However, adding support for more streams from
multiple sources is planned for the future.