Compare commits

1470 Commits

Author SHA1 Message Date
ab86243a07 Add missing which in nodes checkPhase
When enabling checks, the build log is polluted with errors.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Aleix Boné <abonerib@bsc.es>
2025-10-23 15:59:21 +02:00
14f2393d30 Update website
Add apex page and replace bscpkgs references for jungle after the merge.

See: rarias/jungle-website#1
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-22 15:48:13 +02:00
f115d611e7 Add aaguirre user
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-22 15:28:29 +02:00
4261d327c6 Include agenix module and package directly
Avoids adding an extra flake input only to fetch a single module and
package.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-14 09:37:47 +02:00
4685c36e2f Add brief documentation on maintainer roles
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-10 16:37:00 +02:00
c6c788f1e2 Add meta to packages
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-10 16:36:56 +02:00
606386d006 Add maintainers
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-10 16:36:38 +02:00
1fba0a14a8 Enable ovni for cross compilation
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-10 12:08:28 +02:00
d6621e939a Add RISC-V cross-compilation target for CI
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-10 12:08:24 +02:00
67726c1d44 Fix nanos6 cross-compilation for riscv
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-09 15:49:01 +02:00
a971ed6a54 Fix cross compilation for lmbench
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-09 15:49:01 +02:00
06581e455c Disable papi when cross compiling
Even if we override papi to get the proper configure flags for
cross-compiling, the memory fences are not defined for RISC-V:

mb.h:67:2: error: #error Need to define rmb for this architecture!

See: #184
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-09 15:48:45 +02:00
dd7f24f455 Replace __noChroot with requiredSystemFeatures
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-09 11:41:18 +02:00
64e2c39582 Add hwloc test with sys-devices feature
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-09 11:41:06 +02:00
98d17b19d3 Enable custom sys-devices system feature
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-09 11:40:44 +02:00
44cc60fcd8 Update license year range to 2025
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-07 16:07:32 +02:00
ca48ce556c Update gitlab CI after merge
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-07 16:07:30 +02:00
e8ac9dfb64 Upgrade README after bscpkgs merge
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-07 16:07:28 +02:00
188ba6df0a Remove bscpkgs input
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-07 16:07:26 +02:00
b1a37ae1fe Enable unfree packages in nixpkgs config
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-07 16:07:24 +02:00
63822bb054 Move the rest of packages to main overlay
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-07 16:07:23 +02:00
b94a1493d5 Merge flake.nix with bscpkgs
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-07 16:07:21 +02:00
826d6a28ef Move slurm to pkgs/
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-07 16:07:19 +02:00
ae6b0ae161 Move MPICH to pkgs/mpich and set as default
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-07 16:07:01 +02:00
01986c376b Merge remote-tracking branch 'bscpkgs/master' into merge-bscpkgs
2025-10-03 13:47:04 +02:00
e42058f08b Allow access to hut from fox
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-02 17:03:21 +02:00
f3bfe89f27 Fetch website from its own git repository
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-02 15:45:21 +02:00
ee6f981006 Add script to trim the repository
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-02 15:44:56 +02:00
92ee4a09d7 Rename test to tests and tests to testList
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:53:09 +02:00
34f4b6aa37 Move bsc-ci test into let
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:33 +02:00
2f2d6cbea8 Rework bsc-ci
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:31 +02:00
69b09b6dda Add riscv64 cross compilation to bsc-ci and hydra
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:29 +02:00
a737d725ed Put helper attrs of ompss2 drv to passthru
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:28 +02:00
6c1d1f3b2b Remove gcc from tampi *buildInputs
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:26 +02:00
f338ef47d5 Fix strictDeps ovni
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:25 +02:00
239e84c40c Fix strictDeps osu
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:23 +02:00
ed820e79f8 Fix strictDeps mercurium
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:22 +02:00
afeb415c98 Fix strictDeps tampi
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:20 +02:00
256b24b97b Fix strictDeps sonar
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:18 +02:00
492f73b600 Fix strictDeps nanos6
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:17 +02:00
76ddd85afe Fix strictDeps paraver
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:15 +02:00
7affb8ef4b Fix strictDeps ompss2
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:13 +02:00
4ba823e5b7 Fix strictDeps intel 2023
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:11 +02:00
51eecde59e Fix strictDeps bench6
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:45:08 +02:00
9eb5c486ba Fix strictDeps bigotes
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 15:43:58 +02:00
5df49dcfab Add gitea CI configuration
Builds the .#bsc-ci.all target on each PR. Causes all packages to be
built in hut, populating the nix cache.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-10-01 14:59:25 +02:00
b040bebd1d Add acinca user
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-10-01 12:27:43 +02:00
f69629d2da Restart slurmd on failure
A failure to reach the control node can cause slurmd to fail, and the
unit remains in the failed state until it is manually restarted. Instead,
try to restart the service every 30 seconds, forever:

    owl1% systemctl show slurmd | grep -E 'Restart=|RestartUSec='
    Restart=on-failure
    RestartUSec=30s
    owl1% pgrep slurmd
    5903
    owl1% sudo kill -SEGV 5903
    owl1% pgrep slurmd
    6137
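
A minimal sketch of how such a restart policy could be declared in a NixOS
module, assuming the unit is named slurmd as in the output above and is
defined elsewhere; the actual change in the repository may be structured
differently:

```nix
{ ... }:
{
  # Assumption: the slurmd unit already exists (e.g. from the slurm module);
  # these settings are merged into its [Service] section.
  systemd.services.slurmd.serviceConfig = {
    Restart = "on-failure"; # restart whenever the service exits uncleanly
    RestartSec = 30;        # wait 30 seconds between attempts
  };
}
```

With a 30 second delay the default systemd start rate limit (5 starts
within 10 seconds) is never reached, so the retries continue indefinitely.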

Fixes: #177
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-30 17:20:39 +02:00
0668f0db74 Lower connect timeout when using hut substituter
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-09-29 18:44:48 +02:00
5fcd57a061 Use hut substituter in all nodes
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-09-29 18:44:38 +02:00
ad1544759f Remove machine access for user csiringo
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-09-29 18:23:24 +02:00
2ffdd53d86 Add hydraJobs with tests and packages
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-09-26 16:12:46 +02:00
e1c950a530 Mount apex /home via NFS in raccoon
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-26 12:28:53 +02:00
f9632c37f8 Remove extra SSH jump configuration
We now have direct visibility among nodes so we don't need any extra
SSH configuration to reach them.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-26 12:28:51 +02:00
1f0cb4ae76 Add raccoon peer to wireguard
It routes traffic from fox, apex and the compute nodes so that we can
reach the git servers and tent.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-26 12:28:48 +02:00
d49d078bed Add raccoon host key
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-26 12:28:46 +02:00
e98fdb89ab Restrict fox peer to a single IP
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-26 12:28:43 +02:00
6afe05b5fd Use lowercase peer hostnames
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-26 12:28:25 +02:00
7d5aebf882 Share a public folder for documents
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:59:40 +02:00
94cbfd38a6 Fix AMDuProfPcm so it finds libnuma.so
We change the search procedure so it detects NixOS from /etc/os-release
and uses "libnuma.so" when calling dlopen, instead of hardcoding a full
path to /usr. The full path of libnuma is stored in the runpath, so
dlopen can find it.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
Tested-by: Vincent Arcila <vincent.arcila@bsc.es>
2025-09-19 10:54:36 +02:00
4da7780472 Add amd_hsmp module in fox for AMD uProf
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:54:24 +02:00
a6dfc267fd Fix hidden dependencies for AMDuProfSys
It tries to dlopen libcrypt.so.1 and libstdc++.so.6, so we make sure
they are available by adding them to the runpath.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:54:19 +02:00
d6126501ba Disable NMI watchdog in fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:54:17 +02:00
ac0deb47b6 Fix amd-uprof dependencies with patchelf
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:54:15 +02:00
f7d676de77 Fix hrtimer new interface
The initialization previously done with hrtimer_init() is now done via
hrtimer_setup(), which takes the callback function as an argument.

See: https://lwn.net/Articles/996598/
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:54:09 +02:00
cf1db201b2 Use CFLAGS_MODULE instead of EXTRA_CFLAGS
Fixes the build in Linux 6.15.6, as it was not able to find the include
files.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:54:07 +02:00
e6e4846529 Add AMD uProf module and enable it in fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:54:05 +02:00
084d556c56 Add AMD uProf package and driver
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-09-19 10:53:49 +02:00
c7b5ec13b8 Provide nixpkgs.lib in bscpkgs outputs
Currently, we can use bscpkgs similarly to nixpkgs either through
the flake outputs or with import bscpkgs:

```nix
# currently supported:
bscpkgs.legacyPackages.x86_64-linux.hello
let pkgs = import bscpkgs { system = "x86_64-linux"; }; in pkgs.hello
```
The missing piece is nixpkgs.lib (not pkgs.lib, the system agnostic
one). The workaround is to do bscpkgs.inputs.nixpkgs.lib instead. We can
simplify this by forwarding the lib to our outputs.

This enables us to use bscpkgs as a drop-in replacement for nixpkgs, by
changing the nixpkgs input of our flake to point to bscpkgs
(inputs.nixpkgs.url = "<*BSC*pkgs url>").
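
One way to forward the library, sketched as a standalone flake under
assumptions: the nixpkgs URL and the empty overlay are placeholders, not
the actual bscpkgs definitions.

```nix
{
  # Placeholder input; the real flake pins its own nixpkgs revision.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.05";

  outputs = { self, nixpkgs }: {
    # Forward the system-agnostic library so that consumers can write
    # `bscpkgs.lib` instead of `bscpkgs.inputs.nixpkgs.lib`.
    lib = nixpkgs.lib;

    # Package set with a (here empty) overlay applied on top of nixpkgs.
    legacyPackages.x86_64-linux = import nixpkgs {
      system = "x86_64-linux";
      overlays = [ (final: prev: { }) ];
    };
  };
}
```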


Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Aleix Boné <abonerib@bsc.es>
2025-09-12 14:28:42 +02:00
00dfe801f4 Fix GPI-2 and enable TAGASPI
The rdma-core driver.h include is no longer installed:

 56dd87acd2

So ibv_read_sysfs_file() is not declared. As the symbol is still
distributed, we simply add the missing prototype manually.

Similarly, the gaspi_get_system_mem() function is not available from the
gaspi public headers, so we define it in the max_mem.c test.

Fixes: rarias/bscpkgs#7
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-09-12 14:21:00 +02:00
ff0fc18d0a Mount home via NFS from apex in fox
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 15:34:02 +02:00
19c7e32678 Allow access to NFS via wireguard subnet
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 15:33:47 +02:00
017c19e7d0 Use 10.106.0.0/24 subnet to avoid collisions
The byte 106 is the ASCII code for 'j' (jungle):

	% printf j | od -t d
	0000000         106
	0000001
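
For illustration, a WireGuard interface inside this subnet might be
declared as follows. This is only a sketch: the interface name, host
addresses, port, key path and peer key are all hypothetical placeholders.

```nix
{ ... }:
{
  networking.wireguard.interfaces.wg0 = {
    # Address inside 10.106.0.0/24 (the host part is an assumption).
    ips = [ "10.106.0.1/24" ];
    listenPort = 51820;                      # hypothetical port
    privateKeyFile = "/run/secrets/wg0.key"; # hypothetical path
    peers = [
      {
        publicKey = "<peer public key>";  # placeholder, not a real key
        allowedIPs = [ "10.106.0.2/32" ]; # hypothetical peer address
      }
    ];
  };
}
```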

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:03:13 +02:00
a36eff8749 Revert "Remove pam_slurm_adopt from fox"
This reverts commit 1eac0fcad8211195499bc566e6c70312b31af700.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:03:06 +02:00
df17b11458 Enable fail2ban in fox
Protect fox against ssh bruteforce attacks:

fox% sudo lastb | head
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:25 - 11:25  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:24 - 11:24  (00:00)
root     ssh:notty    200.124.28.102   Mon Sep  1 11:24 - 11:24  (00:00)
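
A minimal sketch of enabling fail2ban in a NixOS module; the retry
threshold and ban duration are hypothetical values, not taken from this
commit:

```nix
{ ... }:
{
  services.fail2ban = {
    enable = true;
    maxretry = 5;   # hypothetical: failed attempts allowed before a ban
    bantime = "1h"; # hypothetical: how long an offending address is banned
  };
}
```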

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:03:02 +02:00
0dc7b7eb3d Accept connections from apex to fox slurmd
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:03:00 +02:00
dff6eaf587 Accept fox connection to slurm controller
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:59 +02:00
4b6b67b587 Add fox machine to SLURM
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:57 +02:00
20e7d244d1 Rekey secrets with trusted fox key
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:55 +02:00
c5d3b8e7f0 Trust fox for compute node secrets
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:52 +02:00
6bbfb0d124 Make apex host specific to each machine
Allows direct contact via the VPN when accessing from fox, while the
rest of the machines reach it over the Internet.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:49 +02:00
46d03d5ca7 Add local host fox in apex
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:46 +02:00
e366e6ce87 Enable wireguard in apex
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:43 +02:00
e415f70bbb Add wireguard server in fox
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-09-03 12:02:38 +02:00
200c727bbf Use writeShellScript for suspend.sh and resume.sh
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:28 +02:00
7413021440 Add firewall rules to slurm server
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:26 +02:00
20b4805335 Remove hut from slurm
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:24 +02:00
f7dff9deab Only configure apex as slurm server
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:22 +02:00
f569933732 Split slurm configuration for client and server
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:20 +02:00
ee895d2e4f Move slurm control server to apex
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-08-29 12:35:16 +02:00
5ee8623af2 Fix typo in csiringo ssh key
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-08-27 17:44:20 +02:00
a0e4b209b0 Enable nix-ld in weasel
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-08-27 16:19:34 +02:00
ce25867421 Add csiringo user with access to apex and weasel
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-08-27 16:02:26 +02:00
f89bba35a6 Access gitlab via raccoon in fox
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-08-27 15:27:38 +02:00
2c8d7ed855 Build nOS-V with PAPI support
To support the new instrumentation for HWC it would be useful to already
build nOS-V with PAPI support enabled. The enablePapi switch allows it
to be disabled with `nosv.override { enablePapi = false; }`.
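
As a sketch, the override could be applied through an overlay so that
every dependent package picks up the PAPI-less build; the overlay itself
is hypothetical and assumes it is layered on top of the package set that
provides nosv:

```nix
# Hypothetical overlay: rebuild nosv without PAPI support.
final: prev: {
  nosv = prev.nosv.override { enablePapi = false; };
}
```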

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-08-01 13:12:48 +02:00
d591721a61 Move StartLimit* options to unit section
The StartLimitBurst and StartLimitIntervalSec options belong to the
[Unit] section, otherwise they are ignored in [Service]:

> Unknown key 'StartLimitIntervalSec' in section [Service], ignoring.

When using [Unit], the limits are properly set:

  apex% systemctl show power-policy.service | grep StartLimit
  StartLimitIntervalUSec=10min
  StartLimitBurst=10
  StartLimitAction=none
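
In NixOS terms the distinction maps to unitConfig versus serviceConfig.
A minimal sketch, assuming the power-policy unit is defined elsewhere and
using the values from the output above; the Restart setting is a
hypothetical addition shown only for contrast:

```nix
{ ... }:
{
  systemd.services.power-policy = {
    # Rate limiting belongs to the [Unit] section.
    unitConfig = {
      StartLimitIntervalSec = "10min";
      StartLimitBurst = 10;
    };
    # Hypothetical [Service] setting, to contrast the two sections.
    serviceConfig.Restart = "on-failure";
  };
}
```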

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 14:32:46 +02:00
343b4f155e Set power policy to always turn on
In all machines, as soon as we recover the power, turn the machine back
on. We cannot rely on the previous state as we will shut them down
before the power is cut to prevent damage on the power supply
monitoring circuit.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:38 +02:00
39a211a846 Add NixOS module to control power policy
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:36 +02:00
142985c505 Move August shutdown to 3rd at 22h
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:33 +02:00
3f3dc2d037 Disable automatic August shutdown for Fox
The UPC has different dates for the yearly power cut, and Fox can
recover properly from a power loss, so we don't need to have it turned
off before the power cut. Simply disabling the timer is enough.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:10 +02:00
3269d763aa Add cudainfo program to test CUDA
The cudainfo program checks that we can initialize the CUDA RT library
and communicate with the driver. It can be used as standalone program or
built with cudainfo.gpuCheck so it is executed inside the build sandbox
to see if it also works fine. It uses the autoAddDriverRunpath hook to
inject in the runpath the location of the library directory for CUDA
libraries.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-23 11:52:09 +02:00
f2d8ee8552 Add missing symlink in cuda sandbox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-23 11:51:47 +02:00
8d984a0672 Enable cuda systemFeature in raccoon and fox
This allows running derivations which depend on cuda runtime without
breaking the sandbox. We only need to add `requiredSystemFeatures = [ "cuda" ];`
to the derivation.
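
A minimal sketch of such a derivation, with a hypothetical name and a
trivial build script standing in for a real CUDA check:

```nix
{ pkgs ? import <nixpkgs> { } }:

# Only builders that advertise the "cuda" system feature will accept
# this derivation.
pkgs.runCommand "cuda-feature-demo" {
  requiredSystemFeatures = [ "cuda" ];
} ''
  # Placeholder for a program that actually talks to the GPU.
  echo ok > $out
''
```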

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 17:07:13 +02:00
f3733418b2 Move shared nvidia settings to a separate module
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 17:06:45 +02:00
1666c14a35 Remove dependency on wx from paraver kernel
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 13:00:23 +02:00
b29f03ba6e Fix boost >=1.87 in wxparaver
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 12:59:53 +02:00
ae2ef1d2df Add patch to paraver to prevent focus stealing
See: https://github.com/bsc-performance-tools/wxparaver/issues/18
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 12:59:39 +02:00
9a48ae45bb Update paraver to 4.12.0
Adds a new patch to fix libxml2: the m4 AM_PATH_XML2 macro has been
deprecated and is no longer included in the latest nixpkgs unstable.
Upstream recommends using `PKG_CHECK_MODULES` instead.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 12:57:33 +02:00
ce8b05b142 Replace xeon07 by hut in ssh config
The xeon07 machine has been renamed to hut.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-21 18:10:08 +02:00
974bb56dc3 Fix lmbench build issues with GCC 14
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-21 18:01:36 +02:00
88d4d8e317 Drop tagaspi from bench6
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 18:01:20 +02:00
885e04e446 Remove unused inputs from intel compiler 2023
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 18:00:39 +02:00
4a5787e0c6 Enable automatic Nix GC in raccoon
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:58:26 +02:00
6c11093033 Select proprietary NVIDIA driver in raccoon
The NVIDIA GTX 960 from 2016 has the Maxwell architecture, and NixOS
suggests using the proprietary driver for GPUs older than Turing:

> It is suggested to use the open source kernel modules on Turing or
> later GPUs (RTX series, GTX 16xx), and the closed source modules
> otherwise.
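
A sketch of the corresponding NixOS settings, assuming the standard
nvidia module options are used; the actual module in the repository may
set additional options:

```nix
{ ... }:
{
  services.xserver.videoDrivers = [ "nvidia" ];
  # Closed source kernel modules, as suggested for pre-Turing GPUs.
  hardware.nvidia.open = false;
}
```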

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:58:21 +02:00
26f52aa27d Use gcc 13 for intel compiler 2023
Intel compiler for C++ (icpc) is not able to parse the location of C++
headers from the output of gcc 14, but works fine for gcc 13.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
52fe43bfe1 Fix PATH in intel compiler wrapper
We need to add the gcc in the PATH, but adding it directly to $PATH
doesn't work, as it will be restored to $path_backup before icc runs. So
for now we simply inject it to path_backup, but ideally we should find a
more robust solution.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
f0637b4569 Drop GPI-2 and TAGASPI
GPI-2, which is needed for TAGASPI, fails to build.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
6ddfea0a3a Use boost 1.86 for paraver
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-21 17:23:30 +02:00
e7adef1ffa Fix intel-compiler by ignoring broken symlinks
In the future, we may want to check whether those symlinks are needed.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
e82d3c3b9f Upgrade OmpSs-2 LLVM to 2025.06
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
4442b6a706 Update TAMPI to 4.1
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
2d0b014dc7 Update Nanos6 to 4.3
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
867ba3ec5a Update nOS-V to 3.2.0
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
2cacc2b265 Update ovni to 1.12.0
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
e4abd8d8f6 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'path:/nix/store/2csx2kkb2hxyxhhmg2xs9jfyypikwwk6-source?lastModified=1736867362&narHash=sha256-i/UJ5I7HoqmFMwZEH6vAvBxOrjjOJNU739lnZnhUln8%3D&rev=9c6b49aeac36e2ed73a8c472f1546f6d9cf1addc' (2025-01-14)
  → 'path:/nix/store/zk8v61cpk1wprp9ld5ayc1g5fq4pdkwv-source?lastModified=1752436162&narHash=sha256-Kt1UIPi7kZqkSc5HVj6UY5YLHHEzPBkgpNUByuyxtlw%3D&rev=dfcd5b901dbab46c9c6e80b265648481aafb01f8' (2025-07-13)

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:23:30 +02:00
750504744f Enable open source NVidia driver in fox
It is recommended for newer versions.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:57:38 +02:00
c26ec1b6f1 Remove option allowUnfree from fox and raccoon
It is already set to true for all machines.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:57:21 +02:00
2ef32f773c Ban another scanner trying to connect via SSH
It is constantly spamming our logs:

apex# journalctl | grep 'Connection closed by 84.88.52.176' | wc -l
2255

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:51:49 +02:00
fc9fcd602a Update weasel IPMI hostname for monitoring
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:51:21 +02:00
0e37ab5fe1 Remove merged MPICH patch
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-07-16 13:07:12 +02:00
a1b387e454 Remove package ix as it is gone
Fails with: "error: ix has been removed from Nixpkgs, as the ix.io
pastebin has been offline since Dec. 2023".

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-07-16 13:07:06 +02:00
380abe9957 flake.lock: Update
Flake lock file updates:

• Updated input 'agenix':
    'github:ryantm/agenix/f6291c5935fdc4e0bef208cfc0dcab7e3f7a1c41?narHash=sha256-b%2Buqzj%2BWa6xgMS9aNbX4I%2BsXeb5biPDi39VgvSFqFvU%3D' (2024-08-10)
  → 'github:ryantm/agenix/531beac616433bac6f9e2a19feb8e99a22a66baf?narHash=sha256-9P1FziAwl5%2B3edkfFcr5HeGtQUtrSdk/MksX39GieoA%3D' (2025-06-17)
• Updated input 'agenix/darwin':
    'github:lnl7/nix-darwin/4b9b83d5a92e8c1fbfd8eb27eda375908c11ec4d?narHash=sha256-gzGLZSiOhf155FW7262kdHo2YDeugp3VuIFb4/GGng0%3D' (2023-11-24)
  → 'github:lnl7/nix-darwin/43975d782b418ebf4969e9ccba82466728c2851b?narHash=sha256-dyN%2BteG9G82G%2Bm%2BPX/aSAagkC%2BvUv0SgUw3XkPhQodQ%3D' (2025-04-12)
• Updated input 'agenix/home-manager':
    'github:nix-community/home-manager/3bfaacf46133c037bb356193bd2f1765d9dc82c1?narHash=sha256-7ulcXOk63TIT2lVDSExj7XzFx09LpdSAPtvgtM7yQPE%3D' (2023-12-20)
  → 'github:nix-community/home-manager/abfad3d2958c9e6300a883bd443512c55dfeb1be?narHash=sha256-YZCh2o9Ua1n9uCvrvi5pRxtuVNml8X2a03qIFfRKpFs%3D' (2025-04-24)
• Updated input 'bscpkgs':
    'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=6782fc6c5b5a29e84a7f2c2d1064f4bcb1288c0f' (2024-11-29)
  → 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=9d1944c658929b6f98b3f3803fead4d1b91c4405' (2025-06-11)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/9c6b49aeac36e2ed73a8c472f1546f6d9cf1addc?narHash=sha256-i/UJ5I7HoqmFMwZEH6vAvBxOrjjOJNU739lnZnhUln8%3D' (2025-01-14)
  → 'github:NixOS/nixpkgs/dfcd5b901dbab46c9c6e80b265648481aafb01f8?narHash=sha256-Kt1UIPi7kZqkSc5HVj6UY5YLHHEzPBkgpNUByuyxtlw%3D' (2025-07-13)

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-07-16 13:07:01 +02:00
37c12783bb Upgrade nixpkgs to nixos 25.05
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-07-16 13:06:40 +02:00
7379e84e79 Silently ban OpenVAS BSC scanner from apex
It is spamming our logs with refused connection lines:

apex% sudo journalctl -b0 | grep 'refused connection.*SRC=192.168.8.16' | wc -l
13945

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 17:40:41 +02:00
b802f88df9 Rotate anavarro password and SSH key
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 17:24:41 +02:00
bd94c4ad00 Add weasel machine configuration
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 17:24:38 +02:00
570c6e175d Remove extra flush commands on firewall stop
They are not needed as they are already flushed when the firewall
starts or stops.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:45 +02:00
96661dd0d4 Prevent accidental use of nftables
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:42 +02:00
28db7799ea Add proxy configuration for internal hosts
Access internal hosts via apex proxy. From the compute nodes we first
open an SSH connection to apex, and then tunnel it through the HTTP
proxy with netcat.

This way we allow reaching internal GitLab repositories without
requiring the user to have credentials in the remote host, while we can
use multiple remotes to provide redundancy.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:36 +02:00
508059c99e Remove unused blackbox configuration modules
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:30 +02:00
b9f9cc7d7a Use IPv4 in blackbox probes
Otherwise they simply fail as IPv6 doesn't work.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:26 +02:00
eae0c7cb59 Make NFS mount async to improve latency
Don't wait to flush writes, as we don't care about consistency on a
crash:

> This option allows the NFS server to violate the NFS protocol and
> reply to requests before any changes made by that request have been
> committed to stable storage (e.g. disc drive).
>
> Using this option usually improves performance, but at the cost that
> an unclean server restart (i.e. a crash) can cause data to be lost or
> corrupted.
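
For illustration, an export using the async option could look like this
in a NixOS module. The exported path, client subnet and remaining options
are hypothetical; only async reflects this commit:

```nix
{ ... }:
{
  services.nfs.server = {
    enable = true;
    # Hypothetical path and subnet; "async" lets the server reply before
    # the data reaches stable storage.
    exports = ''
      /home 10.0.0.0/24(rw,async,no_subtree_check)
    '';
  };
}
```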

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:20 +02:00
2280635cd6 Disable root_squash from NFS
Allows root to read files in the NFS export, so we can directly run
`nixos-rebuild switch` from /home.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:16 +02:00
16ada09600 Remove SSH proxy to access BSC clusters
We now have direct connection to them.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:13 +02:00
0d291d715c Add users to apex machine
They need to be able to log in to apex to access any other machine from
the SSF rack.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:09 +02:00
66001f76f7 Remove proxy from hut HTTP probes
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:04 +02:00
1e3b85067d Remove proxy configuration from environment
All machines now have a direct connection to the outside world.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:00 +02:00
36ee1f3adc Add storcli utility to apex
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:17:57 +02:00
25e9c071b0 Add new configuration for apex
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:17:43 +02:00
80cee2dbd0 Add pmartin1 user with access to fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-03 11:16:43 +02:00
ee92934c74 Add access to fox for rpenacob user
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 16:58:53 +02:00
db0f3fed91 Revert "Only allow Vincent to access fox for now"
This reverts commit e9e3704b677baed1649583f25e4e1bc050a9534e.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 16:58:49 +02:00
adeaa0484d Add all terminfo files in environment
Fixes problems with the kitty terminal when opening vim or kakoune.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-02 16:02:45 +02:00
815810830e Monitor Fox BMC with ICMP probes too
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:51:22 +02:00
7a52e1907c Restrict DAC VPN to fox-ipmi machine only
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:51:19 +02:00
22a2e1b9e8 Monitor fox via VPN
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:51:16 +02:00
f29461ae32 Add OpenVPN service to connect to fox BMC
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:51:13 +02:00
208197f099 Add ac.upc.edu as name search server
Allows referring to fox.ac.upc.edu directly as fox.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:51:09 +02:00
479ca1b671 Disable kptr_restrict in fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:08:42 +02:00
40529fbdcb Disable NUMA balancing in fox
See: https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html#numa-balancing
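
As a sketch, assuming the change goes through the NixOS sysctl option
rather than a boot parameter:

```nix
{ ... }:
{
  # 0 disables automatic NUMA balancing (page migration between nodes),
  # which removes one source of run-to-run variability in benchmarks.
  boot.kernel.sysctl."kernel.numa_balancing" = 0;
}
```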

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:08:02 +02:00
9b0d3fb21e Load amd_uncore module in fox
Needed for L3 events in perf.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:07:58 +02:00
d8444131d8 Enable SSH X11 forwarding
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:07:54 +02:00
af540456a6 Disable registration in Gitea
Get rid of all the spam accounts they are trying to register.
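
A sketch of the likely setting, assuming the NixOS gitea module is used
to write app.ini:

```nix
{ ... }:
{
  # Maps to DISABLE_REGISTRATION in the [service] section of app.ini.
  # Existing accounts keep working; new ones must be created by an admin.
  services.gitea.settings.service.DISABLE_REGISTRATION = true;
}
```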

Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:18 +02:00
42d6734da8 Enable msmtp configuration in tent
Allows gitea to send notifications via email.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:15 +02:00
071a8084a0 Add GitLab runner with debian docker for PM
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:13 +02:00
24a0c58592 Monitor nix-daemon in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:11 +02:00
810a6dfcec Move nix-daemon exporter to modules
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:09 +02:00
47ad89dee1 Add p service for pastes
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:07 +02:00
8af1b259f5 Enable public-inbox service in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:06 +02:00
560003d4fd Enable gitea in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:04 +02:00
68ff45075c Add bsc.es to resolve domain names
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:02 +02:00
fc68d16197 Monitor AXLE machine too
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:00 +02:00
f6ec1293f4 Use IPv4 for blackbox exporter
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:59 +02:00
4feeff978c Add public html files to tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:57 +02:00
7b19292912 Add docker GitLab runner for BSC GitLab
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:55 +02:00
0627db0eb9 Add GitLab shell runner in tent for PM
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:54 +02:00
ae2f6dde41 Enable jungle robot emails for Grafana in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:52 +02:00
3bf70656dc Add tent key for nix-serve
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:50 +02:00
1cf989d727 Remove jungle nix cache from tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:48 +02:00
19f734e622 Enable nix cache
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:47 +02:00
d6e3d9626c Serve Grafana from subpath
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:45 +02:00
9c32e42dcc Add nginx server in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:43 +02:00
61e6d3232b Add monitoring in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:00 +02:00
a87b99d0a4 Update bench6 package to bf29a531
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-16 15:34:35 +02:00
43d32ac16d Use nixpkgs from flake.lock and support attrs when importing bscpkgs
This makes `nix-build` and friends use the current flake lock instead of
the outdated pinned version we had in `./nixpkgs.nix`

With this, `nix-build -A ovni` and `nix build .#ovni` should produce the
same result.

This will fail if the flake nixpkgs input does not come from NixOS/nixpkgs.
We could use edolstra/flake-compat instead, but it's overkill imho.
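
The lookup can be sketched as follows. This is not necessarily the code
in default.nix: it assumes the lock node is literally named "nixpkgs"
and is of type "github", and it leaves out the bscpkgs overlay:

```nix
let
  # Parse the lock file that flakes already maintain.
  lock = builtins.fromJSON (builtins.readFile ./flake.lock);
  # Assumption: a "github" input, so owner, repo and rev are present.
  locked = lock.nodes.nixpkgs.locked;
  nixpkgs = fetchTarball {
    url = "https://github.com/${locked.owner}/${locked.repo}/archive/${locked.rev}.tar.gz";
    # The NAR hash recorded in the lock pins the tarball contents.
    sha256 = locked.narHash;
  };
in
# Same function as nixpkgs itself: accepts system, overlays, config...
import nixpkgs
```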

Additionally, I made default.nix behave like nixpkgs, so that we can
import bscpkgs à la nixpkgs (Apply overlays and other options that nixpkgs
accepts):

```nix
let pkgs = import bscpkgs { inherit system; }; in <...>
```

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-06-16 12:29:55 +02:00
d0fd8cde46 Disable nix garbage collector in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-06-11 16:05:05 +02:00
5223ea53f6 Rekey secrets with tent keys
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:20 +02:00
253426ce00 Add tent host key and admin keys
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:16 +02:00
df67b6cd26 Create directories in /vault/home for tent users
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:12 +02:00
766da21097 Add software RAID in tent using 3 disks
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:10 +02:00
18461c0d59 Add access to tent to all hut users too
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:06 +02:00
028b151c78 Add hut SSH configuration from outside SSF LAN
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:04 +02:00
7176b066bb Don't use proxy in base preset
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:00 +02:00
c3c3614f63 Add tent machine from xeon04
We moved the tent machine to the server room in the BSC building, and it
is now directly connected to the raccoon via NAT.

Fixes: #106
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:03:54 +02:00
e13288fc29 Create specific SSF rack configuration
Allow xeon machines to optionally inherit SSF configuration such as the
NFS mount point and the network configuration.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:03:49 +02:00
9d1944c658 Upgrade and fix lmbench package
Now it needs libtirpc to provide rpc/rpc.h, as it seems it is gone from
libc. We also fix the install target so it installs the additional
benchmarks.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
Tested-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:01:40 +02:00
e9e3704b67 Only allow Vincent to access fox for now
Needed to run benchmarks without interference.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 12:08:57 +02:00
7d3c7342ae Use performance governor in fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 12:08:55 +02:00
8f80ed2cce Add hut as nix cache in fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 12:08:51 +02:00
d00f996f59 Use extra- for substituters and trusted-public-keys
From the nix manual:

> A configuration setting usually overrides any previous value. However,
> for settings that take a list of items, you can prefix the name of the
> setting by extra- to append to the previous value.
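
For example, in a NixOS module (the cache URL and key below are
placeholders, not the real jungle values):

```nix
{ ... }:
{
  nix.settings = {
    # Appended to the default substituters instead of replacing them.
    extra-substituters = [ "https://cache.example.org" ];
    # Placeholder key name and value, for illustration only.
    extra-trusted-public-keys = [ "cache.example.org-1:<base64 public key>" ];
  };
}
```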

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-06-11 11:27:37 +02:00
e40fd24f26 Use DHCP for Ethernet in fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 10:24:53 +02:00
83efd6c876 Use UPC time servers as others are blocked
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 10:24:47 +02:00
f0c4206ab8 Create tracing group and add arocanon in raccoon
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 11:09:41 +02:00
8b43a6ffb6 Extend perf support in raccoon
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 11:09:30 +02:00
2bca10b0e4 Enable nixdebuginfod in raccoon
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:50:01 +02:00
eec3e27d66 Make raccoon use performance governor
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:35 +02:00
e51ef52721 Enable binfmt emulation in raccoon
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:33 +02:00
9dc67d402f Disable nix garbage collector in raccoon
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:31 +02:00
62ec4e014a Add dbautist user to raccoon machine
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:28 +02:00
4d03842f7c Add node exporter monitoring in raccoon
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:26 +02:00
8fedc5518e Allow X11 forwarding via SSH
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:23 +02:00
43dc336638 Enable linger for user rarias
Allows services to run without a login session.
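
A minimal sketch, assuming the per-user NixOS option is used rather
than calling loginctl by hand:

```nix
{ ... }:
{
  # Equivalent to "loginctl enable-linger rarias": the user's systemd
  # instance starts at boot and survives logout.
  users.users.rarias.linger = true;
}
```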

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:19 +02:00
2b08fcd21a Only proxy SSH git remotes via hut in xeon
Other machines like raccoon have direct access.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:44:31 +02:00
557618d43f Add machine map file
Documents the location, board and serial numbers so we can track the
machines if they move around. Some information is unknown.

Using the Nix language to encode the machines location and properties
allows us to later use that information in the configuration of the
machines themselves.
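
For illustration only, such a map could be a plain attribute set. The
file name, machine names and every value below are hypothetical:

```nix
# machines.nix (hypothetical content)
{
  example1 = {
    location = "Room X, rack 1";
    board = "EXAMPLE-BOARD";
    serial = "SN-0000";
  };
  example2 = {
    location = "Room Y, rack 2";
    board = "EXAMPLE-BOARD";
    serial = null; # unknown
  };
}
```

A machine configuration could then read a field with
`(import ./machines.nix).example1.location`.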

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 14:55:58 +02:00
e8ac6cf0f3 Remove fox monitoring via IPMI
We will need to set up a VPN to be able to access fox in its new
location, so for now we simply remove the IPMI monitoring.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:53 +02:00
f8fc391cae Monitor fox, gateway and UPC anella via ICMP
Fox should reply once the machine is connected to the UPC network.
Monitoring also the gateway and UPC anella allows us to estimate if the
whole network is down or just fox.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:51 +02:00
6c1afa3fd8 Update configuration for UPC network
The fox machine will be placed in the UPC network, so we update the
configuration with the new IP and gateway. We won't be able to reach hut
directly so we also remove the host entry and proxy.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:48 +02:00
008584b465 Disable home via NFS in fox
It won't be accessible anymore as we won't be in the same LAN.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:46 +02:00
a22c862192 Rekey all secrets
Fox is no longer able to use munge or ceph, so we remove the key and
rekey them.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:44 +02:00
cd0c070439 Rotate fox SSH host key
Prevent decrypting old secrets by reading the git history.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:42 +02:00
201ff64b25 Distrust fox SSH key
We will no longer share secrets with fox until we can trust it again.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:38 +02:00
9bee145e25 Remove Ceph module from fox
It will no longer be accessible from the UPC.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:36 +02:00
4528b7c2a6 Remove fox from SLURM
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:20 +02:00
1eac0fcad8 Remove pam_slurm_adopt from fox
We will no longer be able to use SLURM from jungle.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:02 +02:00
8e5b2dc5cc Fix C runtime objects path in OmpSs-2 LLVM
Some gcc versions append an extension to the patch version number, but
this extension is not part of the installation path. This patch removes
the extension from the patch version.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-06-02 08:58:19 +02:00
f89cd4d7e2 Remove dangling libomp.so symlink
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-06-02 08:57:23 +02:00
dd15f9c943 Add UPC temperature sensor monitoring
These sensors are part of their air quality measurements, which just
happen to be very close to our server room.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-05-29 13:01:37 +02:00
4048b3327a Add meteocat exporter
Allows us to track ambient temperature changes and estimate the
temperature delta between the server room and exterior temperature.
We should be able to predict when we would need to stop the machines due
to excessive temperature as summer approaches.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-05-29 13:01:29 +02:00
f4229e34f6 Add custom nix-daemon exporter
Allows us to see which derivations are being built in realtime. It is a
bit of a hack, but it seems to work. We simply look at the environment
of the child processes of nix-daemon (usually bash) and then look for
the $name variable which should hold the current derivation being
built. Needs root to be able to read the environ file of the different
nix-daemon processes as they are owned by the nixbld* users.

See: https://discourse.nixos.org/t/query-ongoing-builds/23486
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-05-29 12:57:07 +02:00
5208a3483b Set keep-outputs to true in all machines
From the documentation of keep-outputs, setting it to true would prevent
the GC from removing build time dependencies:

If true, the garbage collector will keep the outputs of non-garbage
derivations. If false (default), outputs will be deleted unless they are
GC roots themselves (or reachable from other roots).

In general, outputs must be registered as roots separately. However,
even if the output of a derivation is registered as a root, the
collector will still delete store paths that are used only at build time
(e.g., the C compiler, or source tarballs downloaded from the network).
To prevent it from doing so, set this option to true.
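
In a NixOS module this would look roughly like:

```nix
{ ... }:
{
  # Written to nix.conf as "keep-outputs = true".
  nix.settings.keep-outputs = true;
}
```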

See: https://nix.dev/manual/nix/2.24/command-ref/conf-file.html#conf-keep-outputs
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-04-22 17:27:37 +02:00
92eacfad20 Add raccoon node exporter monitoring
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-22 14:50:08 +02:00
80309d107b Increase data retention to 5 years
Now that we have more space, we can extend the retention time to 5 years
to hold the monitoring metrics. For a year we have:

	# du -sh /var/lib/prometheus2
	13G     /var/lib/prometheus2

So we can expect it to increase to about 65 GiB. In the future we may
want to reduce the acquisition frequency of some metrics.
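
The estimate is a linear extrapolation (13 GiB per year times 5 years).
A sketch of the setting, assuming the NixOS prometheus module option is
used:

```nix
{ ... }:
{
  # Prometheus duration syntax: "5y" keeps samples for five years.
  services.prometheus.retentionTime = "5y";
}
```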

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-22 14:50:03 +02:00
d0f151595f Don't forward any docker traffic
Access to the 23080 local port will be done by applying the INPUT rules,
which pass through nixos-fw.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-15 14:16:15 +02:00
93f8d3aa89 Allow traffic from docker to enter port 23080
Before:

  hut% sudo docker run -it --rm alpine /bin/ash -xc 'true | nc -w 3 -v 10.0.40.7 23080'
  + true
  + nc -w 3 -v 10.0.40.7 23080
  nc: 10.0.40.7 (10.0.40.7:23080): Operation timed out

After:

  hut% sudo docker run -it --rm alpine /bin/ash -xc 'true | nc -w 3 -v 10.0.40.7 23080'
  + true
  + nc -w 3 -v 10.0.40.7 23080
  10.0.40.7 (10.0.40.7:23080) open

Fixes: #94
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-15 14:16:10 +02:00
d84645f3e1 Add bscpm04.bsc.es SSH host and public key
Allows fetching repositories from hut and other machines in jungle
without the need to do any extra configuration.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-15 14:15:45 +02:00
55b71d6901 Use hut nix cache in owl1, owl2 and raccoon
owl1 and owl2 connect directly to hut via LAN with HTTP, while raccoon
goes through the proxy using jungle.bsc.es with HTTPS. There is no
risk of tampering as packages are signed.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-04-15 14:08:17 +02:00
89c65ea578 Clean all iptables rules on stop
Prevents the "iptables: Chain already exists." error by making sure that
we don't leave any chain on start. The ideal solution is to use
iptables-restore instead, which will do the right job. But this needs to
be changed in NixOS entirely.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-15 14:08:14 +02:00
129273e8d8 Make nginx listen on all interfaces
Needed for local hosts to contact the nix cache via HTTP directly.
We also allow the incoming traffic on port 80.
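
A sketch of both parts. Attaching the listen address to the
jungle.bsc.es virtual host is an assumption; the actual host name used
in the configuration may differ:

```nix
{ ... }:
{
  # Bind to every IPv4 interface instead of only the loopback.
  services.nginx.virtualHosts."jungle.bsc.es".listen = [
    { addr = "0.0.0.0"; port = 80; }
  ];
  # Let LAN clients reach nginx through the NixOS firewall.
  networking.firewall.allowedTCPPorts = [ 80 ];
}
```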

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-15 14:08:07 +02:00
fdac196c6c Fix nginx /cache regex
`nix-serve` does not handle duplicates in the path:
```
hut$ curl http://127.0.0.1:5000/nix-cache-info
StoreDir: /nix/store
WantMassQuery: 1
Priority: 30
hut$ curl http://127.0.0.1:5000//nix-cache-info
File not found.
```

This meant that the cache was not accessible via:
`curl https://jungle.bsc.es/cache/nix-cache-info` but
`curl https://jungle.bsc.es/cachenix-cache-info` worked.
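
One way to get this behaviour, shown as a sketch rather than the exact
rule that was committed, is to consume the slash after /cache in the
rewrite:

```nix
{ ... }:
{
  services.nginx.virtualHosts."jungle.bsc.es".locations."/cache".extraConfig = ''
    # Strip "/cache/" including its trailing slash, so nix-serve sees
    # /nix-cache-info and not //nix-cache-info.
    rewrite ^/cache/(.*) /$1 break;
    proxy_pass http://127.0.0.1:5000;
  '';
}
```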

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-04-15 14:08:04 +02:00
3f4b4fb810 Add new GitLab runner for gitlab.bsc.es
It uses docker based on alpine and the host nix store, so we can perform
builds but isolate them from the system.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:41:18 +02:00
2c7211ffa3 Remove SLURM partition all
We no longer have homogeneous nodes so it doesn't make much sense to
allocate a mix of them.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:27 +02:00
18f25307ab Add varcila user to hut and fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:25 +02:00
7c55d10ceb Adjust fox slurm config after disabling SMT
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:23 +02:00
5c549faaa8 Add abonerib user to fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:21 +02:00
9fd35a9ce4 Don't move doc in web output
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:19 +02:00
5487a93972 Reject SSH connections without SLURM allocation
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:15 +02:00
fe16ea373f Add users to fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:13 +02:00
163434af09 Add dalvare1 user
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:11 +02:00
71164400d4 Mount NVME disks in /nvme{0,1}
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:06 +02:00
f887dacdea Exclude fox from being suspended by slurm
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:04 +02:00
4f5c8dbbaf Use IPMI host names instead of IP addresses
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:15:01 +02:00
14b192b1d9 Add fox IPMI monitoring
Use agenix to store the credentials safely.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:14:59 +02:00
2b04812320 Add new fox machine
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-04-08 17:14:42 +02:00
f962816eab Update PM gitlab URL to new server bscpm04.bsc.es
The old server has died, so we move to the new URL at bscpm04.bsc.es.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-03-07 15:21:11 +01:00
c4583f787d Fix Nanos6 build from git
The src.rev attribute is not available as it comes from source before
the recursive operator. Instead, simply get it from the function inputs.

Cc: Aleix Boné <aleix.boneribo@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-02-28 13:36:00 +01:00
22e40db034 Add explicit zlib dependency
The stdenv no longer provides it by default.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-01-22 16:05:52 +01:00
501f11a8e5 Merge outputs of MPI in a single directory
Some MPI implementations now have their headers in the dev output as
well as the mpicc wrappers.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-22 16:03:08 +01:00
505f101e00 Update wxGTK30 to wxGTK32 in paraver kernel
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-22 16:03:06 +01:00
f44eebc133 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'path:/nix/store/z7y28qzhk7driiwcw78k0mb24laknm0f-source?lastModified=1700390070&narHash=sha256-de9KYi8rSJpqvBfNwscWdalIJXPo8NjdIZcEJum1mH0%3D&rev=e4ad989506ec7d71f7302cc3067abd82730a4beb' (2023-11-19)
  → 'path:/nix/store/2csx2kkb2hxyxhhmg2xs9jfyypikwwk6-source?lastModified=1736867362&narHash=sha256-i/UJ5I7HoqmFMwZEH6vAvBxOrjjOJNU739lnZnhUln8%3D&rev=9c6b49aeac36e2ed73a8c472f1546f6d9cf1addc' (2025-01-14)

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-22 16:02:49 +01:00
2f6f6ba703 Update PM GitLab tokens to new URL
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 15:43:13 +01:00
371b0c7e76 Fix MPICH build by fetching upstream patches too
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 15:43:13 +01:00
ae34eacf4a flake.lock: Update
Flake lock file updates:

• Updated input 'agenix':
    'github:ryantm/agenix/de96bd907d5fbc3b14fc33ad37d1b9a3cb15edc6' (2024-07-09)
  → 'github:ryantm/agenix/f6291c5935fdc4e0bef208cfc0dcab7e3f7a1c41' (2024-08-10)
• Updated input 'bscpkgs':
    'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=de89197a4a7b162db7df9d41c9d07759d87c5709' (2024-04-24)
  → 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=6782fc6c5b5a29e84a7f2c2d1064f4bcb1288c0f' (2024-11-29)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/693bc46d169f5af9c992095736e82c3488bf7dbb' (2024-07-14)
  → 'github:NixOS/nixpkgs/9c6b49aeac36e2ed73a8c472f1546f6d9cf1addc' (2025-01-14)

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 15:43:13 +01:00
dab6f08d89 Set nixpkgs to track nixos-24.11
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 15:43:13 +01:00
8190523c30 Add script to monitor GPFS
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 15:43:07 +01:00
d335d69ba6 Add BSC machines to ssh config
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:51 +01:00
cec49eb5fc Collect statistics from logged users
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:48 +01:00
22db38c98f Add custom GPFS exporter for MN5
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:46 +01:00
0d4eebbb59 Remove exception to fetch task endpoint
It causes the request to go to the website rather than the Gitea
service.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:43 +01:00
025f6a0c0c Use SSD for boot, then switch to NVME
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:40 +01:00
abc74c5445 Use NVME as root
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:37 +01:00
6942f09f69 Keep host header for Grafana requests
This was breaking requests due to CSRF check.

See: https://github.com/grafana/grafana/issues/45117#issuecomment-1033842787
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:32 +01:00
56f6855af7 Ignore logging requests from the gitea runner
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:28 +01:00
81c822e68e Log the client IP not the proxy
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:22 +01:00
53e80b1f19 Ignore misc directory
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:19 +01:00
21feb01e7b Create paste directories in /ceph/p
Ensure that all hut users have a paste directory in /ceph/p owned by
themselves. We need to wait for the ceph mount point to create them, so
we use a systemd service that waits for the remote-fs.target.
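
A sketch of such a service. The unit name and the user list are
hypothetical; the real list would come from the hut users:

```nix
{ lib, ... }:
{
  systemd.services.create-paste-dirs = {
    description = "Create per-user paste directories in /ceph/p";
    wantedBy = [ "multi-user.target" ];
    # Run only once the remote file systems (ceph) are mounted.
    wants = [ "remote-fs.target" ];
    after = [ "remote-fs.target" ];
    serviceConfig.Type = "oneshot";
    # One "install -d" per user: creates the directory if missing and
    # sets its owner.
    script = lib.concatMapStrings (user: ''
      install -d -o ${user} -m 0755 /ceph/p/${user}
    '') [ "alice" "bob" ]; # hypothetical user names
  };
}
```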

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:16 +01:00
9ea7b2b475 Add p command to paste files
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:10 +01:00
fce4d89e1d Use nginx to serve website and other services
Instead of using multiple tunnels to forward all our services to the VM
that serves jungle.bsc.es, just use nginx to redirect the traffic from
hut. This allows adding custom rules for paths that are not possible
otherwise.
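
As an illustration of per-path rules (the paths and upstream ports here
are hypothetical, not the ones deployed):

```nix
{ ... }:
{
  services.nginx = {
    enable = true;
    virtualHosts."jungle.bsc.es".locations = {
      # Each path is forwarded to a different local service.
      "/git/".proxyPass = "http://127.0.0.1:3000/";
      "/grafana/".proxyPass = "http://127.0.0.1:2342/";
    };
  };
}
```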

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:23:07 +01:00
6b282375f8 Mount the NVME disk in /nvme
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-01-16 14:22:58 +01:00
6782fc6c5b Add cacheline parameter to nOS-V
By default it is set to 64 bytes. The cacheline parameter is required
when cross-compiling nOS-V, as it cannot be read from the build machine.

Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-11-29 09:16:03 +01:00
73550ad5a9 Remove unneeded NODES dependencies
The autoreconfHook helper already provides autotools binaries. Also NODES
no longer uses papi.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-11-29 09:16:03 +01:00
48d67ef6c2 Fix NODES native dependencies
Move NODES build tools to nativeBuildInputs. This is needed for
cross-compilation, given that build tools must match the build system.
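
The distinction, sketched on a hypothetical package (this is not the
NODES derivation itself):

```nix
{ stdenv, autoreconfHook, pkg-config, boost }:
stdenv.mkDerivation {
  pname = "example";
  version = "0.0.0";
  src = ./.;
  # Tools executed during the build: they must run on the build machine.
  nativeBuildInputs = [ autoreconfHook pkg-config ];
  # Libraries linked into the result: they must match the host platform.
  buildInputs = [ boost ];
}
```

On a native build both lists resolve to the same platform, which is why
the mistake only shows up when cross-compiling.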

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-11-29 09:16:03 +01:00
Raúl Peñacoba
73e30d20e9 Python is needed in openmp now
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-11-29 09:09:27 +01:00
5f85082553 Update sonar to 1.0.1
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:10:55 +01:00
46f15ac201 Update LLVM to 2024.11
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:10:52 +01:00
b442ddf1a4 Update Nanos6 to 4.2
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:10:50 +01:00
b006538147 Update TAMPI to 4.0
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:09:15 +01:00
995aa0b2e2 Update NODES to 1.3
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:09:12 +01:00
896ec0ad0f Update nOS-V to 3.1.0
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:09:09 +01:00
2d9d2701a9 Update ovni to 1.11.0
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-11-27 16:08:50 +01:00
74e11db8b6 Only enable MPI in ovni on native builds
Tested with:

hut% nix build .#bsc-ci.all
hut% nix build .#pkgsCross.riscv64.ovni

Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Reviewed-by: Aleix Boné <aleix.boneribo@bsc.es>
2024-10-28 13:42:19 +01:00
e046363e52 nos-v: fix cross compilation
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-10-28 13:40:35 +01:00
aa3f816388 ovni: fix cross compilation
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-10-28 13:40:33 +01:00
3eff2662bb paraver: install manpages
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-10-28 13:39:52 +01:00
260986b9f2 Delay nix-gc until /home is mounted
Prevents starting the garbage collector before the remote FS are
mounted, in particular /home. Otherwise, all the gcroots which have
symlinks in /home will be considered stale and they will be removed.
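
One possible way to express the ordering, assuming the stock nix-gc
unit created by `nix.gc.automatic` (the committed change may use a
different mechanism):

```nix
{ ... }:
{
  # Adds Requires= and After= on the mount unit for /home, so the
  # collector does not start while the gcroot symlinks look dangling.
  systemd.services.nix-gc.unitConfig.RequiresMountsFor = "/home";
}
```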

See: #79
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-09-20 09:45:30 +02:00
15afbe94bd Add dbautist user with access to hut
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-09-20 09:42:02 +02:00
efd35a9cd1 Set the serial console to ttyS1 in raccoon
Apparently the ttyS0 console doesn't exist but ttyS1 does:

  raccoon% sudo stty -F /dev/ttyS0
  stty: /dev/ttyS0: Input/output error
  raccoon% sudo stty -F /dev/ttyS1
  speed 9600 baud; line = 0;
  -brkint -imaxbel

The dmesg line agrees:

  00:03: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A

The console configuration is then moved from base to xeon to allow
changing it for the raccoon machine.
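
A sketch of the raccoon side. The baud rate is an assumption taken from
the base_baud value in the dmesg line above:

```nix
{ ... }:
{
  # The last console= entry becomes /dev/console.
  boot.kernelParams = [ "console=tty1" "console=ttyS1,115200" ];
}
```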

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:56 +02:00
50ad1d637c Remove setLdLibraryPath and driSupport options
They have been removed from NixOS. The "hardware.opengl" group is now
renamed to "hardware.graphics".

See: 98cef4c273
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:53 +02:00
c299d53146 Add documentation section about GRUB chain loading
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:47 +02:00
152b71e718 Add 10 min shutdown jitter to avoid spikes
The shutdown timer will fire at slightly different times for the
different nodes, so we slowly decrease the power consumption.
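
A sketch of a timer with jitter. The unit name and the time of day are
hypothetical:

```nix
{ ... }:
{
  systemd.timers.scheduled-shutdown = { # hypothetical unit name
    wantedBy = [ "timers.target" ];
    timerConfig = {
      OnCalendar = "2024-08-02 11:00:00"; # hypothetical time
      # Each node adds its own random delay of up to 10 minutes.
      RandomizedDelaySec = "10min";
      Unit = "poweroff.target";
    };
  };
}
```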

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:44 +02:00
0911d5b92a Don't mount the nix store in owl nodes
Initially we planned to run jobs in those nodes by sharing the same nix
store from hut. However, these nodes are now used to build packages
which are not available in hut. Users also ssh to the nodes, which
doesn't mount the hut store, so it doesn't make much sense to keep
mounting it.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:42 +02:00
5ddae068af Emulate other architectures in owl nodes too
Allows cross-compilation of packages for RISC-V that are known to try to
run RISC-V programs in the host.
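
A minimal sketch using the NixOS binfmt option:

```nix
{ ... }:
{
  # Registers QEMU user-mode emulation via binfmt_misc, so riscv64
  # binaries started during a build run transparently.
  boot.binfmt.emulatedSystems = [ "riscv64-linux" ];
}
```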

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:39 +02:00
d17be714ec Program shutdown for August 2nd for all machines
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:36 +02:00
28ce15d74d Enable debuginfod daemon in owl nodes
WARNING: This will introduce noise, as the daemon wakes up from time to
time to check for new packages.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:30 +02:00
504f9bb570 Set gitea and grafana log level to warn
Prevents filling the journal logs with information messages.
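
Roughly, assuming both services are configured through their NixOS
modules:

```nix
{ ... }:
{
  # [log] LEVEL in Gitea's app.ini.
  services.gitea.settings.log.LEVEL = "Warn";
  # [log] level in grafana.ini.
  services.grafana.settings.log.level = "warn";
}
```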

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:27 +02:00
f158cb63e8 Set default SLURM job time limit to one hour
Prevents endless jobs from being left forever, while allowing users to
request a larger time limit.
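
A sketch of a partition line with a default limit. The partition and
node names are hypothetical:

```nix
{ ... }:
{
  services.slurm.partitionName = [
    # DefaultTime applies to jobs submitted without --time.
    "owl Nodes=owl[1-2] Default=YES DefaultTime=01:00:00 State=UP"
  ];
}
```

A user who needs more can still ask for it explicitly, for example
with `sbatch --time=04:00:00`.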

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:24 +02:00
8860f76cad Allow other jobs to run in unused cores
The current select mechanism was using the memory too as a consumable
resource, which by default only sets 1 MiB per node. As each job already
requests 1 MiB, it prevents other jobs from running.

As we are not really concerned with memory usage, we only use the unused
cores in the select criteria.
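
In slurm.conf terms this corresponds to something like the following.
The select plugin named here is an assumption; only the switch away
from memory as a consumable resource comes from this change:

```nix
{ ... }:
{
  services.slurm.extraConfig = ''
    SelectType=select/cons_tres
    # CR_Core rather than CR_Core_Memory: only cores are consumable.
    SelectTypeParameters=CR_Core
  '';
}
```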

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:22 +02:00
b86798cd69 Use authentication tokens for PM GitLab runner
Starting with GitLab 16, there is a new mechanism to authenticate the
runners via authentication tokens, so use it instead.  Older tokens and
runners are also removed, as they are no longer used.

With the new way of managing tokens, both the tags and the locked state
are managed from the GitLab web page.

See: https://docs.gitlab.com/ee/ci/runners/new_creation_workflow.html
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:16 +02:00
7ed74931cf flake.lock: Update
Flake lock file updates:

• Updated input 'agenix':
    'github:ryantm/agenix/1381a759b205dff7a6818733118d02253340fd5e' (2024-04-02)
  → 'github:ryantm/agenix/de96bd907d5fbc3b14fc33ad37d1b9a3cb15edc6' (2024-07-09)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/6143fc5eeb9c4f00163267708e26191d1e918932' (2024-04-21)
  → 'github:NixOS/nixpkgs/693bc46d169f5af9c992095736e82c3488bf7dbb' (2024-07-14)

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:13 +02:00
6e9d33b483 Allow ptrace to any process of the same user
Allows users to attach GDB to their own processes, without requiring
running the program with GDB from the start. It is only available in
compute nodes, the storage nodes continue with the restricted settings.
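
A sketch, assuming the restriction in place is the Yama ptrace scope:

```nix
{ ... }:
{
  # 0 = classic ptrace permissions: a process may attach to any other
  # process running under the same uid.
  boot.kernel.sysctl."kernel.yama.ptrace_scope" = 0;
}
```

With this, `gdb -p <pid>` works on an already running process of the
same user.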

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:09 +02:00
58abaefbc4 Add abonerib user to hut, raccoon, owl1 and owl2
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:07 +02:00
5ea7827a8a Grant rpenacob access to owl1 and owl2 nodes
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:05 +02:00
b17e4a13f9 Access private repositories via hut SSH proxy
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:03 +02:00
9c4e60c2c2 Set the default proxy to point to hut
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:35:56 +02:00
e7376917bd Allow incoming traffic to hut proxy
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:35:23 +02:00
130e191d37 eudy: koro: fcs: Fix fcs unprotected cpuid call
smp_processor_id() was called in a preemptible context, which could
invalidate the returned value. However, this was not harmful, because
fcs threads in nosv are pinned.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-07-17 11:40:20 +02:00
349f69e30a Add support for armv7 emulation in hut
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-07-17 11:12:48 +02:00
59ab6405c5 Monitor raccoon machine via IPMI
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-07-17 11:12:32 +02:00
a0dab66aa5 Move vlopez user to jungleUsers for koro host
Access to other machines can be easily added into the "hosts" attribute
without the need to replicate the configuration.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-07-16 12:35:39 +02:00
525cad4117 Add raccoon motd file
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-07-16 12:35:38 +02:00
24ee74d614 Split xeon specific configuration from base
To accommodate the raccoon knights workstation, some of the configuration
pulled by m/common/main.nix has to be removed. To solve it, the xeon
specific parts are placed into m/common/xeon.nix and only the common
configuration is at m/common/base.nix.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-07-16 12:35:37 +02:00
15b4b28d2c Control user access to each machine
The users.jungleUsers configuration option behaves like the users.users
option, but defines the list attribute `hosts` for each user, which
restricts that user's access to the listed hosts.
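
The filtering idea can be sketched as a plain Nix expression. This is only an
illustration under stated assumptions, not the repository's implementation:
the user names, uids and the `hostName` value are hypothetical.

```nix
let
  # Hypothetical host being configured.
  hostName = "owl1";

  # Hypothetical user set: like users.users, plus a `hosts` list per user.
  jungleUsers = {
    alice = { uid = 1001; hosts = [ "owl1" "owl2" ]; };
    bob = { uid = 1002; hosts = [ "hut" ]; };
  };

  # Names of the users whose `hosts` list contains this host.
  allowed = builtins.filter
    (name: builtins.elem hostName jungleUsers.${name}.hosts)
    (builtins.attrNames jungleUsers);
in
# Rebuild the attribute set for the allowed users, dropping the extra
# `hosts` attribute so the result has the same shape as users.users.
builtins.listToAttrs (map
  (name: {
    inherit name;
    value = removeAttrs jungleUsers.${name} [ "hosts" ];
  })
  allowed)
```

With these sample values only `alice` survives the filter, since `bob` is
limited to `hut`.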

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-07-16 12:35:34 +02:00
b1ce302e4b Add PostgreSQL DB for performance test results
The database will hold the performance results of the execution of the
benchmarks. For now, we follow the same setup as on knights3.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-07-16 12:35:24 +02:00
b8b85f55cd Enable Grafana email alerts
Allows sending Grafana alerts via email too, so we have a redundant
mechanism in case Slack fails to deliver them.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-05-31 15:57:38 +02:00
1189626a6f Enable mail notification in Gitea
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-05-31 10:56:49 +02:00
dbd95dd7b8 Add msmtp to send notifications via email
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-05-31 10:56:20 +02:00
81b680a7d2 Allow Ceph traffic to lake2 2024-05-02 17:43:48 +02:00
ba60e121df Collect Gitea metrics in Prometheus
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-05-02 17:32:25 +02:00
432e6c8521 Add Gitea service
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-05-02 17:31:51 +02:00
c8160122b3 Add firewall rules for Ceph and monitoring
The firewall was blocking the monitoring traffic from hut and the Ceph
traffic among OSDs. The rules only allow connecting from the specific
host that they are supposed to be coming from.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-04-25 13:25:11 +02:00
3863fc25a5 Add workaround for MPICH 4.2.0
See: https://github.com/pmodels/mpich/issues/6946

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-04-25 13:25:08 +02:00
2b26cd2f46 Fix SLURM bug in rank integer sign expansion
See: https://bugs.schedmd.com/show_bug.cgi?id=19324

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-04-25 13:25:05 +02:00
30f2079f0b Merge pmix outputs for MPICH
MPICH expects headers and libraries to be present in the same directory.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-04-25 13:25:03 +02:00
366436b6d3 Remove nixseparatedebuginfod input
It has been integrated in nixpkgs, so it is no longer required.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-04-25 13:24:58 +02:00
9f1cd02144 flake.lock: Update
Flake lock file updates:

• Updated input 'agenix':
    'github:ryantm/agenix/daf42cb35b2dc614d1551e37f96406e4c4a2d3e4' (2023-10-08)
  → 'github:ryantm/agenix/1381a759b205dff7a6818733118d02253340fd5e' (2024-04-02)
• Updated input 'agenix/darwin':
    'github:lnl7/nix-darwin/87b9d090ad39b25b2400029c64825fc2a8868943' (2023-01-09)
  → 'github:lnl7/nix-darwin/4b9b83d5a92e8c1fbfd8eb27eda375908c11ec4d' (2023-11-24)
• Updated input 'agenix/home-manager':
    'github:nix-community/home-manager/32d3e39c491e2f91152c84f8ad8b003420eab0a1' (2023-04-22)
  → 'github:nix-community/home-manager/3bfaacf46133c037bb356193bd2f1765d9dc82c1' (2023-12-20)
• Added input 'agenix/systems':
    'github:nix-systems/default/da67096a3b9bf56a91d16901293e51ba5b49a27e' (2023-04-09)
• Updated input 'bscpkgs':
    'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=e148de50d68b3eeafc3389b331cf042075971c4b' (2023-11-22)
  → 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=de89197a4a7b162db7df9d41c9d07759d87c5709' (2024-04-24)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/e4ad989506ec7d71f7302cc3067abd82730a4beb' (2023-11-19)
  → 'github:NixOS/nixpkgs/6143fc5eeb9c4f00163267708e26191d1e918932' (2024-04-21)
• Updated input 'nixseparatedebuginfod':
    'github:symphorien/nixseparatedebuginfod/232591f5274501b76dbcd83076a57760237fcd64' (2023-11-05)
  → 'github:symphorien/nixseparatedebuginfod/98d79461660f595637fa710d59a654f242b4c3f7' (2024-03-07)
• Removed input 'nixseparatedebuginfod'
• Removed input 'nixseparatedebuginfod/flake-utils'
• Removed input 'nixseparatedebuginfod/flake-utils/systems'
• Removed input 'nixseparatedebuginfod/nixpkgs'

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-04-25 13:24:29 +02:00
de89197a4a Add bigotes package
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-04-24 17:59:24 +02:00
5d3820631a Fix Nanos6 4.0 build
It looks like after upgrading the compiler the build breaks. The patch
simply adds the missing cstdint include, until a new release is made.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-04-23 12:03:49 +02:00
9c8a077828 Enable separatedebuginfo for openmp
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-04-12 12:27:54 +02:00
fce556cb28 Build OmpSs-2 LLVM with commit version information
Allows users to see which commit (or git tag) was used in clang.
Examples for the release and git versions:

% clang --version
clang version 18.0.0 (18.0.0-ompss-2)

% clang --version
clang version 18.0.0 (0a6d6c6)

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-03-14 14:13:14 +01:00
82ccae1315 Use google.com probe instead of bsc.es
The main website of the BSC is failing every day around 3:00 AM for
almost one hour, so it is not a very good target. Instead, google.com is
used which should be more reliable. The same robots.txt path is fetched,
as it is smaller than the main page.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-03-05 16:52:21 +01:00
37ce5ef391 Paraver: Use wrapGAppsHook for NixOS
Needed to fix the error on NixOS:

  GLib-GIO-ERROR **: No GSettings schemas are installed on the system

See https://github.com/NixOS/nixpkgs/issues/16285

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-02-15 16:06:29 +01:00
1df80460d2 Add another HTTPS probe for bsc.es
As all other HTTPS probes pass through the opsproxy01.bsc.es proxy, we
cannot detect a problem in our proxy or in the BSC one. Adding another
target like bsc.es that doesn't use the ops proxy allows us to discern
where the problem lies.

Instead of monitoring https://www.bsc.es/ directly, which will trigger
the whole Drupal server and take a whole second, we just fetch robots.txt
so the overhead on the server is minimal (and returns in less than 10 ms).

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2024-02-13 12:26:56 +01:00
7f17fe8874 Move slurm client in a separate module
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-02-13 11:11:17 +01:00
Raúl Peñacoba
3b21a32d83 Add ovni to OpenMP-V
Allows building OpenMP-V with ovni support, which is necessary to run
the runtime tests of OpenMP-V in ovni.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2024-01-15 10:20:46 +01:00
5880a6e5f6 Enable public-inbox at jungle.bsc.es/lists
The public-inbox service fetches emails from the sourcehut mailing lists
and displays them on the web. The idea is to reduce the dependency on
external services and add a secondary storage for the mailing lists in
case sourcehut goes down or changes the current free plans.

The service is available at https://jungle.bsc.es/lists/ and is open to
the public. It currently mirrors the bscpkgs and jungle mailing lists.

We also edited the CSS to improve the readability and have larger fonts
by default.

The service for public-inbox produced by NixOS is not well configured to
fetch emails from an IMAP mail server, so we also manually edit the
service file to enable the network.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-12-15 11:18:08 +01:00
c4d5135fde Split openmp versions in separate derivations
The openmp derivation provides both libomp and libompv. To avoid
accidentally linking with the wrong library and to avoid the nosv
dependency on libomp, this patch separates each version in a different
derivation.

Also, it adapts the clang wrappers and stdenvs to provide an stdenv per
openmp library where each openmp will be used by default when the
compiler flag "-fopenmp" is used. This eases linking ompv with nixpkgs
libraries, such as blis, that expect openmp to be provided with stdenv.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-12-07 18:01:20 +01:00
3f2b9a766b Update clang with internal bug release
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-12-07 18:01:08 +01:00
ecbb45d6ac Monitor https://pm.bsc.es/gitlab/ too
The GitLab instance is in the /gitlab endpoint and may fail
independently of https://pm.bsc.es/.

Cc: Víctor López <victor.lopez@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-12-05 09:56:28 +01:00
c564d945d4 Enable nixseparatedebuginfod module
The module is only enabled on Hut and Eudy because we noticed activity
on the debuginfod service even if no debug session was active.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-12-04 11:04:52 +01:00
Raúl Peñacoba
3ed644b88f Add clangNosvOpenmp-ld compiler test
Add a test to verify that "clang -fopenmp=libompv" links correctly with
nOS-V even though it is not placed in the buildInputs.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-12-01 16:35:24 +01:00
Raúl Peñacoba
8ceaddfea7 Split OpenMP from Clang in LLVM
As the OpenMP-V implementation must be built with nOS-V, we can
split the OpenMP package into a different derivation to prevent rebuilds
of clang. Additionally, as OpenMP-V can now be built alongside the
vanilla OpenMP runtime, we simply build a single openmp derivation with
both runtimes. Only a single build of the clang compiler is now
required.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-12-01 16:31:56 +01:00
2a953d811c Build TAMPI with ovni support
By default we build TAMPI with ovni support, as it remains disabled at
runtime unless explicitly enabled by the TAMPI_INSTRUMENT=ovni
environment variable.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-29 17:49:30 +01:00
fec4ddf6ab Merge TAMPI release and git in the same file
In order to reduce duplicate information we just place the two sources
in the same file.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-29 17:49:22 +01:00
9aaea0da0e Update paraver: 4.10.6 -> 4.11.2
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-29 12:19:27 +01:00
ed887b0412 Use tmpfs in /tmp
The /tmp directory was using the SSD disk which is not erased across
boots. Nix will use /tmp to perform the builds, so we want it to be as
fast as possible. In general, all the machines have enough space to
handle large builds like LLVM.
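
A minimal sketch of the corresponding setting, assuming the stock NixOS
option is used (the actual change is not shown in this log):

```nix
{
  # Assumption: stock NixOS option (NixOS 23.05 and later). Mounts /tmp as a
  # tmpfs, so its contents live in memory and are discarded on every boot.
  boot.tmp.useTmpfs = true;
}
```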

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-28 12:25:50 +01:00
f55b48ec86 Update ompv test with -fopenmp=libompv flag
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-24 16:48:09 +01:00
1e52075c18 TAGASPI 2023.11 update and move to public repo
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-24 16:47:52 +01:00
062b1c3c77 Mercurium 2023.11 update
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-24 16:47:42 +01:00
1520eaa64e TAMPI 2023.11 update
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-24 16:47:38 +01:00
54b4448e4b Ovni 2023.11 update
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-24 16:47:27 +01:00
28a7496fbd nOS-V 2023.11 update
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-24 16:47:22 +01:00
b5dae25e7f NODES 2023.11 update
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-24 16:47:18 +01:00
bdc3670ccc Nanos6 2023.11 update
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-24 16:47:09 +01:00
af590d5ace Clang 2023.11 update
Clang nos-v-merge branch has been merged into master.

Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-24 16:47:02 +01:00
8a31895e48 Update nixpkgs commit in default.nix
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-24 16:46:54 +01:00
d9ae85ce4b Add a more strict test for OpenMP+nOS-V
In this test we ensure that the worksharing region is running inside a
nOS-V task, so we know that we are not using the vanilla OpenMP by
accident.

We also keep the previous test test/compilers/clang-openmp.nix as-is, so
we can check that the compiler injects the nosv library dependency in
the final binary on its own.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-24 14:52:23 +01:00
20ded0c0df Change the GPI-2 URL to a public repository
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-24 14:49:03 +01:00
fe1d3fbb80 Enable runners for pm.bsc.es/gitlab too
The old runners for the PM gitlab were disabled in the configuration
during the last outage, but they kept working until we rebooted the node.
With this change we enable the runners for both PM and gitlab.bsc.es.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-24 14:45:23 +01:00
5234ca32fd Remove complete ceph package from hut
Only the ceph-client is needed.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-24 12:58:54 +01:00
cfe0c0e6e6 Fix warning in slurm exporter using vendorHash
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-24 12:58:50 +01:00
7afe7344ac Remove old Ceph package overlay
The Ceph package is now integrated in upstream nixpkgs.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-24 12:58:47 +01:00
bd83ca53ab flake.lock: Update
Flake lock file updates:

• Updated input 'agenix':
    'github:ryantm/agenix/d8c973fd228949736dedf61b7f8cc1ece3236792' (2023-07-24)
  → 'github:ryantm/agenix/daf42cb35b2dc614d1551e37f96406e4c4a2d3e4' (2023-10-08)
• Updated input 'bscpkgs':
    'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=f605f8e5e4a1f392589f1ea2b9ffe2074f72a538' (2023-10-31)
  → 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=e148de50d68b3eeafc3389b331cf042075971c4b' (2023-11-22)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/e56990880811a451abd32515698c712788be5720' (2023-09-02)
  → 'github:NixOS/nixpkgs/e4ad989506ec7d71f7302cc3067abd82730a4beb' (2023-11-19)

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-24 12:57:44 +01:00
e148de50d6 Remove Paraver fast
The build is broken and the official Paraver already merged support
for fast trace loading.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-22 15:38:35 +01:00
ff34ab5732 Enable NIX_DEBUG = 1 in the clang-ompss2 test
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-22 15:38:31 +01:00
3f17a489ef Fix clang build adding rpath to libstdc++
The binaries generated during the build process of clang are missing the
RPATH of the libstdc++.so library, which is provided by gcc libs.
Similarly, the clang binary itself also needs the rpath to the
libstdc++.so library path.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-22 15:38:28 +01:00
501a92376b Fix GPI-2 using a unified rdma-core
The libraries and includes are no longer in the default output, so we
merge them in a single directory using symlinkJoin.
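
A minimal sketch of such a merge, assuming every output of rdma-core is
joined; the derivation name `rdma-core-all` is hypothetical and the actual
expression is not shown in this log:

```nix
{ symlinkJoin, rdma-core }:

# Builds one directory of symlinks covering all outputs of rdma-core, so
# that headers and libraries can be found under a single prefix.
symlinkJoin {
  name = "rdma-core-all"; # hypothetical name
  # `all` is the list of every output of the derivation.
  paths = rdma-core.all;
}
```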

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-22 15:38:17 +01:00
4033854014 Disable nix-wrap as it is broken
The pkgsStatic.libcap dependency fails to build.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-22 15:38:13 +01:00
e6b4af4b16 Rename pkgconfig to pkg-config
The alias pkgconfig has been removed.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-22 15:38:05 +01:00
cbf6f03a84 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'path:/nix/store/s4jqyj35hii03rs7j5n6vn7gpgp6ja81-source?lastModified=1692447944&narHash=sha256-fkJGNjEmTPvqBs215EQU4r9ivecV5Qge5cF/QDLVn3U%3D&rev=d680ded26da5cf104dd2735a51e88d2d8f487b4d' (2023-08-19)
  → 'path:/nix/store/z7y28qzhk7driiwcw78k0mb24laknm0f-source?lastModified=1700390070&narHash=sha256-de9KYi8rSJpqvBfNwscWdalIJXPo8NjdIZcEJum1mH0%3D&rev=e4ad989506ec7d71f7302cc3067abd82730a4beb' (2023-11-19)

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-22 15:37:46 +01:00
4316e7b12d Enable separatedebuginfo for common BSC packages
For now, we keep dontStrip for packages that already had it for systems
without the separatedebuginfo support.

Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-20 16:36:34 +01:00
e7bdc1595a Fix openmp.nix being called with callPackage
callPackage was overriding the inner callPackage override, which made
overriding the clang derivation through the override function impossible.

Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-15 16:22:53 +01:00
f0f6b7c354 Enable dontStrip on clang if enableDebug is set
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-15 16:22:13 +01:00
0086b9452a Fix llvm postInstall
Some llvm versions do not generate the intel and gomp support libraries
and the post install script fails because it cannot remove them. This
patch makes removal optional.

Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-15 16:21:47 +01:00
4111b22f57 Fix clang link flag typo
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-14 17:09:23 +01:00
Raúl Peñacoba
85c70a8d6b Always enable assertions in OmpSs-2 LLVM
There are important assertions for OmpSs-2 to catch early bugs. Building
without asserts enabled causes warnings due to unused variables.

Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-11-09 16:57:37 +01:00
0d9c99a24e BSC packages are no longer in bsc attribute
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-09 13:40:48 +01:00
db98b1f698 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=3a4062ac04be6263c64a481420d8e768c2521b80' (2023-09-14)
  → 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=f605f8e5e4a1f392589f1ea2b9ffe2074f72a538' (2023-10-31)

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-09 13:40:48 +01:00
84c4b6b81c Switch bscpkgs URL to sourcehut
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-11-09 13:40:48 +01:00
f605f8e5e4 Add clang openmp test for CI
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
Tested-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-10-31 16:52:55 +01:00
8d5714c67b Move nixpkgs reference to its own expression
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-10-31 16:52:55 +01:00
4727c98354 Remove jemalloc dep from NODES
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-10-31 16:52:55 +01:00
bb1de835f7 Add clang with nosv-powered OpenMP
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-10-31 16:52:55 +01:00
ebeb2ff549 Set NOSV_HOME for clang wrapped with nodes
This is needed since nosv must appear as a first-level dependency on the
final executable. Clang will add the dependency as long as it knows
where to find nosv (and nodes is used).

Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-10-31 16:52:55 +01:00
9f245946d7 Build NODES with clang dependency if tests enabled
Reviewed-By: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2023-10-31 16:52:55 +01:00
19e195b894 Monitor anella instead of gw.bsc.es
The target gw.bsc.es doesn't reply to our ICMP probes from hut. However,
the anella hop in the tracepath is a good candidate to identify cuts
between the login and the provider and between the provider and external
hosts like Google or Cloudflare DNS.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-27 12:46:08 +02:00
54c2bd119f Add ICMP probes
These probes check if we can reach several targets via ICMP, which is
not proxied, so they can be used to see if ICMP forwarding is working in
the login node.

In particular, we test if we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway which responds to ping
only from the intranet and the login node (ssfhead).

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 17:13:03 +02:00
e5d85c1b38 Enable proxy for Grafana too
The alerts need to contact the slack endpoint, so we add the proxy
environment variables to the grafana systemd service.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:56 +02:00
f1486b84c1 Make blackbox exporter use the proxy
By default it was trying to reach the targets using the default gateway,
but since the electrical cut of 2023-10-20, the login node has not
enabled forwarding again, so it is better not to rely on it.

Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:24 +02:00
51e331a9d9 Update sonar to 0.2.0 and use GitHub
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:06:25 +02:00
472f4b0334 Don't log SLURM connection attempts from ssfhead 2023-10-06 15:22:04 +02:00
425dca3e00 Add docker runner too 2023-10-06 15:17:07 +02:00
e4080cf931 Monitor gitlab.bsc.es too 2023-10-06 15:17:07 +02:00
fc9285f89d Monitor PM webpage via blackbox 2023-10-06 15:17:07 +02:00
fbe238f5b6 Temporarily disable pm runners 2023-10-06 15:17:07 +02:00
9874da566d Add runner for gitlab.bsc.es 2023-10-06 15:17:07 +02:00
91cdc91738 Fetch sonar tag from refs/tags 2023-10-06 14:52:12 +02:00
db391ee9c2 Enable verbose output in nix build for CI 2023-10-06 14:39:47 +02:00
bab7a45587 Fix commit for GPI-2 and tagaspi 2023-10-06 14:34:03 +02:00
8731a4797d Enable packages served by PM gitlab 2023-10-06 14:24:57 +02:00
9e889884c9 Don't build ovni in verbose mode 2023-10-05 08:01:43 +02:00
5412e14dba Patch shebangs in ovni runners 2023-10-04 13:44:03 +02:00
41a93cd176 Enable verbose build and tests for ovni 2023-10-04 13:40:02 +02:00
873d2f1abc Enable tests in ovni 2023-10-04 13:31:55 +02:00
867e61acde Remove --rebuild flag 2023-10-04 12:45:51 +02:00
7ace376e4e Also define no RT clang stdenv 2023-10-04 12:43:47 +02:00
ce4b196010 Remove CONTRIBUTING file 2023-10-03 12:26:26 +02:00
f9c832654e Remove NOISE file 2023-10-03 12:25:45 +02:00
4533c94b4f Remove garlic from bscpkgs 2023-10-03 12:24:58 +02:00
7b72b38023 Remove garlic from README 2023-10-03 12:23:01 +02:00
779247691f Add metadata for Nanos6 2023-10-03 10:00:34 +02:00
c724ad2ad3 Remove old CI derivation 2023-10-02 11:17:01 +02:00
2a3b269b9c Mark packages affected by PM GitLab 2023-10-02 11:05:55 +02:00
7f3d3b953d Always rebuild CI target 2023-10-02 10:57:53 +02:00
0184f5e382 Print list of CI paths when building 2023-10-02 10:57:53 +02:00
916e4f49a6 Move packages from bsc/ to pkgs/ 2023-10-02 10:57:53 +02:00
8fe7458969 Remove deprecated pkgs and improve CI 2023-10-02 10:57:53 +02:00
be25283da5 Update mcxx to 2023.05 2023-10-02 10:57:53 +02:00
1864c08c95 Disable packages from PM GitLab while it is down 2023-10-02 10:57:53 +02:00
bead8aea0a Run the tests from the jungle flake 2023-09-28 11:28:00 +02:00
dd802e2ec9 Use flakes for the CI build command 2023-09-28 09:16:04 +02:00
8dbd1a3c34 Port clang and intel packages and enable tests 2023-09-28 09:15:34 +02:00
ce7238c780 Remove tracing output from intel packages 2023-09-28 09:15:09 +02:00
552ebdbede Export the runtime home for clang if given 2023-09-28 09:14:36 +02:00
ebc5c4d84f Allow anonymous access to grafana 2023-09-22 10:51:30 +02:00
8634a9e133 Remove user/group when using DynamicUsers 2023-09-22 10:13:06 +02:00
0ce79ed79e Set the SLURM_CONF variable 2023-09-21 22:22:00 +02:00
5f492ee1d7 Enable slurm-exporter service 2023-09-21 21:40:02 +02:00
9071a4de8b Add prometheus-slurm-exporter package 2023-09-21 21:34:18 +02:00
3040a803b2 Mount the hut nix store for SLURM jobs 2023-09-20 19:38:43 +02:00
70a9e855cf Enable direnv integration 2023-09-20 09:32:58 +02:00
51dcc6896e Begin moving bsc packages to root attribute 2023-09-19 10:33:32 +02:00
fd766d8ff8 Don't build nanos6 with debug symbols by default 2023-09-15 19:05:55 +02:00
aa64e9ef24 Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs is still pointing to a different version than the one
specified in the jungle flake. Instead use jungle#bscpkgs.X to get a
package from bscpkgs.
2023-09-15 12:00:33 +02:00
ba2b74fd5a Add bscpkgs and nixpkgs top level attributes
Allows the evaluation of packages of the intermediate overlays.
2023-09-15 12:00:33 +02:00
1ae5d9e25e Use hut packages as the default package set
Allows the user to directly access nixpkgs and bscpkgs from the top
level as `nix build jungle#htop` and `nix build jungle#bsc.ovni`.
2023-09-15 12:00:28 +02:00
ff98ba47c4 Don't fetch registry flakes from the net 2023-09-15 12:00:28 +02:00
599b23ef52 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=3a4062ac04be6263c64a481420d8e768c2521b80' (2023-09-14)
2023-09-15 11:50:47 +02:00
3a4062ac04 Revert "Remove flake-lock file"
This reverts commit a3e1047f515e90292eadbe9dbc5f3aa00ba87730.
2023-09-14 18:21:50 +02:00
a3e1047f51 Remove flake-lock file 2023-09-14 17:54:32 +02:00
8dbee06d1d Revert "Update slurm to 23.02.05.1"
This reverts commit 7bfd786c01c36131cd00b90fc6a9503fd1226578.
2023-09-14 15:46:18 +02:00
d522113cb9 Open ports in firewall of compute nodes 2023-09-14 15:45:43 +02:00
7bfd786c01 Update slurm to 23.02.05.1 2023-09-13 17:44:24 +02:00
5a5f4672cd Monitor storage nodes via IPMI too 2023-09-13 15:57:13 +02:00
2646ad4b70 Enable fstrim service 2023-09-12 16:39:45 +02:00
b120a7ca85 Serve the nix store from hut 2023-09-12 12:19:43 +02:00
2a0254b684 Add encrypted munge key with agenix 2023-09-08 19:05:45 +02:00
e3e6e7662d Remove unused large port hole in firewall 2023-09-08 18:22:48 +02:00
868f825e26 Make exporters listen in localhost only 2023-09-08 18:13:04 +02:00
f231dc81f1 Allow only some ports for srun 2023-09-08 17:51:37 +02:00
a758eef354 Block ssfhead from reaching our slurm daemon 2023-09-08 17:36:28 +02:00
9c9c41fb57 Poweroff idle slurm nodes after 1 hour 2023-09-08 16:49:53 +02:00
1a1708f16f Add IB and IPMI node host names 2023-09-08 13:21:37 +02:00
efe1b7e399 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
2023-09-07 11:13:45 +02:00
6122fef927 Don't replace the shebang in nix-wrap 2023-09-07 09:07:25 +00:00
Raúl Peñacoba
8597bb97ab Add nix-wrap, which enables an isolated environment in clusters 2023-09-07 09:07:25 +00:00
7d4c9a57c6 Update ovni to 1.3.0 2023-09-07 10:54:15 +02:00
3efc10e57d Use version tag for sonar 2023-09-07 10:50:15 +02:00
065ab83083 Use release for bench6 dependencies 2023-09-07 09:13:12 +02:00
4883b750bd Fix bench6 commit 2023-09-07 09:08:36 +02:00
ee5cbd08dd Update sonar commit 2023-09-06 17:57:18 +02:00
61bd7ee947 Fix ovni gitUrl input parameter 2023-09-06 16:20:59 +02:00
abfd8484ee Add sonar library 2023-09-06 15:33:55 +02:00
a63f578c99 Update clangOmpss2 to 2023.05.1 2023-09-06 15:12:51 +02:00
01e07d559c Link clang with the dynamic llvm library
It dramatically reduces the size of the installation to 250 MiB. We also
need to inject the rpath of the libraries, as well as zlib, during the
build phase with CMAKE_BUILD_RPATH. The CMAKE_BUILD_WITH_INSTALL_RPATH
option is disabled, as it contradicts the former.
2023-09-06 14:14:40 +02:00
4b06175b42 Only build clangOmpss2 to target the host 2023-09-05 17:55:38 +02:00
eb9876aff6 Unlock ovni gitlab runners 2023-09-05 16:59:45 +02:00
8d31c552f5 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=18d64c352c10f9ce74aabddeba5a5db02b74ec27' (2023-08-31)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/d680ded26da5cf104dd2735a51e88d2d8f487b4d' (2023-08-19)
  → 'github:NixOS/nixpkgs/e56990880811a451abd32515698c712788be5720' (2023-09-02)
2023-09-05 15:03:26 +02:00
68f4d54dd1 Add agenix to all nodes 2023-09-04 22:10:43 +02:00
2042d58b72 Add agenix module to ceph 2023-09-04 22:07:07 +02:00
2c8c90e6e4 Remove old secrets 2023-09-04 22:04:32 +02:00
208dcb7dde Mount /ceph in owl1 and owl2 2023-09-04 22:00:36 +02:00
e2f82a6383 Warn about the owl2 omnipath device 2023-09-04 22:00:17 +02:00
d704816de9 Clean owl2 configuration 2023-09-04 21:59:56 +02:00
74ec4eb22a Move the ceph client config to an external module 2023-09-04 21:59:04 +02:00
0a5f9b55f5 Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2023-09-04 21:36:31 +02:00
900de39e2f Add anavarro user 2023-09-04 16:00:01 +02:00
1e466d07df Set zsh inc_append_history option 2023-09-03 16:57:53 +02:00
13807c5e8f Set zsh shell for rarias 2023-09-03 16:46:27 +02:00
d8d6d6d421 Enable zsh and fix key bindings 2023-09-03 16:42:04 +02:00
a242ddd39c Keep a log over time with the config commits 2023-09-03 00:02:14 +02:00
a2c5fe1f5e Configure bscpkgs.nixpkgs to follow nixpkgs 2023-09-02 23:37:59 +02:00
2c52ef9ff0 Store nixos config in /etc/nixos/config.rev 2023-09-02 23:37:11 +02:00
ee24b910a1 Use clang++ for C++ tests 2023-09-01 16:51:32 +02:00
4b1d4c18af Set the host triple in clang
Fixes the problem where the triple used by newer versions of
config.guess doesn't match due to a change in x86 from
x86_64-unknown-linux-gnu to x86_64-pc-linux-gnu.
2023-09-01 16:50:44 +02:00
fd5fb5c055 Add asan test for clangOmpss2 2023-09-01 16:43:49 +02:00
acb91695ac Enable binary emulation for other architectures 2023-08-31 17:27:08 +02:00
18d64c352c Add pkg-config dependency for paraverKernel 2023-08-31 12:56:35 +02:00
124cb6a4c3 Update nixpkgs in default.nix too 2023-08-31 12:43:07 +02:00
bcf2df64c8 Add initial flake.lock 2023-08-31 12:41:15 +02:00
c30851d6e9 Add packages to flake.nix 2023-08-31 12:40:54 +02:00
9d93760e6f Enable watchdog 2023-08-30 16:32:17 +02:00
aad67b9d99 Enable all osd on boot in lake2 2023-08-30 16:32:17 +02:00
e1d406023d Scrape lake2 too 2023-08-29 12:33:26 +02:00
db6bb90af8 Also enable monitoring in lake2 2023-08-29 12:29:41 +02:00
1266c8f04e Scrape metrics from bay 2023-08-29 11:58:00 +02:00
2b7823788c Add monitoring in the bay node 2023-08-29 11:53:32 +02:00
86eacdd3e5 Add fio tool 2023-08-29 11:27:50 +02:00
4fa074f893 Add ceph tools in hut too 2023-08-28 17:58:21 +02:00
a260a1bc1b Switch ceph logs to journal 2023-08-28 17:58:08 +02:00
8912d2b9bc Update ceph to 18.2.0 in overlay 2023-08-25 18:20:21 +02:00
b4015ded86 Move pkgs overlay to overlay.nix 2023-08-25 18:12:00 +02:00
0f54d63a46 Enable ceph osd daemons in lake2 2023-08-25 14:54:51 +02:00
6c656182f1 Add the lake2 hostname to the hosts 2023-08-25 14:44:35 +02:00
be4187de3c Use the sda for lake2 2023-08-25 13:40:10 +02:00
0b22a1b8a4 Remove netboot module 2023-08-25 13:39:01 +02:00
f18f1937ae Disable pixiecore in hut for now 2023-08-25 13:21:00 +02:00
4b78ec9134 Add PXE helper 2023-08-25 12:05:33 +02:00
6c0c26b3aa Enable netboot again for PXE 2023-08-24 19:08:23 +02:00
fb1744306d Specify the disk by path 2023-08-24 15:27:37 +02:00
394c7ecd7b Prepare lake2 config after bootstrap
The disk ID is different under NixOS.
2023-08-24 13:54:53 +02:00
3276f54e86 Add lake2 bootstrap config 2023-08-24 12:30:46 +02:00
4c806b8ae9 Add section to enable serial console 2023-08-24 12:29:44 +02:00
832866cbfa Add agenix to PATH in hut 2023-08-23 17:42:50 +02:00
9fc393bb6a Store ceph secret key in age
This allows a node to mount the ceph FS without any extra ceph
configuration in /etc/ceph.
2023-08-23 17:26:44 +02:00
d81d9d58e1 Add rarias key for secrets 2023-08-23 17:15:26 +02:00
d54dcc8d8f Add ceph metrics to prometheus 2023-08-22 16:33:55 +02:00
a5fae4a289 Mount the ceph filesystem in hut 2023-08-22 16:15:46 +02:00
a355926cf0 Add ceph config in bay 2023-08-22 15:58:48 +02:00
d7a4420205 Add the bay host name 2023-08-22 15:56:09 +02:00
0b55ce3d02 Remove netboot and fixes 2023-08-22 12:12:15 +02:00
0ce574800e Add bay node 2023-08-22 12:12:15 +02:00
a7e09e55df Update flake 2023-08-22 11:28:54 +02:00
1622b3e7fc Monitor power from other nodes via LAN 2023-08-22 11:28:54 +02:00
3424cac761 Increase prometheus retention time to one year 2023-08-22 11:28:54 +02:00
f98af9aeef Don't set all_proxy 2023-08-22 11:28:54 +02:00
b4a20d7c3a Update NODES to 1.0.1 2023-07-28 18:00:45 +02:00
8c14b75e44 Update nixpkgs to fix docker problem 2023-07-28 14:24:51 +02:00
e497e1b88b Allow access to devices for node_exporter 2023-07-28 13:55:35 +02:00
07411beb49 GRUB version no longer needed 2023-07-27 17:22:20 +02:00
e8bab9928d Upgrade flake: nixpkgs, bscpkgs and agenix 2023-07-27 17:19:17 +02:00
544d5a3d69 Kill slurmd remaining processes on upgrade 2023-07-27 14:49:20 +02:00
976cdd5a4d Update ovni to 1.2.2 2023-07-26 16:00:02 +02:00
312f2cb368 koro: Add vlopez user 2023-07-21 13:00:43 +02:00
45ac6e95e9 Add koro node 2023-07-21 13:00:08 +02:00
e6bb6e735d eudy: Add fcsv3 and intermediate versions for testing 2023-07-21 11:27:51 +02:00
cfbfcdbe8c eudy: Enable memory overcommit 2023-07-21 11:27:51 +02:00
c31bfd6b4d eudy: disable all cpu mitigations 2023-07-21 11:27:51 +02:00
f015e5f71c Use builtins.fetchurl to see the progress 2023-07-17 10:52:28 +02:00
534c5dd261 Cache intel oneAPI package list 2023-07-17 10:52:03 +02:00
caf0e9545a Fix gitUrl input name 2023-07-14 16:44:24 +02:00
d20fa359d9 Enable NTP using the BSC time server 2023-06-30 14:02:15 +02:00
9be15fdad2 Add the ssfhead node as gateway 2023-06-30 14:01:35 +02:00
f2f024b82d Add zlib to the rpath
Instead of using LD_LIBRARY_PATH we provide the rpath from cmake, as
otherwise the clang compiler is also missing the dependency.
2023-06-28 11:18:35 +02:00
932d273ec7 Build OmpSs-2 llvm with zlib
The zlib is required by the lld linker to work with zlib compressed
sections in the ELF header. We also set the LD_LIBRARY_PATH during the
build, as otherwise the llvm-tblgen binary is unable to find the zlib
library, as it's missing the directory in the rpath.
2023-06-28 10:45:03 +02:00
13e365002c Use our host names first by default 2023-06-23 16:22:18 +02:00
a38072762f Add DNS tools to resolve hosts 2023-06-23 16:15:45 +02:00
adf1ff29a7 Lower perf_event_paranoid to -1 2023-06-23 16:01:27 +02:00
1ec8d7a625 Set perf paranoid to 0 by default 2023-06-21 16:24:19 +02:00
f78f4f5822 Add perf to packages 2023-06-21 15:41:06 +02:00
67a57cb3e5 Allow srun to specify the cpu binding
The task/affinity plugin needs to be selected.
2023-06-21 13:16:23 +02:00
85896f8546 Move authorized keys to users.nix 2023-06-20 14:08:34 +02:00
5e728773c3 Add rpenacob user 2023-06-20 12:54:26 +02:00
0a06cf564b Add osumb to the system packages 2023-06-16 19:22:41 +02:00
db26b2ae37 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs%2fheads%2fmaster&rev=c775ee4d6f76aded05b08ae13924c302f18f9b2c' (2023-04-26)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs%2fheads%2fmaster&rev=cbe9af5d042e9d5585fe2acef65a1347c68b2fbd' (2023-06-16)
2023-06-16 18:33:54 +02:00
f7d00dec25 Set mpi to mpich by default in bscpkgs 2023-06-16 18:26:51 +02:00
2053ec82b7 Add missing parameter to extend 2023-06-16 18:26:51 +02:00
f2434a17c2 Use explicit order in overlays 2023-06-16 18:26:51 +02:00
1f7045fcfe Replace mpi inside bsc attribute 2023-06-16 18:26:51 +02:00
0c4a1efa27 Add mpich overlay 2023-06-16 18:26:51 +02:00
530958496b Add comments in slurm config 2023-06-16 18:26:50 +02:00
df378a2933 Add eudy host key to known hosts 2023-06-16 17:29:48 +02:00
2a0fe5a137 Rename xeon08 to eudy
From Eudyptula, a little penguin.
2023-06-16 17:16:05 +02:00
cbe9af5d04 Update TAMPI to 2.0 2023-06-16 17:05:36 +02:00
b2283efd46 Install the Intel MPI libmpi.so into lib/ 2023-06-16 17:05:09 +02:00
7f18deaf69 Update nixpkgs to d6b863fd to match nixos 2023-06-16 16:36:16 +02:00
b953fd4b2f Update osu benchmarks to 7.1-1 2023-06-16 16:27:47 +02:00
080811fe9d Add missing lib in osu benchmarks 2023-06-16 15:50:39 +02:00
e7647f1d99 Add pmix 4 2023-06-16 15:33:39 +02:00
aad2c276aa Update intel mpi to 2021.9 2023-06-16 15:32:49 +02:00
ce5577f14e Export intel-mpi from oneapi 2023-06-16 15:32:17 +02:00
e23392fccd Set install dir variable for Intel MPI 2023-06-16 15:31:58 +02:00
dfbeafa2b2 Update rebuild script for all nodes 2023-06-16 12:13:07 +02:00
7d4281a5c1 Add ssh host keys 2023-06-16 12:01:12 +02:00
dfea0be2d9 Set the name of the slurm cluster to jungle 2023-06-16 12:00:54 +02:00
df91da8c34 Change owl hostnames 2023-06-16 11:42:39 +02:00
30c21155af Add owl and all partition 2023-06-16 11:34:00 +02:00
a43016ebee Simplify flake and expose host pkgs
The configuration of the machines is now moved to m/
2023-06-16 11:31:31 +02:00
801bb4ba3c Rename xeon07 to hut 2023-06-14 17:28:40 +02:00
a9d740e95a Remove profiles older than 30 days with gc 2023-06-14 17:28:39 +02:00
08eaf312f2 Add ncdu to system packages 2023-06-14 17:28:39 +02:00
0b57bbc6e3 Move arocanon user from xeon08 to common 2023-06-14 16:22:43 +02:00
6558a6ab77 xeon08: Add config for kernel non-voluntary preemption 2023-06-14 16:17:33 +02:00
0d196af473 xeon08: Add perf 2023-06-14 15:42:20 +02:00
d35becb663 xeon08: Enable lttng lockdep tracepoints 2023-06-14 15:42:20 +02:00
5421eab09a xeon08: Add lttng module and tools 2023-06-14 15:42:20 +02:00
1c7de2f7c9 Serve grafana in https://jungle.bsc.es/grafana 2023-05-31 18:12:14 +02:00
c7692995f4 Add tree command 2023-05-31 18:11:34 +02:00
0af185afd8 Add file to system packages 2023-05-31 18:11:34 +02:00
470b3d2512 Add gnumake to system packages 2023-05-31 18:11:34 +02:00
1bf6747b3a Add cmake to system packages 2023-05-31 18:11:34 +02:00
59bf51dfde Add ix to common packages 2023-05-31 18:11:34 +02:00
f5dcaf831b Update ovni to release and add useGit option 2023-05-30 16:44:39 +02:00
feb39f404a Remove git attributes from derivation in nosv 2023-05-30 16:23:39 +02:00
11e897c10a Always build nodes with ovni 2023-05-30 16:14:04 +02:00
1da216bab5 Add nodes release and git version 2023-05-30 15:31:56 +02:00
d8c19eb4b4 Add nos-v release by default 2023-05-30 15:31:12 +02:00
0e176cb2a9 Update OmpSs-2 clang to 2023.05 2023-05-30 14:52:42 +02:00
3a249c5d88 Update nanos6 and merge release with git 2023-05-30 14:51:36 +02:00
df32aa62d0 Update nanos6 to last release 2023-05-30 09:41:17 +02:00
b72d9936a2 Improve documentation 2023-05-26 11:38:27 +02:00
5ebb57deff Add gitignore 2023-05-26 11:38:27 +02:00
5b82a72647 Set intel_pstate=passive and disable frequency boost 2023-05-26 11:38:26 +02:00
a5c7205481 Add xeon08 basic config 2023-05-26 11:38:26 +02:00
fd1b467a60 Add nixos-config.nix to easily enable nix repl 2023-05-26 11:29:59 +02:00
933cd1e3c7 Show commands being executed in clang test 2023-05-22 19:30:40 +02:00
5553ee79a9 Populate OMPSS2_RUNTIME in clang 2023-05-22 19:30:15 +02:00
bb6129a77e Reduce ompss2 test verbosity 2023-05-22 13:55:23 +02:00
b8f7c16d1c Use clang from git for NODES 2023-05-22 13:54:41 +02:00
3f4f3e1105 Export proper OmpSs-2 runtime home variable 2023-05-22 13:54:29 +02:00
a34e619333 Update Intel oneAPI 2023-05-19 18:52:32 +02:00
46a3465e78 Build clang with a new LLVM
Older LLVM 11 version produces a broken compiler, see:
https://pm.bsc.es/gitlab/llvm-ompss/llvm-mono/-/issues/183
2023-05-19 18:34:11 +02:00
1d788aeff2 Update bench6 to use cmake 2023-05-19 18:33:37 +02:00
9a500dd3d6 Update Nanos6 git with ./autogen.sh 2023-05-19 18:32:35 +02:00
0605bc4ceb Add bench6 new dependencies 2023-05-18 20:40:29 +02:00
882161b21e Automatically resume restarted nodes in SLURM 2023-05-18 12:48:04 +02:00
5e8ff50c98 Allow public dashboards in grafana 2023-05-09 18:53:31 +02:00
cdb0688ec1 Add hal ssh key 2023-05-09 18:37:38 +02:00
ebb5e94416 Increase the number of CPUs to 56 for nOS-V docker 2023-05-02 17:47:57 +02:00
89049d0b1f Allow 5 concurrent builds in the gitlab-runner 2023-05-02 17:38:10 +02:00
6d16772d07 Simplify bash prompt 2023-04-28 18:15:04 +02:00
e37f9e2b0f Roll back to bash as default shell
Zsh doesn't behave properly, it needs further configuration.
2023-04-28 17:59:19 +02:00
9767238c76 Use pmix by default in slurm 2023-04-28 17:07:48 +02:00
a5a0fd9b6f Increase locked memory to 1 GiB 2023-04-28 12:34:51 +02:00
be69070f61 Use the latest kernel 2023-04-28 11:51:38 +02:00
53f6dcec8d Disable osnoise and hwlat tracer for now
Reuse nix cache to avoid rebuilding the kernel.
2023-04-28 11:19:47 +02:00
87c4521de3 Update nixpkgs to nixos-unstable 2023-04-28 11:18:37 +02:00
461d6d2f34 Update nixpkgs 2023-04-28 11:13:46 +02:00
ef2ffa61c3 Update ib interface name in xeon02
It seems to be plugged in another PCI port
2023-04-27 18:29:32 +02:00
c0b23ad450 Add steps in install documentation 2023-04-27 17:30:53 +02:00
f12ba9f8b0 Add minimal netboot module to build kexec image 2023-04-27 16:36:15 +02:00
a211e9ebee Add xeon02 configuration 2023-04-27 16:28:12 +02:00
5dbbb27c43 Refactor slurm configuration into compute/control 2023-04-27 16:27:04 +02:00
69bb2128db Lock flakes and add inputs 2023-04-27 13:52:59 +02:00
c775ee4d6f Add flakes support 2023-04-26 17:07:08 +02:00
de7cae6208 Test flakes 2023-04-26 14:27:02 +02:00
de4ac8cbd6 Enable slurm in xeon01 2023-04-26 14:10:36 +02:00
e1dcad50d0 Use xeon07 as control machine 2023-04-26 14:10:36 +02:00
0120be66fb Remove xeon07 overlay to load upstream slurm 2023-04-26 14:10:36 +02:00
6cb079a44e Add script to rebuild configuration 2023-04-26 14:09:23 +02:00
a5449067a7 Add configuration for xeon01 2023-04-26 11:44:00 +00:00
1009736d81 Load overlays from /config 2023-04-26 11:44:00 +00:00
a94765e8ae Move net.nix to common 2023-04-26 11:44:00 +00:00
9630b23ce2 Remove host specific network options from net.nix 2023-04-26 11:44:00 +00:00
ed158ee87f Move ssh.nix to common 2023-04-26 11:44:00 +00:00
480dd95d9b Move overlays.nix to common 2023-04-26 11:44:00 +00:00
f7b18098b1 Move users.nix to common 2023-04-26 11:44:00 +00:00
c580254dde Move common options from configuration.nix 2023-04-26 11:44:00 +00:00
7e6c395ff8 Move the remaining hw config to common 2023-04-26 11:44:00 +00:00
6978677cb5 Move boot config to common/boot.nix 2023-04-26 11:44:00 +00:00
f5b4580dae Move filesystems config to common/fs.nix 2023-04-26 11:44:00 +00:00
035becd018 Use partition labels for / and swap 2023-04-26 11:44:00 +00:00
a7fb69ab92 Move fs.nix to common 2023-04-26 11:44:00 +00:00
733eb93f23 Move boot.nix to common 2023-04-26 11:44:00 +00:00
b60e821eaa Move disk selection to configuration.nix 2023-04-26 11:44:00 +00:00
f43d549294 Add common directory 2023-04-26 11:44:00 +00:00
ef2631b699 Build paraver with wxGTK32 2023-04-11 20:34:11 +02:00
9d2de00b0c Update Intel OneAPI package list 2023-04-11 20:33:50 +02:00
2627552a0f Use python 3 2023-04-11 20:33:35 +02:00
03c7256767 Disable extrae as it is broken 2023-04-11 20:32:49 +02:00
a46a2ee794 Update nixpkgs commit to match xeon07 2023-04-11 20:31:54 +02:00
94fa0de4fc Add slurm 16.05.8.1 and hwloc 1.11.6 2023-04-02 21:39:30 +02:00
054d70d23b Add bench6 package 2023-03-14 14:58:42 +01:00
91a5bdb344 Add comment about __noChroot in OmpSs-2 test 2023-03-14 10:32:59 +01:00
f148a71c6c Print buildInputs in CI test derivation 2023-03-13 16:17:02 +01:00
243ed2331a Add new package NODES 2023-03-13 16:16:34 +01:00
9fae434553 Enable -fPIC in Nanos6 loader 2023-03-13 16:08:38 +01:00
898534ee52 bsc: add nosv package 2023-03-13 14:58:59 +01:00
bf28263cc5 Add OmpSs-2 test with tasks 2023-03-13 14:54:40 +01:00
84623ea9d0 Use lld linker for clangOmpss2 for LTO 2023-03-06 11:47:01 +01:00
5753f0c312 Remove old files 2023-03-03 18:28:44 +01:00
b57a17dd52 Remove alya 2023-03-03 18:25:06 +01:00
115e9beb59 Add Intel oneAPI 2023 with compilation tests 2023-03-03 18:18:51 +01:00
fd84af45f0 Add Intel OneAPI 2022 2023-03-02 19:03:58 +01:00
833d58a875 intel: add libstdc++ as dependency for patchelf 2023-03-02 18:46:52 +01:00
Raúl Peñacoba
5789b4a77a icc2021: added stdc++ includes 2023-03-02 18:46:52 +01:00
ef5e98e06d icc2021: fix wrapper names and ldflags 2023-03-02 18:46:52 +01:00
a6549c1908 intel: add intel compiler 2021.2.0 2023-03-02 18:46:52 +01:00
180fa4c992 intel: remove unused nanos6 dependency 2023-03-02 18:46:52 +01:00
4cfad119ce Update OmpSs-2 clang to 2022.11 2023-03-02 18:41:02 +01:00
a2e02bb136 Update mcxx to 2022.11 2023-03-02 18:40:57 +01:00
246aa8e7d1 Update Nanos6 to 2.8 2023-03-02 18:10:06 +01:00
f28817c3bf Build wxparaver and paraver-kernel from github 2023-03-02 17:58:53 +01:00
bff0395872 Update mpich to follow upstream 2023-03-02 13:08:01 +01:00
d18a95f8ed Update paraver to 4.10.6
Dropping wxpropgrid. Now it needs to link with openssl.
2023-03-02 12:47:25 +01:00
1a99a7eb73 Add more packages to CI 2023-03-02 12:08:02 +01:00
4b5a948918 Add PTR patch to Extrae 2023-03-02 12:03:12 +01:00
7ef24b88e4 Allow insecure packages 2023-03-02 11:46:27 +01:00
c28618b95c Update extrae to 4.0.1 2023-03-02 11:43:35 +01:00
20c5446743 CI build with verbose flag 2023-03-02 11:17:39 +01:00
38220140ec Update nixpkgs to release 22.11 2023-03-02 11:16:54 +01:00
f5987a0094 Add ci derivation 2023-03-02 11:15:25 +01:00
60b2f9f6cc Enable CI builds 2023-03-02 11:14:47 +01:00
b60698b791 pkgs: update nixpkgs to 1614b96a 2023-03-02 10:38:43 +01:00
e57107024e ovni: Update repository URL and unset commit 2022-07-25 14:50:34 +02:00
1ffca6c9e0 bsc: add ovni package 2022-07-25 14:50:18 +02:00
7d5e3f1845 nanos6: don't strip debug symbols 2022-04-05 15:40:08 +01:00
df7c79f34b clangOmpss2: merge both nix files into one
Add clangOmpss2Git to the overlay overriding the src attribute of the
release nix derivation, so we only keep one derivation for both
variants.
2022-04-05 15:40:08 +01:00
a2195aef43 clangOmpss2: use relative path for BINDIR
Absolute paths are not supported and cause a silent error when
installing the clang++ and other symlinks. See:

d93a11c138
58580e922a
2022-04-05 15:40:08 +01:00
6e7a7febd4 clangOmpss2: wrap clang++ too 2022-04-05 15:40:08 +01:00
0b319b8a63 clangOmpss2: 2021.06 -> 2021.11
libelf is now replaced by elfutils
2022-04-05 15:40:08 +01:00
315cf1d0de mcxx: update 2021.06 -> 2021.11 2022-04-05 15:40:08 +01:00
3e3ce35237 nanos6: update 2.6 -> 2.7 2022-04-05 15:40:08 +01:00
2227f08814 gpi-2 and tagaspi: enable parallel build 2022-04-05 15:34:10 +01:00
f74446b225 gpi-2: use last tagaspi tag 2022-04-05 15:33:30 +01:00
a5af7890b8 bsc: add GPI-2 and TAGASPI 2022-04-05 15:25:25 +01:00
Raúl Peñacoba
d2834624c2 paraver: enable OpenMP 2021-11-10 15:25:14 +01:00
35d19c262c garlic: add tool to query experiments 2021-11-03 16:19:49 +01:00
c0362b6639 paraver: add fast version for ordered traces 2021-11-02 16:18:52 +01:00
08aabfa657 icc: renew license until 2026-12-31 2021-07-14 15:29:55 +02:00
762fe8b82c wxpropgrid: disable parallel build
The parallel build fails with high probability due to incorrect
dependencies in the Makefile, which lead to an attempt to link too soon.
2021-07-12 18:38:10 +02:00
53d99d41cf mcxx: use git tag for the 2021.06 release 2021-06-30 19:24:56 +02:00
9eb5c74cd6 mcxx: update 2.2.98 -> 2.2.98+da8945d0 2021-06-30 17:19:39 +02:00
aa083b1b66 clangOmpss2: update 2020.11 -> 2021.06 2021-06-30 16:44:30 +02:00
58fab3b87e tampi: update 1.0.2 -> 1.1 2021-06-30 16:34:39 +02:00
f2c6a3cb15 nanos6: update 2.5.1 -> 2.6 2021-06-30 16:33:51 +02:00
b6d742380b cn6: test slow converter 2021-06-30 15:54:31 +02:00
c083d96b79 cn6: add experiment with nbody 2021-06-28 11:50:53 +02:00
1a9b8470bb cn6: add clock sync experiment 2021-06-22 15:36:49 +02:00
f9581cfb59 cn6: enable extra test utils 2021-06-22 15:36:49 +02:00
3be896d90d mcxx: add mcxx from the git repo 2021-06-02 14:08:16 +02:00
4125e39ce0 nanos6: don't strip symbols 2021-06-02 12:41:47 +02:00
11a521ff51 nanos6: allow the git url and branch to be specified 2021-06-02 11:35:30 +02:00
fb2b3cbe06 sh: add garlic-add-copyright tool 2021-05-11 12:25:15 +02:00
776a6ca1e4 sh: add commit propagator tool 2021-05-10 17:01:54 +02:00
83921d1788 creams: increase granularity with size 2021-05-09 11:37:53 +02:00
b6f563f621 creams: add granularity experiment with 16 nodes 2021-05-09 11:36:43 +02:00
5d6f691045 creams: add size experiment 2021-05-03 15:02:57 +02:00
3892167e7d creams: use python script to generate the input 2021-05-03 15:02:57 +02:00
6937ffcfe9 ds: link the resultTree in the dataset 2021-05-03 12:48:49 +02:00
760787858a bsc: disable hardening in some packages 2021-04-22 12:07:15 +02:00
a4b8f8e94b apps: disable hardening in all garlic apps 2021-04-21 18:23:27 +02:00
df62451fcd sh: add helper script to fix the figure subtitle 2021-04-21 17:58:59 +02:00
a9e1579242 osu: remove gsub hack 2021-04-21 17:54:08 +02:00
d5626851de osu: remove gsub hack and clean bw figure 2021-04-21 17:51:05 +02:00
5de45cb247 rplot: patch generated PDFs to use hyphen
The ISOLatin1 encoding uses /minus as char 45, while the (-) symbol used
in the paths is a /hyphen. This hack allows the paths in the generated
PDFs to be copied directly into a terminal.
2021-04-21 17:42:40 +02:00
92cd88e365 fig: use the $out path in the subtitle
The input dataset is not enough to determine which script produced a
given plot.
2021-04-21 13:40:25 +02:00
5a49611bf6 fwi: add gitTable to params 2021-04-20 18:51:36 +02:00
9fc2a2025c lulesh: add gitTable 2021-04-20 18:42:19 +02:00
2cc0c85635 ifsker: remove preferLocalBuild 2021-04-20 18:20:59 +02:00
c075498f71 ifsker: add gitTable 2021-04-20 18:20:27 +02:00
Kevin Sala
e0197950a6 ifsker: update app 2021-04-20 18:15:56 +02:00
cafc67d107 hpcg: add gitTable 2021-04-20 18:12:17 +02:00
c0a0eeec7f hpccg: fix indentation 2021-04-20 18:09:24 +02:00
fb1d50e9dd hpccg: update app and add gitTable 2021-04-20 18:02:53 +02:00
a359cc9d32 nbody: add gitTable 2021-04-20 17:47:49 +02:00
1402111e40 saiph: add gitTable 2021-04-20 17:36:33 +02:00
9377adf787 fwi: add gitTable 2021-04-20 17:36:33 +02:00
20e99f122f creams: use gitTable for all branches 2021-04-20 17:36:33 +02:00
53c098d921 heat: split the git table and use fetchGarlicApp 2021-04-20 17:36:33 +02:00
d2222f6868 sh: Format the git table in a single attribute set 2021-04-20 17:36:33 +02:00
dbdcfea019 tools: add fetchGarlicApp helper
Allows easy migration of the git server for all the apps and
reduces the boilerplate in the derivations.
2021-04-20 17:36:33 +02:00
375a79d27a heat: pin commits using gitTable 2021-04-20 17:36:33 +02:00
2aa099f0e2 sh: add script to build the gitTable 2021-04-20 17:36:33 +02:00
600e1b9987 tools: add helper function to find the git commit 2021-04-20 17:36:33 +02:00
a4752603e9 cn6: pin commit 2021-04-20 17:34:53 +02:00
5b4bb30e55 nbody: update and simplify figures 2021-04-20 17:16:17 +02:00
e1433fedb8 nbody: refactor experiments into common.nix 2021-04-20 17:13:29 +02:00
f729fc4006 nbody: rename granularity experiment file 2021-04-19 17:27:52 +02:00
Antoni Navarro
03298228e4 nbody: add strong scaling experiment 2021-04-19 17:27:52 +02:00
Antoni Navarro
58294d4467 nbody: add "nodes or sockets" experiment 2021-04-19 17:27:52 +02:00
Antoni Navarro
48a61dc292 nbody: update indexes 2021-04-19 17:27:52 +02:00
Antoni Navarro
5815a9af09 nbody: move "old" experiments to another folder 2021-04-19 17:27:52 +02:00
Antoni Navarro
ea66d7e4e0 nbody: update granularity tests 2021-04-19 17:27:52 +02:00
3e197da8a3 hpcg: update figures and remove old ones 2021-04-19 16:05:10 +02:00
866d4561d3 hpcg: remove old experiments 2021-04-19 16:01:11 +02:00
9a88319153 hpcg: add granularity experiment 2021-04-19 16:00:55 +02:00
a96839d11a hpcg: merge weak scaling and add size experiment
The scaling.nix file defines both the strong and weak experiments by
using the parameter "enableStrong".
2021-04-19 15:57:31 +02:00
a71ae9c2c6 hpcg: avoid mismatching names for gen units 2021-04-16 16:15:16 +02:00
d490ef2694 hpcg: remove unused extrae.xml file 2021-04-16 16:14:48 +02:00
b4e37a15a9 hpcg: refactor ss and gen using a common file
- The file gen.nix now provides an experiment for each unit, to reduce
  the evaluation time.

- The pipeline is specified in the common.nix file only.

- The input dataset path is no longer symlinked, but is specified in the
  "--load" argument.

- The size is renamed to "sizePerTask" instead of "n".
2021-04-16 11:51:34 +02:00
9bb570af7f tools: add floatTruncate function 2021-04-16 11:49:37 +02:00
Raúl Peñacoba
4d629fe8f7 hpcg: remove old comments 2021-04-16 09:32:28 +02:00
Raúl Peñacoba
f5c8d0cb88 hpcg: choose a smaller strong scaling problem size 2021-04-16 09:32:28 +02:00
Raúl Peñacoba
cb6577b439 hpcg: add strongscaling
HPCG rounds problem size axis when its value is < 16
2021-04-16 09:32:28 +02:00
Raúl Peñacoba
b60a46b683 hpcg: add weakscaling over some nblocks to check which axis is better 2021-04-16 09:32:28 +02:00
Raúl Peñacoba
1a6075a2b1 hpcg: add first granularity/scalability exps for tampi+isend+oss+task
- oss.nix runs valid hpcg layouts whereas slices.nix does not
2021-04-16 09:32:28 +02:00
12ff1fd506 garlicd: send logs to the builder 2021-04-16 09:29:33 +02:00
732b0c0e9c garlic tool: improve unit status information 2021-04-16 09:29:33 +02:00
64f077c4f6 stages: prepend the stage name to messages 2021-04-16 09:29:33 +02:00
7c94997023 control: add trap for bad exit 2021-04-16 09:29:33 +02:00
fb0dee4b61 exp: move exit1 experiment to slurm 2021-04-16 09:29:33 +02:00
bde54c69c5 sbatch: store queued status 2021-04-16 09:29:33 +02:00
2151e20bd6 exp: add exit1 experiment
Tests unit bad exits
2021-04-16 09:29:33 +02:00
886d16bcc6 garlic tool: add jq as dependency
So we can parse the experiment configuration in JSON
2021-04-16 09:29:33 +02:00
5c0f179830 stdexp: rename "name" to "clusterName" 2021-04-16 09:29:33 +02:00
422d359b48 script: stop on error by default 2021-04-16 09:29:33 +02:00
60248ab06b article: remove unused figures 2021-04-16 09:29:33 +02:00
1cb63b464d osu: adjust figures for publication 2021-04-16 09:29:33 +02:00
821b4f0d15 rplot: patch scales and fontconfig 2021-04-16 09:29:33 +02:00
0cf35decc5 osu: add mtu and eager experiments 2021-04-16 09:29:33 +02:00
26e3a86c78 garlic tool: check the presence of all the units
This check prevents a user from removing units between the
execution of the experiment and the fetch.
2021-04-16 09:29:33 +02:00
b96c39e0ba noise: add srun signal bug to the list 2021-04-16 09:29:33 +02:00
f842f1e01d slurm: add sigsegv experiment
Ensure that we can catch a sigsegv signal before and after the
MPI_Finalize call.
2021-04-16 09:29:33 +02:00
71c06d02da stages: add baywatch stage to check the exit code
This workaround stage prevents srun from returning 0 to the upper stages
when a signal happens after MPI_Finalize. It writes the return code to a
file named .srun.rc.$rank and later checks that it exists and contains a 0.

When the program is killed, it exits with a non-zero code and the error is
propagated to the baywatch stage, which aborts immediately without
creating the rc file.
2021-04-16 09:29:26 +02:00
604cfd90a3 test: add sigsegv after MPI_Finalize test
The current srun version used in MN4 returns 0 if the program crashes
after MPI_Finalize, as shown by this test.
2021-04-16 09:28:02 +02:00
07253c3fa0 fwi: update figure index 2021-04-14 17:18:46 +02:00
eab323a13a fwi: update io figure 2021-04-14 17:18:24 +02:00
8ce2a68cd7 fwi: update strong scaling figure script 2021-04-14 17:16:12 +02:00
99c6196734 fwi: update granularity figure 2021-04-14 17:05:09 +02:00
dd75a840ce fwi: use enableIO instead of ioFreq 2021-04-12 20:09:17 +02:00
e49e3b087f fwi: rename big io experiment 2021-04-12 19:49:31 +02:00
59040d9355 fwi: fix inverted resources 2021-04-12 19:31:35 +02:00
6422741cb7 fwi: merge io experiments into one file
The enableExtended parameter controls whether the experiment runs with
multiple nodes or only one.
2021-04-12 19:27:45 +02:00
99beac9b23 fwi: generate the model in every node
As we are using local storage, we need a copy of the input in every
node. The current method is to run the generator only in the rank that
has CPU 0 assigned in its mask.
2021-04-12 19:01:10 +02:00
58dc277d3d fwi: refactor ss-io with common.nix
Also, keep the names short and consistent.
2021-04-12 17:57:46 +02:00
47b326c646 fwi: generate the input at runtime 2021-04-12 17:46:07 +02:00
419e7f95cc fwi: avoid input generation
The ModelGenerator is now included in the fwi-params, so that the input
can be generated at runtime.
2021-04-12 17:43:30 +02:00
b0af9b8608 srun: add postSrun hook 2021-04-12 17:41:59 +02:00
4afda7dbfb fwi: use common.nix in sync_io experiment 2021-04-12 16:27:18 +02:00
02a103565c fwi: use common.nix in reuse experiment 2021-04-12 15:48:59 +02:00
788dd13ebd fwi: merge mpi pure experiment
The getResources function is used to assign the proper cpu binding
depending on the version. However, additional constraints are required to
ensure that we have enough points in Y.

By default the mpi+send+seq branch is disabled.
2021-04-12 15:37:39 +02:00
41665bc6fc fwi: refactor config generation into common.nix 2021-04-12 15:01:25 +02:00
9aa07993b2 fwi: refactor ss and granularity experiments
A common.nix file contains the shared stages
2021-04-12 14:41:26 +02:00
e0a68c077c fwi: merge forkjoin ss experiment into one file
Additional options are only active with enableExtended = true
2021-04-12 12:51:10 +02:00
989f6ee018 fwi: adjust input size to meet timing constraints
The previous input sizes for both the granularity and strong scaling tests
were too big to meet the timing constraints needed for garlic. This
patch sets a new, smaller, input size.

Also, a minor cleanup is applied to the rest of the fwi experiments
and figures.
2021-04-07 12:44:14 +02:00
3e5a56ebdb fwi: add tampi non-blocking variant 2021-04-07 12:44:14 +02:00
3ef4a505d3 fwi: add strong scalability tests 2021-04-07 12:44:14 +02:00
aadce016e1 fwi: add granularity and data reuse experiments
The data reuse experiment shows the effect of poor data locality versus
task granularity.
2021-04-07 12:44:14 +02:00
1d9a5c4721 fwi: fix input derivation
The fwiInput derivation used when compiling the fwi app must be the same
as the fwi-input used in the experiment.
2021-04-07 12:44:14 +02:00
11e400abb5 fwi: remove old experiment 2021-04-07 12:44:14 +02:00
a8477b1b05 fwi: add test figure with the time 2021-04-07 12:44:14 +02:00
7a6cbd3a9e fwi: update test experiment 2021-04-07 12:44:14 +02:00
3de7b5a0b6 fwi: save the params and frequencies files 2021-04-07 12:44:14 +02:00
485b9150e5 fwi: add problem size parameters 2021-04-07 12:44:14 +02:00
fa0e9f591f fwi: update repo url to PM server 2021-04-07 12:44:13 +02:00
de175b2380 fwi: fix input name 2021-04-07 12:44:13 +02:00
bfbbc294ae fwi: split into input and solver
All branches compile with several hacks.
2021-04-07 12:44:13 +02:00
9bea3cc264 fwi: add oss experiment 2021-04-07 12:44:13 +02:00
f10f8472ac fwi: add seq test experiment 2021-04-07 12:44:13 +02:00
26ad3e49f7 fwi: add gitBranch and copy params 2021-04-07 12:44:13 +02:00
312656ce54 heat: rename granul -> granularity experiment 2021-04-06 18:42:49 +02:00
63aa07dad5 heat: update granularity plot with modern ggplot 2021-04-06 18:40:19 +02:00
d1c32869c1 heat: split granularity with extended mode
The HWC version is not yet complete.
2021-04-06 18:38:15 +02:00
3566cf0152 develop: add paraver package 2021-04-06 11:14:30 +02:00
0b7e92b6f9 heat: add bar plot with time distribution 2021-04-06 11:05:56 +02:00
f8122f3c8b heat: use the hcut tool to limit the cpus 2021-04-06 11:05:56 +02:00
699404bafe bsc: add cpuid program 2021-04-06 11:05:56 +02:00
d68ce914ba heat: use cut to partition the trace
The awk script doesn't take into consideration the events close to the cut
points, which are significant with low parallelism.
2021-04-06 11:05:51 +02:00
cb482fa3ea heat: remove perf from the ctf experiment
As we would be extracting perf stats from the trace processing steps.
2021-04-06 11:05:10 +02:00
3c150d3910 doc: add contributing file 2021-04-06 10:50:39 +02:00
8a97fefafa saiph: simplify and update figure scripts 2021-04-01 19:25:38 +02:00
10b1ff8f7a saiph: simplify granularity and ss experiments 2021-04-01 19:25:38 +02:00
0e0f1b265f saiph: add extra parameters for the app 2021-04-01 19:25:38 +02:00
5ea9ff5ad8 machines: add cache sizes 2021-04-01 19:25:38 +02:00
Sandra
2b36e33b7e saiph: modify apps parameters 2021-04-01 19:25:37 +02:00
Sandra
b64b864194 saiph: clean exps and figs 2021-04-01 19:25:37 +02:00
Sandra
72e7a8dab7 shell: add clangOmpss2 and gdb 2021-04-01 19:25:24 +02:00
Sandra
46536548ca saiph: update scaling exp and figures 2021-04-01 19:24:38 +02:00
Sandra
8406c1c4e5 saiph: add total number of local blocks (#tasks) parameter 2021-04-01 19:24:38 +02:00
Sandra
bc912162a0 index: add vtk and boost 2021-04-01 19:24:38 +02:00
Sandra
4e727bf632 shell: add nix-diff 2021-04-01 19:24:38 +02:00
Sandra
5c7af00dfa saiph: add debug/asan flags parameters 2021-04-01 19:24:38 +02:00
Sandra
5caf2f79f3 saiph: change scaling R script 2021-04-01 19:24:38 +02:00
Sandra
a90c044c3e saiph: add manual global blocking
Ensure cuts in a single dimension
2021-04-01 19:24:38 +02:00
Sandra
99532c9c60 saiph: add manual distribution and nbl/nbg 2021-04-01 19:24:34 +02:00
Sandra
ddef901e2f saiph: add nsteps parameter to experiments 2021-03-30 18:54:35 +02:00
Sandra
1ae5acfe6a saiph: add nsteps in saiph app 2021-03-30 18:54:35 +02:00
d108306a29 saiph: add blocking experiments to index
Remove unused environment variables as well.
2021-03-30 18:54:35 +02:00
e0fbbe32a6 saiph: update granularity experiment and R script 2021-03-30 18:54:35 +02:00
Sandra
37e11c749f saiph: add cacheline compilation parameter 2021-03-30 18:54:35 +02:00
Sandra
02a62c18ac saiph: add strong scaling experiment 2021-03-30 18:54:35 +02:00
Sandra
0ac0205366 saiph: add figures for blocking experiment 2021-03-30 18:54:35 +02:00
Sandra
a2306eb941 saiph: add some blocking experiments 2021-03-30 18:54:35 +02:00
Sandra
38d4d0b48c saiph: delete extrae XML configuration files 2021-03-30 18:54:35 +02:00
63b08fa4e8 saiph: use nby for granularity plot 2021-03-30 18:54:35 +02:00
992af14c7f saiph: add scaling experiment 2021-03-30 18:54:35 +02:00
99f3326609 saiph: allow custom gitCommit 2021-03-30 18:54:35 +02:00
a4b2dfddb4 saiph: update granularity experiment 2021-03-30 18:54:35 +02:00
830d648925 saiph: reduce the number of loops
The current app Heat3D_vect has a long initialization time
2021-03-30 18:54:16 +02:00
e4ab177d6c saiph: remove dangerous Intel MPI envvar
It is no longer used, as we have moved to the release library version.
2021-03-30 17:56:26 +02:00
b7dcf7bc69 rplot: add support for gzipped datasets 2021-03-30 16:35:47 +02:00
5ac581b573 creams: remove pure mpi from granularity 2021-03-30 16:14:32 +02:00
b900cb95f0 creams: make configurations unique 2021-03-30 16:14:11 +02:00
389d3f6310 creams: simplify granularity figure 2021-03-30 16:07:14 +02:00
76deac0a63 creams: update figures using one single pipeline 2021-03-30 15:59:52 +02:00
87f751185c creams: merge similar experiments together
The enableExtended parameter, which enables more tests, is disabled by
default in large experiments.
2021-03-30 15:55:57 +02:00
ec056d97e5 rplot: add total job time in the plots 2021-03-30 15:49:40 +02:00
872ad1a289 stdexp: allow preSrun attribute in the srun stage
This option allows an experiment to inject commands before srun starts,
while keeping the standard srun stage options.
2021-03-29 17:46:19 +02:00
Pedro Martinez
617ef21d38 creams: redefine granularity figures 2021-03-24 13:52:26 +01:00
Pedro Martinez
5cd9894636 creams: redefine granularity experiments 2021-03-24 13:52:26 +01:00
Pedro Martinez
bfc32ef4b7 creams: readjust granularity for strong scalability 2021-03-24 13:52:26 +01:00
Pedro Martinez
cb4d27aefb creams: bugfix in granularity values 2021-03-24 13:52:26 +01:00
Pedro Martinez
d27c696259 creams: reduce granularity combinations to 8 2021-03-24 13:52:26 +01:00
Pedro Martinez
a55019c6ef creams: add more nodes for granularity experiments 2021-03-24 13:52:26 +01:00
8a81c6bfba creams: add granularity figure
Only the hybrid experiment is used for now
2021-03-24 13:52:26 +01:00
c59f298ae2 creams: reduce granularity experiment units 2021-03-24 13:52:26 +01:00
6818b29d02 creams: fix outdated nanos6.toml
This temporary fix allows the experiment to ignore the nanos6.toml in the
git repository, and only set version.dependencies variable.
2021-03-24 13:52:26 +01:00
Pedro Martinez
8445fb0928 creams: run the cp command in one process only 2021-03-24 13:52:26 +01:00
Pedro Martinez
1aa0e77157 creams: avoid race condition
Ensure only one Slurm process performs environment operations
2021-03-24 13:52:26 +01:00
Pedro Martinez
938246322f creams: add OpenMP branches 2021-03-24 13:52:26 +01:00
Pedro Martinez
6c0f4ec1b3 creams: add granularity experiments 2021-03-24 13:52:26 +01:00
46f7add84c garlicd: use head instead of the read builtin
It seems that bash is unable to propagate the SIGINT while
reading from the FIFO. This fixes the annoying ^C^C^C problems
found when running garlicd.
2021-03-22 18:43:01 +01:00
87fa3bb336 sbatch: assert types to avoid silent parse errors 2021-03-19 16:37:31 +01:00
9c8282362a cn6: use install target from the Makefile
The PREFIX must be set both at build and install time.
2021-03-19 11:39:58 +01:00
74cd3d4fbc rplot: fix fontconfig warning 2021-03-12 19:53:24 +01:00
c41456412c examples: Add granularity examples 2021-03-12 19:33:40 +01:00
f0ae0df341 Add MIT license 2021-03-12 13:57:22 +01:00
9d38a37787 doc: link to the user guide in the readme 2021-03-12 13:28:05 +01:00
7d66b34140 nbody: fix converter rename in nanos6 CTF options 2021-03-12 12:58:41 +01:00
0781e8b28e nbody: remove jemalloc experiments
Nanos6 has jemalloc enabled by default
2021-03-12 12:58:41 +01:00
88087bb4b7 nbody: add time-node plot 2021-03-12 12:58:41 +01:00
637c57b388 nbody: improve unit name 2021-03-12 12:58:41 +01:00
26ab2d9bbd nbody: fix indentation in baseline R script 2021-03-12 12:58:41 +01:00
133ef50bb4 nbody: show time points 2021-03-12 12:58:41 +01:00
3a2694ad36 nbody: add mpi branch in scaling experiment 2021-03-12 12:58:37 +01:00
5804b167db nbody: add scaling figure 2021-03-12 12:57:01 +01:00
425479c9fc nbody: add scaling experiment 2021-03-12 12:57:01 +01:00
a286488979 rplot: add egg package for ggarrange function 2021-03-12 12:56:58 +01:00
d70adae9ec heat: add figure for the mode experiment 2021-03-12 12:14:51 +01:00
854707103c heat: add ctf stage to analyze mode times 2021-03-12 12:13:10 +01:00
972be56eed heat: patch to print the start and end time
It will be used to cut the CTF traces to take only the computation part
into consideration.
2021-03-12 12:11:24 +01:00
56c625bfe4 ds: add ctf mode analysis 2021-03-12 12:10:18 +01:00
968accd552 cn6: install dur utility for extracting statistics 2021-03-12 12:09:02 +01:00
3445a72686 garlic tool: copy recursively from .garlic/
It allows an experiment to store a CTF trace in the resultTree (which is
not recommended for large traces).
2021-03-12 11:13:35 +01:00
f68564efe6 nanos6: add debug version for libstdc++ 2021-03-11 17:57:50 +01:00
4780a81d70 nanos6: add patch to use CLOCK_MONOTONIC in CTF 2021-03-11 17:57:50 +01:00
b192fc44f5 heat: refactor cache into granul experiment 2021-03-09 18:45:33 +01:00
7b4da07dbf heat: add more figures from perf counters 2021-03-09 18:21:59 +01:00
3bcbc62a98 fig: add fig.heat.cache to fig.article 2021-03-09 18:21:22 +01:00
52360c9459 rplot: add viridis R package 2021-03-09 18:20:40 +01:00
71a1396955 ds: parse time with perf generator 2021-03-09 11:07:19 +01:00
b600f64fcc heat: add cache miss experiment and figure 2021-03-05 18:31:31 +01:00
14fbb1499b ds: add perf stat parser
We can only read one output file for now, located at:
.garlic/perf.csv
2021-03-05 18:29:43 +01:00
c1efba1e65 heat: rename test -> granul experiment 2021-03-05 18:28:32 +01:00
29d7245135 heat: add figure with heatmap 2021-03-05 16:21:13 +01:00
363700eb9a heat: update test experiment 2021-03-05 16:18:51 +01:00
7e10a43b40 heat: update new app version
The blocksize is now specified at runtime
2021-03-05 16:16:06 +01:00
c4e49ea249 llvm-ompss2: update to last commit d2d451fb 2021-03-04 12:43:17 +01:00
d4ca58db2c mcxx: update to tag 2.2.98 2021-03-04 12:42:44 +01:00
d5912c3889 tampi: update to last commit 2021-03-03 19:06:25 +01:00
cb12aa2d94 nanos6: enable jemalloc by default 2021-03-03 19:05:33 +01:00
5fae560ce9 nanos6: update 2.5 -> 2.5.1 2021-03-03 19:05:03 +01:00
6b6b54f757 timetable: add total_time column 2021-03-03 19:00:36 +01:00
b79951c9fe bsc: add new package lmbench 2021-03-03 17:43:09 +01:00
c684b1870a osu: update 5.6.3 -> 5.7 2021-03-03 17:41:05 +01:00
5afe819724 osu: add impi figure 2021-03-03 12:42:19 +01:00
651d91ef79 fig: improve indentation 2021-03-03 12:42:19 +01:00
14211c9895 osu: use ggsave and reduce verbosity 2021-03-03 12:42:19 +01:00
6973f48638 osu: add an experiment for Intel MPI tuning 2021-03-03 12:42:19 +01:00
4786953eeb garlic: fix self/super with correct scope
The callPackage function was trying to find packages in bsc.self before
the self of the input parameters.
2021-03-03 12:42:19 +01:00
a6815dc7cf fig: add article figure directory 2021-03-03 12:41:31 +01:00
6f2375804d nixtools: pin commit 2021-03-03 11:53:12 +01:00
4ffb609261 osu: add figures using the fast generators 2021-03-01 12:21:10 +01:00
1d015c7e1e ds: add osu fast generators 2021-03-01 12:00:58 +01:00
ed932c9921 osu: add bw test 2021-03-01 11:58:23 +01:00
a36d912022 osu: add multithread benchmark 2021-03-01 11:55:13 +01:00
8373751f67 rplot: remove suffix from input link
We may have compressed input datasets
2021-03-01 11:41:28 +01:00
2f7032aca6 pp: remove unused derivations and helpers 2021-03-01 11:40:56 +01:00
6dd41fd96f fig: use the fast timetable generator by default 2021-03-01 11:38:28 +01:00
09a0348b0e ds: add fast timetable generator 2021-03-01 11:16:03 +01:00
051a74b85d srun: allow commands to run before srun 2021-02-26 17:00:09 +01:00
8a77900201 srun: don't expand variables on install 2021-02-26 16:59:29 +01:00
1291b90b7f user guide: correct typo 2021-02-26 12:18:50 +01:00
8e130604aa machines: set the hardware revision for MN4
This change will cause a rebuild of all experiments.
2021-02-25 20:45:20 +01:00
0015c7e4cd pp: remove launcher
It has now been integrated with resultTree in pp/store.nix
2021-02-25 12:29:12 +01:00
9612c69aec doc: add garlic configuration section
Update the garlicd usage as well.
2021-02-25 11:38:29 +01:00
6e0e2f0bf6 garlicd: drop bscpkgs argument requirement
The bscpkgs/default.nix is no longer read, as the garlic tool only
needs it for the fetching (-F), when it runs nix-build.
2021-02-25 11:38:29 +01:00
48820ee2d3 resultTree: garlic must be used from the nix shell 2021-02-25 11:38:29 +01:00
9277e60079 timetable: enable verbose processing 2021-02-25 11:38:29 +01:00
c869b6e3b4 garlic: enable verbose rsync fetch 2021-02-25 11:38:29 +01:00
0b95ea20b7 garlicd: allow manual experiment executions 2021-02-25 11:37:58 +01:00
ceb25e5d18 osu: add figure for latency tests 2021-02-23 17:52:48 +01:00
0c9e89dcc0 osu: update experiments using stdexp 2021-02-23 15:22:56 +01:00
ebcbf91fbe exec: allow manual specification of program path 2021-02-23 15:22:18 +01:00
3e2b369e3e garlicd: allow nix builders write to the pipes 2021-02-17 10:28:34 +01:00
d4947a40b9 Fix ssh missing shell 2021-02-17 10:28:11 +01:00
243d022620 cn6: update name and add to the shell 2021-02-15 17:44:20 +01:00
0ee2747215 garlicd: avoid no match fail
We check the result in the next if.
2021-02-15 16:32:06 +01:00
5fd2a62684 doc: update garlicd usage from the nix-shell 2021-02-15 16:22:45 +01:00
0e0bf9e7a7 garlic: add shell with the garlic tools 2021-02-15 16:22:06 +01:00
cb5bcd7097 garlicd: add to index and check for error
The garlicd is now available under garlic.garlicd and it requires the
extra-sandbox-path option to be properly set.
2021-02-15 16:20:06 +01:00
d51fe5db48 garlic tool: ensure the mountpoint is enabled 2021-02-15 16:18:21 +01:00
c36b724e9a Add experimental garlicd doc 2021-02-15 13:00:19 +01:00
cdf48181e5 user guide: add time measurement sections 2021-02-08 19:05:46 +01:00
a6b7b14d5e user guide: add initialization time limit 2021-02-08 19:05:46 +01:00
2ca58c46b4 user guide: Add postprocessing section 2021-02-08 19:05:46 +01:00
25208a8158 user guide: add tar.gz target for the web 2021-02-08 19:05:19 +01:00
c46feb4bf2 user guide: use ms macros
Added HTML output
2021-02-08 19:05:19 +01:00
4d626bff97 user guide: test ms macros 2021-02-08 19:05:19 +01:00
042876a287 user guide: generate html with css 2021-02-08 19:05:19 +01:00
edd71815eb pp: fix code block for html 2021-02-08 19:05:19 +01:00
39c360b413 user guide: add readme and branch name conventions 2021-02-08 19:04:51 +01:00
3ce0d3934b user guide: reorder development 2021-02-08 19:04:45 +01:00
60cab85fc4 user guide: expand the develop section 2021-02-08 19:04:36 +01:00
95809bd2bf user guide: add stub with mm macro 2021-02-08 19:04:30 +01:00
e5561b8735 control: save total execution time 2021-02-08 14:14:08 +01:00
ed1cd75d56 impi: update 2019.9.304 -> 2019.10.317 2021-02-05 10:16:57 +01:00
d4dfbb7501 Remove garlic.ds attribute
Datasets are now handled directly when creating the figures, via the
timetable attribute.
2021-02-03 15:34:25 +01:00
b65a442cb0 fig: use timetable attribute in other plots 2021-02-03 15:34:10 +01:00
9c6b7a9f87 timetable: missing quote 2021-02-03 13:51:24 +01:00
d84ccf566b launcher: fix typo 2021-02-03 13:51:04 +01:00
0faf22a43f fig: add nbody example using timetable attribute 2021-02-03 13:07:55 +01:00
e89139284a stdexp: add result and timetable targets
These targets allow one experiment to directly refer to another
experiment results, thus a dependency chain can be formed to ensure
execution order.

It also simplifies the dataset definition, as they can be automatically
fetched from the experiment directly.
2021-02-03 12:37:54 +01:00
b453c12253 pp: Add automatic launcher 2021-02-03 12:36:59 +01:00
32d8636ae1 timetable: prevent empty time lines 2021-02-03 12:27:23 +01:00
e4e427b7f6 garlicd: add daemon to launch experiments 2021-02-03 12:08:25 +01:00
fe760c0023 garlic: send trebuchet output to stderr 2021-02-03 11:50:31 +01:00
4591eca1fd garlic: Add blackbox diagram 2021-02-01 11:11:54 +01:00
9beda65778 lulesh: add experiment with all variants 2021-01-28 15:27:21 +01:00
0f62151dcf lulesh: follow garlic git branches 2021-01-28 15:27:21 +01:00
0bc81c8943 tools: add range2 function 2021-01-28 15:00:43 +01:00
a3804e31f2 develop: Simplify packages 2021-01-26 14:59:57 +01:00
ed4a9e1bc3 Add branch diagram 2021-01-25 17:54:24 +01:00
57c60821ce icc: Update url
Fixes #87
2021-01-25 15:25:42 +01:00
1e84dc196a ctfast: Add experimental ctf converter 2021-01-18 18:43:26 +01:00
3d0e93b4d3 Add slides pdf 2021-01-14 13:06:20 +01:00
8262fd3104 Add slides for meeting 3 2021-01-13 20:19:28 +01:00
2b9c3da911 Add script stage 2021-01-12 18:19:49 +01:00
aeac1a6068 exec: Force newlines
Allow single line commands like pre="true"
2021-01-11 19:15:37 +01:00
130fe39c8e exec: Abort on error
We need to exit on the first error, as otherwise we cannot track a bad
execution when no exec is done (when post is not empty).
2021-01-11 18:29:30 +01:00
5c2bd13c3d Add unstable nix for MN4
Fixes the fallocate problem
2021-01-11 16:42:30 +01:00
140598a28b Pin nixpkgs 2021-01-11 16:41:56 +01:00
892fb35d27 nbody: Fix infinite recursion
We want to override the previous layer (super), not the last one (self).
2021-01-11 14:30:12 +01:00
afd333adef creams: fix indentation 2020-12-18 12:43:06 +01:00
76f2ef4b95 creams: add figures for scalability 2020-12-18 12:26:40 +01:00
ed5f6bc22b nanos6Git: Correct typo 2020-12-17 15:27:50 +01:00
3b80c2fcb9 creams: Merge hybrid and pure datasets into ss.all 2020-12-17 15:26:25 +01:00
b5cadefca9 Allow a space before time tag
This matches the fortran format for creams
2020-12-17 15:24:23 +01:00
Pedro Martinez
203dc9f295 Configure the nanos6 environment and get the right hardware attributes 2020-12-15 19:34:49 +01:00
Pedro Martinez
2e18761b48 Use the 'hw' attributes 2020-12-15 19:11:29 +01:00
Pedro Martinez
748d335a39 Define variables 'ntasksPerNode' and 'cpusPerTask' for each experiment and other minor changes 2020-12-15 19:11:29 +01:00
9646a1298d Fix propagation of bsc.extend
Fixes #82
2020-12-11 17:15:05 +01:00
5a8cc1e514 stdexp: Run python snippets and import the result 2020-12-10 15:41:49 +01:00
7d4db6b6de control: Exit on error
This prevents srun from silently returning with an error, without
actually queueing the job of a run.
2020-12-07 16:33:40 +01:00
756c5dff92 Update PM git server 2020-12-07 13:47:17 +01:00
9a0ea08d72 Reorganization
- All garlic stuff is moved into garlic/
- Group the overlay index by sections
- Add a garlic/default.nix link to the main default.nix, so we can
  build derivations at garlic/
2020-12-07 13:33:42 +01:00
a8db596b35 Add lulesh and hpccg again 2020-12-04 11:26:18 +01:00
90d7c83261 Add a hwloc test 2020-12-04 11:18:44 +01:00
d70316a25a fwi: disable nanos6 in ModelGenerator 2020-12-04 11:17:15 +01:00
eb4adf9520 ifsker: initial version 2020-12-03 18:49:28 +01:00
266fffdb5f miniamr: initial version for OmpSs-2 2020-12-03 18:09:47 +01:00
f65e4d01c3 Simplify compiler name variables 2020-12-03 18:06:51 +01:00
53d8e535b5 clang: Use llvm 11 by default 2020-12-03 16:59:51 +01:00
1bdeca9e7d unit: Remove dangerous slash from index names 2020-12-03 16:33:48 +01:00
5e9adf3fe6 nbody: Fix x label 2020-12-03 13:22:48 +01:00
c858f521bf isolate: add $TMPDIR in the namespace 2020-12-03 13:22:10 +01:00
bdaadd4ef7 nbody: add ctf tests 2020-12-03 13:20:40 +01:00
b8a1ea3f72 develop: Fix inputrc missing key codes 2020-12-03 13:09:42 +01:00
eea9539258 develop: Set shell and histfile 2020-12-03 12:14:04 +01:00
3dbb24dd9e develop: add more tools 2020-12-03 12:05:24 +01:00
da4bbf8533 isolate: only load some files from /etc 2020-12-03 12:04:51 +01:00
df1f22c122 develop: support for srun 2020-12-02 13:38:43 +01:00
f87d830218 isolate: preserve TERM 2020-12-02 13:06:55 +01:00
3d352fee19 isolate: allow argument passing 2020-12-02 13:06:35 +01:00
284662d6cd develop: fix bash PS1 2020-12-02 12:22:20 +01:00
84a8060bc5 intel: Upgrade expired license 2020-12-02 12:13:34 +01:00
1340d1d2e8 develop: Experimental interactive support 2020-12-02 11:58:00 +01:00
1f841649f8 exec: add support for nixPrefix 2020-12-02 11:57:40 +01:00
8d5853bba9 Add vite and otf packages 2020-11-30 20:23:44 +01:00
dd5832b39d Fix nanos6 jemalloc typo 2020-11-30 20:08:59 +01:00
ad7c04845b Add paraverExtra with some patches 2020-11-30 20:07:59 +01:00
6483d645d1 babeltrace2: enable parallel build 2020-11-27 20:13:11 +01:00
4000dbd0b8 Rename slides and generalize makefile 2020-11-24 18:05:16 +01:00
6fa3facfb1 Preliminary version for the slides 2020-11-24 17:58:14 +01:00
ed95cb0a04 Add wip presentation 2020-11-23 19:06:15 +01:00
aca7e36fc7 bigsort: add experiment with input generation 2020-11-20 15:41:27 +01:00
0bb5c76aad bigsort: add extra programs 2020-11-20 15:40:17 +01:00
2153e58baf bigsort: add the shuffle program 2020-11-20 15:39:34 +01:00
ceeb0f7f41 bigsort: add genseq program 2020-11-20 15:38:26 +01:00
a147a396d9 trebuchet: add the experiment as attribute 2020-11-20 15:35:36 +01:00
8bc5656461 tools: recursive getExperiment
It allows getExperimentStage to be called from any stage above the
experiment.
2020-11-20 15:34:14 +01:00
d192a59fdc control: Export the run iteration 2020-11-20 15:32:41 +01:00
734d494d96 stdexp: Allow extra mounts 2020-11-20 15:30:47 +01:00
2863ab6ae1 machines: Use fs topology 2020-11-20 15:29:03 +01:00
4f0da10321 bigsort: Use cpusPerTask instead of cpuBind 2020-11-20 13:57:12 +01:00
David Alvarez
0c438d4dac Setup for test experiment 2020-11-20 13:57:12 +01:00
David Alvarez
a0dac209e3 First test experiment 2020-11-20 13:57:12 +01:00
David Alvarez
37bd4c33f2 Add BigSort MPI+OpenMP 2020-11-20 13:57:12 +01:00
e8f649327a exec: Avoid variable expansion at build
All bash variables passed in env, pre or post are now expanded at
execution time.
2020-11-20 13:54:45 +01:00
daadcc93d0 ompss2: fix to the last release 2020-11-19 18:50:30 +01:00
e65c801a20 paraver: Downgrade wx to 2.8 and add wxpropgrid
Fixes a problem with i3 when opening a new timeline view, which caused a
rapid switch between paraver main window and the timeline.
2020-11-19 16:36:47 +01:00
a076d7d3d0 Add paraver with some patches for tiling WM 2020-11-18 14:00:19 +01:00
d2d3ccf332 Idea for FS naming convention 2020-11-17 18:33:57 +01:00
e1e34ddf75 exec: add pre and post code to allow cleanup tasks 2020-11-17 16:09:38 +01:00
33f6ae7e55 Add bundled report example 2020-11-17 15:51:09 +01:00
fe0bd8b200 creams: fix pure experiment
Use machine agnostic specification for resources
2020-11-17 12:31:03 +01:00
bcb9cf31a3 Add datasets for creams experiments 2020-11-17 11:42:34 +01:00
dcb56643d5 nbody: add a small experiment 2020-11-17 11:36:42 +01:00
ef4bb13a7d Add all experiments in one dummy target 2020-11-17 11:32:06 +01:00
69af473241 Disable old hpcg experiments 2020-11-17 11:31:34 +01:00
016422cede Update nbody experiment
Generate the input based on the target machine description.
2020-11-17 11:26:35 +01:00
5e50ef19fe Update experiments with cpusPerTask
Try to avoid manually setting the hardware specs and rather use
the hw attrset.
2020-11-17 11:17:57 +01:00
641e752bd5 Add a trace message at unit evaluation 2020-11-17 11:12:12 +01:00
74537e682c Use divisors in the slurm cpu experiment 2020-11-17 11:01:34 +01:00
433c8864ea Add divisor generator 2020-11-17 11:01:34 +01:00
e0ca33569b garlic tools: rename divList -> halfList 2020-11-17 11:01:34 +01:00
65918bca21 dummy: Set the programPath for experiments 2020-11-17 11:01:34 +01:00
dea523460a Add slurm affinity experiment 2020-11-17 11:01:34 +01:00
b4a3bb0ede New stdexp resource specification
Now the options cpusPerTask, ntasksPerNode, nodes and jobName are required
for the sbatch stage. Also cpuBind has been removed and is always set to
"cores,verbose" in the srun stage.
2020-11-17 11:01:34 +01:00
dabc6be640 Add more helper functions 2020-11-17 11:01:34 +01:00
2a42c1e53e Fix aliases 2020-11-17 10:57:17 +01:00
18afcb1f44 Avoid nixpkgs reevaluation
The bsc attrset is now extensible: replacing a few bsc packages is very
fast. Also we allow the complete bscpkgs to be within other custom
overlays (not tested yet).
2020-11-17 10:49:45 +01:00
3372f94855 noise: cut long lines and move vim line to bottom 2020-11-13 10:38:33 +01:00
288318b556 Merge branch 'saiph' into 'master'
Saiph

See merge request rarias/bscpkgs!6
2020-11-13 10:25:47 +01:00
42f2227a9f sbatch: Use experiment reservation if given 2020-11-13 10:17:54 +01:00
Sandra
4ae66adb9a Saiph: adding granularity experiment and figures 2020-11-13 09:56:40 +01:00
Sandra
86d1d426ec Saiph: Removing devMode parameter 2020-11-12 19:10:43 +01:00
5333058741 report: build only required figures
Introduces an intermediate derivation that can be imported into the
report derivation, which contains a string cmd that expands the fig
variable as needed.
2020-11-11 19:03:02 +01:00
9a7e59a076 nanos6: fix the git commit
Until nanos6 release is complete, we don't want frequent large rebuilds
2020-11-11 19:00:33 +01:00
0b0f6ac9f0 rplot: Add a reference to the dataset 2020-11-11 18:59:57 +01:00
74ce07b193 rdma-core: use upstream by default
The systemd dependency is pulled anyway
2020-11-11 17:31:41 +01:00
f2610361a7 icc: fix hash and internal version number 2020-11-11 17:06:36 +01:00
acc3390b6b WIP: icc update 2020-11-11 17:06:36 +01:00
9faa4ef101 rdma-core: only remove binaries 2020-11-11 17:06:03 +01:00
cd3afe4ad6 rdma-core: drop systemd dependency 2020-11-11 14:45:02 +01:00
4111535a9d clangOmpss2: Remove clang from the inputs
It is already provided in stdenv as we use llvm10, and otherwise it
will pull clang 7 as dependency.
2020-11-11 13:17:31 +01:00
09361fae77 extrae: use pname to get the version 2020-11-11 13:17:03 +01:00
dc3e84a148 tampi: use the last release by default 2020-11-11 12:25:21 +01:00
1838178761 tampi: Update to 1.0.2 and use fetchFromGitHub
The configure flags are no longer required.
2020-11-11 12:24:41 +01:00
63f966e3c1 extrae: Add patch to follow upstream 2020-11-11 12:19:07 +01:00
966606b62d hpcg: precompute the input 2020-11-09 17:48:46 +01:00
5763b91d39 Use the trebuchet only to specify an experiment 2020-11-09 17:46:11 +01:00
47f67dcd85 Extend abstract 2020-11-09 12:16:56 +01:00
48869d6e4a Clarify some sections 2020-11-09 12:09:54 +01:00
92f58651b8 Increase margins and enable utf8 targets 2020-11-09 12:09:22 +01:00
c0669d7dc8 Update Intel MPI: 2019.8.254 -> 2019.9.304 2020-11-06 15:45:00 +01:00
31f7d17a41 Add manual diagram for nroff 2020-11-06 14:26:11 +01:00
538d595d30 Add MN4 ssh key section 2020-11-06 14:23:55 +01:00
dec183b221 Fix execution out path 2020-11-06 12:31:39 +01:00
92eee2ede8 Use target machine notation 2020-11-06 12:31:31 +01:00
a8208480c1 Add a nix shell for playing with plots 2020-11-05 20:01:26 +01:00
dd0823876a hpcg: add plot for oss experiment 2020-11-05 19:59:47 +01:00
9d878eeb4a saiph: add dataset for numcomm 2020-11-05 19:58:01 +01:00
11ac02da08 heat: Add test experiment and plot 2020-11-05 19:56:26 +01:00
074a75facb saiph: name the experiment and units in numcomm 2020-11-05 19:53:38 +01:00
7a80d1ca98 heat: Use clang by default 2020-11-05 19:52:37 +01:00
9e477a2313 hpcg: smaller input size 2020-11-05 19:46:34 +01:00
5bd042ef67 nbody: mark the points with bad std 2020-11-05 19:43:39 +01:00
d7be13f88d Update garlic options in store stage 2020-11-05 19:38:45 +01:00
476c2f20f0 Add manual and update the garlic tool 2020-11-05 19:31:21 +01:00
de6b4864ee Add garlic tool manual 2020-11-05 19:29:40 +01:00
33682ef48d Document the results and pp stages 2020-11-05 14:52:57 +01:00
634d2040b5 Add reference index 2020-11-04 13:00:42 +01:00
df4d908f1c Add more rendered files to ignore 2020-11-04 12:57:22 +01:00
f0122d557f WIP: postprocessing doc 2020-11-04 12:56:35 +01:00
62c9da2474 Add hpcg oss experiment dataset 2020-11-03 19:10:00 +01:00
0c58bb63b5 hpcg: add exp and unit name 2020-11-03 19:10:00 +01:00
de46366985 nbody: plot nb/cpu rather than nb 2020-11-03 19:10:00 +01:00
376ab9b32a nbody: Remove test and use baseline 2020-11-03 19:10:00 +01:00
5eea48c5b0 Add exp and unit name to nbody tampi experiment 2020-11-03 19:10:00 +01:00
f1f75c1c11 Rearrange experiment datasets 2020-11-03 19:10:00 +01:00
c3988dacd2 WIP: documentation for the pp pipeline 2020-11-03 19:10:00 +01:00
e778ad75b3 Reorder garlic sets 2020-11-03 19:10:00 +01:00
317409f6ac Move index and out inside the user directory 2020-11-03 19:10:00 +01:00
3eae92bdc4 Remove old pp stages 2020-11-03 19:10:00 +01:00
8bc0dc202d New fetching mechanism with garlic tool 2020-11-03 19:10:00 +01:00
6b40e6f9e9 Experimental garlic tool 2020-11-03 19:10:00 +01:00
d5d42b3c09 Add unit and exp name to nbody test 2020-11-03 19:10:00 +01:00
0bcfe5d25b Add new store pp stage 2020-11-03 19:10:00 +01:00
5e2797bcde Create index files for the experiments 2020-11-03 19:10:00 +01:00
efd7df068e Print full experiment path 2020-11-03 19:10:00 +01:00
7c5345f4bc report: Idea to reduce build time 2020-11-03 19:10:00 +01:00
43991e9173 nbody: plot nblocks in test 2020-11-03 19:10:00 +01:00
7b26b59988 Use rsync to fetch only the logs 2020-11-03 19:10:00 +01:00
a66cdb52fb nbody: Fix test experiment 2020-11-03 19:10:00 +01:00
3bd4e61f3f WIP: Testing with automatic fetching 2020-11-03 19:09:59 +01:00
59346fa97e control: Add status file 2020-11-03 19:09:59 +01:00
fd1229ddc0 nbody: add simple test figure 2020-11-03 19:09:59 +01:00
8ce88ef046 Add dataset attrset in garlic
Modify nbody to evenly distribute blocks per cpu
2020-11-03 19:09:59 +01:00
06c29b573f Add exp.nbody.tampi variants 2020-11-03 19:09:59 +01:00
7852d86a3f Fix plot details 2020-11-03 19:09:59 +01:00
4beb069627 WIP: postprocessing pipeline
Now each run is executed in an independent folder
2020-11-03 19:09:59 +01:00
1321b6a888 Add experiments with jemalloc and CPU affinity 2020-11-03 19:09:59 +01:00
ed8a6416a0 Add support for nanos6 with jemalloc 2020-11-03 19:09:59 +01:00
81d144d716 Remove exp attrset from report
Fixes #43
2020-11-03 19:09:59 +01:00
30ad4219d9 Add example report 2020-11-03 19:09:59 +01:00
067fb0c0a2 Add R shell for quick plots 2020-11-03 19:09:59 +01:00
308673f7f6 Increase nbody test cases 2020-11-03 19:09:59 +01:00
1bd9cb6c0f Move the plot script to R 2020-11-03 19:09:59 +01:00
ede25b6736 Use the stage names as inputs 2020-11-03 19:09:59 +01:00
2680dcb66f Don't nest the unit results
The experiment directory now contains symlinks to the units, keeping the
old structure. The unit results are directly placed in the garlic out
directory.
2020-11-03 19:09:58 +01:00
be0506bc21 Remove unused test for exec stage 2020-11-03 19:09:58 +01:00
f33137a55e WIP: Add experimental figure pipeline 2020-11-03 19:09:58 +01:00
65745e0aaf WIP: Add another nbody experiment 2020-11-03 19:09:58 +01:00
c3659d316d Add perf stage 2020-11-03 19:09:58 +01:00
4f901c1b9c WIP: add postprocessing stages 2020-11-03 19:09:58 +01:00
74f83b5c11 WIP: manual plot 2020-11-03 19:09:58 +01:00
11601703ce Fix shebang in nanos6 master
Also perl is a new dependency now
2020-11-03 19:08:19 +01:00
6f60e3cab2 Fix groff PDF engine
Fixes issue #51
2020-11-03 11:16:58 +01:00
72ba080db1 Merge branch 'nbody' into 'master'
NBody Experiments

See merge request rarias/bscpkgs!5
2020-10-30 16:25:27 +01:00
dad70761ad Merge branch 'hpcg' into 'master'
Hpcg

See merge request rarias/bscpkgs!4
2020-10-30 16:24:24 +01:00
Raúl Peñacoba
9c20537f91 Since mpi+omp version uses 6 threads, change nblocks values 2020-10-30 15:11:23 +01:00
Raúl Peñacoba
56584c9e97 Remove the osu set of tests 2020-10-30 15:01:50 +01:00
Raúl Peñacoba
6a1375726f Fix problem sizes to be equivalent between versions 2020-10-30 14:44:33 +01:00
Raúl Peñacoba
d757332448 Remove extrae home 2020-10-30 14:35:09 +01:00
Raúl Peñacoba
58e3d48a16 Use mask_cpu and n.x n.y n.z instead of n 2020-10-30 14:08:56 +01:00
Raúl Peñacoba
b856e2147a Use discrete deps in nanos6. Pass nblocks to omp version and use the same experiments as oss 2020-10-30 14:08:56 +01:00
Raúl Peñacoba
22a294f9cc Forgot to set one task per node 2020-10-30 14:08:55 +01:00
Raúl Peñacoba
ea0272c212 Add OmpSs-2 (no mpi) version 2020-10-30 14:08:55 +01:00
Raúl Peñacoba
e20061254b WIP: Add mpi, omp and mpi+omp experiments. See more.
Seems that gcc compilation with OpenMP throws an error. Investigate.
I think I've forgotten to add an override of the mpicxx compiler backend
2020-10-30 14:08:55 +01:00
01b2584688 Update hpcg experiments 2020-10-30 14:08:55 +01:00
Raúl Peñacoba
7bf3e81233 WIP: trying to make mpi branch working 2020-10-30 14:08:55 +01:00
Raúl Peñacoba
6bd7e12cff WIP: forgot to add the folder 2020-10-30 14:08:55 +01:00
Raúl Peñacoba
b5fb3730ac WIP: first serial experiment. Don't know how to add gcc to compile 2020-10-30 14:08:55 +01:00
Raúl Peñacoba
a44042615a WIP 2020-10-30 14:08:55 +01:00
Antoni Navarro
05ce36e158 Add the MPI-weak scaling experiment and strong scaling experiments 2020-10-29 16:31:21 +01:00
Antoni Navarro
6ccc159487 Fix one of the CPU Masks in the weak scaling experiment 2020-10-29 16:30:55 +01:00
Antoni Navarro
8b985de65d Add a few scalability experiments for some variants 2020-10-28 15:35:09 +01:00
ae6a3f9206 Enable python bindings in babeltrace 1 2020-10-16 19:31:43 +02:00
327a155907 Add babeltrace2 for nanos6 2020-10-16 18:18:31 +02:00
80ccd1240a Less verbose execution 2020-10-14 16:29:22 +02:00
9d8f7d9074 Print the experiment being run 2020-10-14 16:28:27 +02:00
c7d2e2d866 Write the unit config in a file 2020-10-14 16:27:47 +02:00
148c614540 Add MN4 hw description 2020-10-14 16:24:56 +02:00
478535b4d1 Define CC and CXX for gcc 2020-10-13 17:43:23 +02:00
7a37913b4e Set the ssh host from the machine config 2020-10-13 14:30:03 +02:00
05b37aa11d Remove cluster scripts from nixtools 2020-10-13 14:17:23 +02:00
04328d81ff Add runexp stage documentation 2020-10-13 14:07:34 +02:00
a38ff31cca Introduce the runexp stage 2020-10-13 13:00:59 +02:00
d0a259f15d Ignore generated doc 2020-10-13 12:17:14 +02:00
f2b39decba Update execution doc with isolation 2020-10-13 12:16:48 +02:00
251103ffd3 Fix tbl preprocessor option 2020-10-13 12:16:46 +02:00
6ab448b10a Fix trebuchet description 2020-10-09 20:28:00 +02:00
aa1ffa5208 Remove unused experiments 2020-10-09 20:17:35 +02:00
4de20d3aa5 Remove old stages and update some 2020-10-09 20:12:52 +02:00
27bc977590 Remove strace from isolate stage 2020-10-09 19:50:28 +02:00
1b703bd431 Fix saiph numcomm experiment 2020-10-09 19:40:49 +02:00
298c7362b3 New config design 2020-10-09 19:33:06 +02:00
9020f87765 Simplify saiph numcomm experiment 2020-10-09 17:20:50 +02:00
53dca32469 Simplify experiment 2020-10-09 17:19:00 +02:00
9d2ce2a1c2 Remove old experiments 2020-10-09 16:43:00 +02:00
e6e42dcec9 Remove old apps 2020-10-09 16:42:06 +02:00
332b738889 Move apps into garlic/apps 2020-10-09 16:42:06 +02:00
a576be8031 WIP stage redesign 2020-10-09 16:42:06 +02:00
654e243735 Include an index in the trebuchet 2020-10-09 16:42:06 +02:00
45afe7d391 Simplify experiment stage 2020-10-09 16:42:06 +02:00
d599b8c52f New naming convention 2020-10-09 16:42:06 +02:00
697d4e652e Ignore pdf and generated txt 2020-10-09 16:42:06 +02:00
26ea326ded Group stages 2020-10-09 16:42:06 +02:00
66a5e06ada Generate trebuchet from nix 2020-10-09 16:42:06 +02:00
e8d884a627 Document the execution pipeline 2020-10-09 16:42:06 +02:00
81004b5ee6 control: Fix bashism 2020-10-09 16:42:06 +02:00
4ea0d16926 WIP isolation 2020-10-09 16:42:06 +02:00
ba221c5200 Add rw test 2020-10-09 16:42:06 +02:00
effcc2d20b Working isolated environment 2020-10-09 16:42:06 +02:00
2a01ee7f24 WIP isolate execution 2020-10-09 16:42:06 +02:00
896ebd4ace WIP nix-isolate 2020-10-09 16:42:06 +02:00
0a26c72440 extrae: Remove dangerous home 2020-10-08 18:34:20 +02:00
4ce514de9b Merge branch 'saiph' into 'master'
Saiph

See merge request rarias/bscpkgs!3
2020-10-07 14:56:33 +02:00
Sandra
c36fc8a08b NOISE verbose fixes 2020-10-07 14:53:06 +02:00
Sandra
ec555e59e7 Setting a developer mode and its implication 2020-10-07 11:53:47 +02:00
Sandra
8f65030161 Saiph: numcomm experiment changes 2020-10-07 11:38:57 +02:00
Sandra
30630a74be Saiph: Vectorisation compiler info flags 2020-10-07 11:38:02 +02:00
6d413c946c nbody: Remove libtampi-c patch
See #37
2020-10-05 12:39:34 +02:00
533d8e9768 Fix tampi experiment to use multiple CPUs per task 2020-10-05 10:47:16 +02:00
d4ea0fe607 tampi: remove hacks from configure flags
The verbose make flag is added to ensure the log contains the complete
compilation line.
2020-10-05 10:45:11 +02:00
2f56488197 Merge branch 'master' of bscpm02.bsc.es:rarias/bscpkgs 2020-10-05 10:31:15 +02:00
3dd609f7db Switch to TAMPI from gitlab as default 2020-10-05 10:31:09 +02:00
368aa57cb7 nbody: Remove OpenMPI dirty hack
Was fixed in 7e1a5128b6
2020-10-05 10:28:38 +02:00
18081b3485 Merge branch 'creams' into 'master'
Creams

See merge request rarias/bscpkgs!2
2020-10-02 18:40:45 +02:00
Pedro Martinez
231672a222 Rename files to improve consistency 2020-10-02 18:28:13 +02:00
Pedro Martinez
b403fbefe1 Add hybrid strong scalability experiments 2020-10-02 17:48:00 +02:00
Pedro Martinez
c85b2976ef Fix non-hybrid strong scalability experiments 2020-10-02 16:47:45 +02:00
Pedro Martinez
6ae71cc5e9 Improvement the experiment based on CREAMS 2020-10-02 16:40:43 +02:00
Pedro Martinez
5cbc8e4fbb First attempt to create an experiment with CREAMS: strong scaling from 1 to 16 nodes using the pure MPI version 2020-10-02 16:40:43 +02:00
50eeca2257 hist: Add -S option and allow joined plots 2020-10-02 15:30:55 +02:00
61a2db03dc Add postprocessing hist tool 2020-10-02 11:58:04 +02:00
fd47044bfb Merge branch 'saiph' into 'master'
Saiph changes

- nix-shell changes
- useless exports avoided

See merge request rarias/bscpkgs!1
2020-10-02 10:52:12 +02:00
Sandra
79a4a4d16b saiph: removing home paths 2020-10-02 10:46:56 +02:00
Sandra
cec7a280c0 saiph: removing nix-shell to avoid its use!! 2020-10-01 17:52:50 +02:00
Sandra
dcf64bd1f6 adding NOISE point 2020-10-01 17:52:50 +02:00
Sandra
ce7566cf7a saiph: removing useless exports 2020-10-01 17:52:50 +02:00
Sandra
78b96c1bc6 saiph: including reservation option 2020-10-01 17:52:50 +02:00
Sandra
6a2d865225 saiph: adding ministat app to saiph shell 2020-10-01 17:52:50 +02:00
Sandra
8f5c5146b3 Saiph: update branchname according to garlic nomenclature 2020-10-01 17:52:50 +02:00
Sandra
e3349bb864 saiph: exp: adding extrae config files 2020-10-01 17:52:50 +02:00
Sandra
ef592c060f Saiph: saiph shell 2020-10-01 17:52:50 +02:00
d210e96d18 Mark the launcher for upload 2020-10-01 10:48:54 +02:00
35f4ba545a Experimental GDB stage 2020-09-30 16:00:34 +02:00
a227084e39 tampi: add gitlab repo in tampiGit 2020-09-30 09:35:23 +02:00
ec21ba98b5 nbody: Allow custom reservation 2020-09-30 09:32:25 +02:00
69b1dcf08a nbody: forgot nixsetup attr 2020-09-30 09:24:14 +02:00
eb46e8f41b tampi: Disable the C++ MPI interface for OpenMPI
Fixes #30
2020-09-29 16:55:14 +02:00
fa734deaca extrae: remove home path from the xml 2020-09-28 14:11:14 +02:00
f72a4e9bc8 Enable symbolizer for asan 2020-09-28 13:07:07 +02:00
ae2cdf8790 numcomm: disable extrae 2020-09-28 13:06:35 +02:00
dadc02ca99 Update libpsm2: disabled by now 2020-09-28 13:01:31 +02:00
ff4d39233a Add valgrind stage 2020-09-28 13:00:59 +02:00
985091130d clang: use the commit hash as version 2020-09-28 12:56:13 +02:00
724b8f232a Intel MPI: release_mt -> release
Fixes issue #28
2020-09-28 12:21:41 +02:00
sandra
c1b64e8897 saiph: if extrae add some env var 2020-09-23 13:13:51 +02:00
sandra
c8915dfc89 saiph: cc is a experiment parameter 2020-09-23 13:10:12 +02:00
sandra
3419db1fc6 saiph: nixsetup to re-enter nix after the new added stage 2020-09-23 13:06:16 +02:00
sandra
32ac89b97f Saiph: cc is a parameter of the app, not defined at stdenv anymore
[its default value is clangOmpss2]
2020-09-23 12:58:51 +02:00
79fae204c2 Typo 2020-09-22 18:39:29 +02:00
ed7f6e3e97 nbody: Clean environment 2020-09-22 18:39:11 +02:00
1d5b528cd0 Change output log files 2020-09-22 18:38:37 +02:00
e3623b05fd Print the env via stderr 2020-09-22 18:38:25 +02:00
ebd947c544 Set default mpi implementation to Intel MPI 2020-09-22 18:02:32 +02:00
e044ce918e Add OpenMP noise section 2020-09-22 18:01:42 +02:00
7de0593e4b nanos6: Use git commit hash as version only 2020-09-22 17:42:36 +02:00
58e6c76349 Move apps to garlic 2020-09-22 17:41:40 +02:00
c5e225c778 saiph:Remove old experiments 2020-09-22 17:40:26 +02:00
edf429c932 Avoid loading .bashrc 2020-09-22 17:39:26 +02:00
cd37d513e8 saiph: Extrae with the correct MPI 2020-09-22 14:26:01 +02:00
ad4df5e05d saiph: Up to 4 numcomm experiment 2020-09-21 19:54:12 +02:00
5920c964d2 saiph: fix hardening and affinity 2020-09-21 19:23:17 +02:00
cc101ad1d3 Add saiph experiments 2020-09-21 17:30:24 +02:00
126f05e92c Simplify paths 2020-09-21 14:34:08 +02:00
dba1cc22bc New design with overlays 2020-09-16 12:22:55 +02:00
847b5b3e0a Add noise experiment with nbody 2020-09-03 16:19:52 +02:00
c4dc42c2a4 Add acctg-freq to sbatch stage 2020-09-03 16:19:19 +02:00
be95827927 Add loops param to control stage 2020-09-03 16:18:50 +02:00
bdc221ba81 Add perf for linux 4.9 2020-09-02 17:07:34 +02:00
8110bc2976 New stage design 2020-09-02 17:07:09 +02:00
d469ccd59d Add extrae and perf stages 2020-09-02 10:44:13 +02:00
d05d32edbf Fix repo path and bashrc 2020-08-31 17:56:58 +02:00
68c8691916 Update title 2020-08-31 17:34:37 +02:00
4fa8d8f683 Remove duplicated section 2020-08-31 17:33:37 +02:00
8613253395 Add MN4 section and rename 2020-08-31 17:29:32 +02:00
0cc5fe92e5 Add documentation on sources of variability 2020-08-28 20:01:58 +02:00
196b681586 mpich: add enableDebug option 2020-08-26 19:21:14 +02:00
87809ef903 Update extrae and enable man pages 2020-08-26 19:20:17 +02:00
09c2b9005a Testing nbody blocksize with impi
Weird run times with srun: Two exceed 20%. Relative times:

0.998649        0.998936        0.999409        1.00018         1.00191
0.998684        0.998936        0.999432        1.00041         1.00222
0.998776        0.999065        0.999527        1.00126         1.0024
0.998786        0.999084        0.999558        1.00138         1.00242
0.998856        0.999102        0.999727        1.00155         1.25585
0.998895        0.9992          0.999849        1.0018          1.27138
2020-08-25 18:39:31 +02:00
cfa5187988 nbody: use intel cc and mpi by default 2020-08-25 18:38:31 +02:00
27fbecf970 nbody: Use garlic git URL 2020-08-25 18:37:50 +02:00
839489d20f Remove nix-setup verbose info 2020-08-25 18:36:33 +02:00
d1e152a917 Exit on error in control script 2020-08-25 18:35:58 +02:00
f44f5b4338 Merge branch 'master' of bscpm02.bsc.es:rarias/bsc-nixpkgs 2020-08-25 15:16:51 +02:00
fa1f06ce31 Use nix copy to upload to mn4, fixes #15 2020-08-25 15:16:19 +02:00
cff653d164 Simplify dummy 2020-08-25 14:52:18 +02:00
67ac951289 Merge branch 'master' of bscpm02.bsc.es:rarias/bsc-nixpkgs into master 2020-08-25 13:00:10 +02:00
5b1a296640 Add build debug section 2020-08-25 12:59:44 +02:00
76b0a239e3 sbatch: Add reservation flag 2020-08-24 18:07:09 +02:00
1473874563 Use relative path in sbatch 2020-08-24 18:06:47 +02:00
4b27ceec6d Add clsync tool 2020-08-21 19:49:23 +02:00
5314f343b6 Add static nix with shell set to /bin/sh 2020-08-19 18:16:00 +02:00
14684040a5 Intra/inter node latency tests 2020-08-19 11:07:21 +02:00
c70d35cd50 Add MPICH with libfabric enabled 2020-08-19 11:06:23 +02:00
1e07be863a Add OSU test benchmarks 2020-08-18 18:28:30 +02:00
ecc01e4314 Add old SLURM, pmix and pmi2 versions 2020-08-17 18:55:01 +02:00
23fa7d8654 Update and fix Intel MPI, fixes #9 2020-08-17 18:51:51 +02:00
01295487d8 Add srun wrapper and use pmi2 2020-08-17 18:50:18 +02:00
df18435dfc Provide argvWrapper 2020-08-12 14:00:04 +02:00
338736d257 Add control and nix-setup layers 2020-08-11 12:05:43 +02:00
ef1aeb2cfa Run each experiment in a unique directory 2020-08-10 18:25:53 +02:00
8db4ef2594 Tidy nbody experiment 2020-08-10 16:06:42 +02:00
b777fbc6d5 Merge branch 'master' of bscpm02.bsc.es:rarias/bsc-nixpkgs 2020-08-10 14:15:07 +02:00
b9e9409a59 Success sbatch launch in MN4 with nbody seq 2020-08-10 14:13:28 +02:00
f4cbd654e2 slurm17: Add pmix library 2020-08-05 17:44:03 +02:00
9631f4c223 Add slurm 17.11.9-2, builds ok. 2020-08-05 10:57:05 +02:00
bab4c696d8 First successful execution with SLURM 2020-08-04 18:38:33 +02:00
39a639ac10 Testing SLURM jobs with ppong 2020-08-04 11:51:09 +02:00
85c15e9f3f Testing sbatch job 2020-07-31 18:47:33 +02:00
c7c8d858f4 Test runner script WIP 2020-07-29 18:38:39 +02:00
7c92f713cd Add ParaStation MPI implementation 2020-07-29 18:38:27 +02:00
5df174f24e Print the app being run 2020-07-29 18:36:35 +02:00
272511f058 Use local build for experiments 2020-07-27 19:14:29 +02:00
f1e891b6bf Show loop optimization problems 2020-07-27 19:13:21 +02:00
f6137a7bc0 Allow multiple space-separated flags 2020-07-27 19:13:11 +02:00
b93851ba93 Testing experiments with nbody 2020-07-27 17:55:56 +02:00
b042e783e5 Add CC and CXX names to compilers passthru 2020-07-27 17:55:35 +02:00
ea81c34f31 Merge branch 'master' of bscpm02.bsc.es:rarias/bsc-nixpkgs 2020-07-27 15:24:26 +02:00
97d69d25ee Fix Intel URLs
Fixes #5
2020-07-27 15:23:42 +02:00
0eec726335 Merge branch 'master' of bscpm02.bsc.es:rarias/bsc-nixpkgs 2020-07-27 13:18:18 +02:00
76ec5d5f16 Add dummy app 2020-07-27 13:17:52 +02:00
11901e77de Rename gauss-seidel to heat 2020-07-27 13:17:08 +02:00
979888eede Add generators for experiments 2020-07-27 11:14:33 +02:00
bbc851db78 Add config generation 2020-07-24 18:34:18 +02:00
9cba2d609c Working proof of concept for garlic experiments 2020-07-24 15:30:28 +02:00
bad6f3c761 Add garlic group 2020-07-24 13:24:30 +02:00
ac1523d946 Merge branch 'master' of bscpm02.bsc.es:rarias/bsc-nixpkgs 2020-07-24 11:33:44 +02:00
cf72d526ee Add mpptest 2020-07-24 11:33:05 +02:00
419418781f Revert "icc: use fetchTarball"
This reverts commit 215b104174e7bdcf9dfe6727683261eea54d036f.
2020-07-23 19:10:37 +02:00
f842b22330 Merge branch 'master' of bscpm02.bsc.es:rarias/bsc-nixpkgs 2020-07-23 19:00:09 +02:00
215b104174 icc: use fetchTarball 2020-07-23 19:00:03 +02:00
0a09affbc4 impi: use fetchTarball 2020-07-23 18:47:20 +02:00
1e54fbdc43 Fix libcxx include path 2020-07-21 16:31:31 +02:00
10b061aa96 icc: Fix updated url 2020-07-21 09:33:41 +02:00
ab0aa74590 Add garlic group with all apps 2020-07-20 17:31:17 +02:00
f07d87e97e impi: fix sed path and add link to intel64 2020-07-20 17:06:14 +02:00
ca0c1445ba Add custom mcxx version 2020-07-20 16:08:15 +02:00
b8d15e7d84 Ignore source folder 2020-07-20 16:07:26 +02:00
f20ef93c56 impi: allow echo as compiler for mpitool 2020-07-20 16:06:00 +02:00
ba13d37694 Remove custom nanos6 2020-07-20 16:05:32 +02:00
528cd7d205 hpcg: Missing TAMPI patch 2020-07-20 15:58:06 +02:00
60fdba40ae fwi: Use 4_MPI_ompss variant.
The -D_GNU_SOURCE define is required before mcc includes nanos6.h
2020-07-20 15:32:00 +02:00
c50158e3be Add fwi app 2020-07-20 12:58:54 +02:00
81bcf20419 hpccg: Copy binaries to output 2020-07-20 12:39:39 +02:00
321bfa290c Set serial compiler to Intel 2020-07-20 12:06:22 +02:00
3b23b230ed Add hpccg app 2020-07-20 12:04:15 +02:00
11b1652617 Unify package versions 2020-07-20 11:59:58 +02:00
d634538223 Use upstream nanos6 from git and disable hardening
The bindnow hardening option is incompatible with the ifunc symbol
resolution mechanism. All hardening is disabled as well.
2020-07-15 12:21:48 +02:00
cd409677b0 Add hpcg app 2020-07-13 16:46:44 +02:00
0b2f9df3ea lulesh: Use nanos6 from git 2020-07-13 16:45:36 +02:00
3298c5442c Add lulesh app 2020-07-13 14:09:20 +02:00
99b716db87 icc: Propagate gcc as is required to build 2020-07-13 14:07:24 +02:00
a78f0caec9 intel: Enable ifort compiler 2020-07-10 17:04:21 +02:00
dc12cbe045 creams: Cleaning unused dependencies 2020-07-10 17:02:33 +02:00
577a7c3190 Add CREAMS app 2020-07-10 16:49:39 +02:00
7c68efe743 mcxx: remove build dependency with icc 2020-07-10 16:42:33 +02:00
261d304961 Add ifort to intel compilers 2020-07-10 13:42:55 +02:00
0daa0b9c35 Remove patch phase for gauss seidel app 2020-07-10 13:19:48 +02:00
114a6b081f Add icc in mcxx to enable imc* wrappers 2020-07-10 13:17:45 +02:00
fdc8b68d9a Disable libstdcxxHook 2020-07-08 15:00:39 +02:00
5df94bfc66 Use current gcc version with mcxx 2020-07-08 14:59:19 +02:00
Kevin Sala
7b2c88be78 Adding Gauss-Seidel benchmark.
It does not work yet due to a gcc compilation issue.
2020-07-08 13:35:46 +02:00
Rodrigo
6f06022aa5 Typo in git repo 2020-07-08 12:16:59 +02:00
599e504f1a Remove libgomp and libiomp from clang 2020-07-06 15:58:09 +02:00
c03ac6d05a Remove unused clang nix file 2020-07-06 15:32:55 +02:00
a95f7fa35e Add details for xeon07 2020-07-06 11:19:20 +02:00
018bebc264 Disable debug in clang+ompss2 compiler 2020-07-06 11:15:55 +02:00
ee5964a984 Disable assertions in clang 2020-07-03 18:34:57 +02:00
0f2b4754fd Add a dummy bin for the examples 2020-07-03 18:26:04 +02:00
91c38d70a8 Add README 2020-07-03 18:25:22 +02:00
0663895b3f Ignore vim swap files 2020-07-03 15:14:08 +02:00
bdfcb65b7e Delete .swp file 2020-07-03 15:12:57 +02:00
7d8f86eaad saiph: sanitize address and compile for avx2 2020-07-03 11:13:41 +02:00
1e02ac9023 Enable compiler-rt for asan and update clang-ompss2 2020-07-02 21:10:44 +02:00
e0c5a3ebca Prefer makeFlags and use local directory 2020-07-02 15:54:41 +02:00
940c494d8e Use last intel compiler 2020 version 2020-07-02 15:32:52 +02:00
sandra
8032825765 Merge branch 'master' of bscpm02.bsc.es:rarias/bsc-nixpkgs 2020-07-02 14:47:28 +02:00
sandra
2189436619 Saiph compilation details 2020-07-02 14:47:10 +02:00
61f055e258 Remove nix debug from nbody 2020-07-02 12:59:37 +02:00
9662ff4138 Test nbody with icc 2020-07-02 12:36:38 +02:00
1f36743459 Add intel compiler 2020-07-02 12:36:22 +02:00
9ca29d5cf8 Use autoPatchelfHook for Intel MPI 2020-07-01 17:57:31 +02:00
9d65f2ae2c Add icc bin to out dir 2020-07-01 13:08:05 +02:00
Rodrigo
61c799e7e4 Intel compiler stub 2020-07-01 10:25:33 +02:00
33a46f41ce Add support for mcc and clang in Intel mpicc 2020-06-30 15:41:18 +02:00
a1f33444b5 Testing saiph app 2020-06-30 12:19:36 +02:00
74222706bf Add Intel MPI 2020-06-29 20:46:30 +02:00
5064170b31 Add mcxx to nbody: now builds 2020-06-29 17:40:25 +02:00
3ddd1721f4 Use gcc9 to compile mcxx 2020-06-29 17:39:12 +02:00
71430b3552 Add mercurium mcxx compiler 2020-06-29 17:32:30 +02:00
19c18627be Update nanos6 to last release (not working) 2020-06-29 16:53:57 +02:00
d6093681cc Move cpic to apps directory 2020-06-29 16:53:37 +02:00
08a3512bf1 Add nbody package (not working yet) 2020-06-29 16:42:25 +02:00
9a5759c45e Update nanos6-git version 2020-06-29 16:41:17 +02:00
Rodrigo Arias
a4d20edd8b Update nanos6 git 2020-06-29 14:44:17 +02:00
Rodrigo Arias
bd9788961b Use autoreconfHook for TAMPI 2020-06-26 10:16:41 +02:00
Rodrigo Arias
67c692b648 Add test subset 2020-06-25 21:02:49 +02:00
Rodrigo Arias
a83627890e Place packages together 2020-06-25 20:43:35 +02:00
Rodrigo Arias
53aebe5846 Use new format for urls 2020-06-25 20:05:12 +02:00
Rodrigo Arias
6b5e5aafa9 Add patched nix for BeeGFS 2020-06-25 15:13:20 +02:00
Rodrigo Arias
6dc2f8045d Update nanos6 2020-06-17 17:10:41 +02:00
Rodrigo Arias
040f205538 Use cpic from git 2020-06-17 16:39:04 +02:00
Rodrigo Arias
210e705653 Quiet cpic compilation 2020-06-17 16:31:05 +02:00
Rodrigo Arias
f5484cf5c3 Clean unused derivations 2020-06-17 13:26:14 +02:00
Rodrigo Arias
57f09c1967 Ignore result build folder 2020-06-17 13:23:59 +02:00
Rodrigo Arias
86b4b016b2 Remove unused helpers 2020-06-17 13:23:29 +02:00
Rodrigo Arias
ed829aace0 Clean cpic dependencies 2020-06-17 13:22:06 +02:00
Rodrigo Arias
d9ec42614c Fix libstdc++.so path 2020-06-17 13:21:44 +02:00
Rodrigo Arias
19e4e12126 Working stdenv with clang+ompss2 2020-06-17 13:00:49 +02:00
Rodrigo Arias
a8523c4b4e Add sandbox build test 2020-06-16 15:39:11 +02:00
Rodrigo Arias
63c78f50de Fix OpenMPI and Extrae clash 2020-06-15 17:19:36 +02:00
Rodrigo Arias
cae91fdcc0 Dont strip cpic symbols 2020-06-15 12:58:27 +02:00
Rodrigo Arias
fbbdf0740a Fix TAMPI derivation 2020-06-15 12:45:16 +02:00
Rodrigo Arias
98b51cfa6d Update nanos6-git from upstream 2020-06-15 12:04:05 +02:00
Rodrigo Arias
5cec4b02de Merge branch 'master' of https://github.com/rodarima/bsc-nixpkgs 2020-06-15 12:03:19 +02:00
Rodrigo Arias
ebea6f1e81 Use nanos6 from git for cpic 2020-06-15 12:03:11 +02:00
rodarima
2feaafb104
Delete .git.nix.swp 2020-06-15 11:56:44 +02:00
Rodrigo Arias
3c2b7c163f cpic: Compilation ok but fails to run 2020-06-15 11:54:22 +02:00
Rodrigo Arias
a331ec5f14 Add mode packages and cpic app 2020-06-11 19:04:16 +02:00
Rodrigo
ceaf273219 Proper install phase for llvm-ompss2 2020-06-11 11:33:29 +02:00
Rodrigo
3805eb0ceb Experimental llvm derivation 2020-06-10 19:35:11 +02:00
Rodrigo
37b49e1dd3 Add nanos6 git version 2020-06-10 18:55:30 +02:00
Rodrigo
b600bb77d4 Compile extrae with clang 2020-06-10 14:28:10 +02:00
Rodrigo
20e3f4d4f0 Add compilers 2020-06-09 18:21:02 +02:00
Rodrigo
5a4068b497 Enable extrae mpi implementation input 2020-06-08 18:31:23 +02:00
Rodrigo
83770803e5 Initial test packages 2020-06-08 18:01:33 +02:00
278 changed files with 56091 additions and 470 deletions

20
.gitea/workflows/ci.yaml Normal file

@ -0,0 +1,20 @@
name: CI
on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master
jobs:
  build:all:
    runs-on: native
    steps:
      - uses: https://gitea.com/ScMi1/checkout@v1.4
      - run: nix build -L --no-link --print-out-paths .#bsc.ci.all
  build:cross:
    runs-on: native
    steps:
      - uses: https://gitea.com/ScMi1/checkout@v1.4
      - run: nix build -L --no-link --print-out-paths .#bsc.ci.cross

3
.gitignore vendored Normal file

@ -0,0 +1,3 @@
**.swp
/result
/misc

6
.gitlab-ci.yml Normal file

@ -0,0 +1,6 @@
build:bsc-ci.all:
  stage: build
  tags:
    - nix
  script:
    - nix build -L --no-link --print-out-paths .#bsc-ci.all

21
COPYING Normal file

@ -0,0 +1,21 @@
Copyright (c) 2020-2025 Barcelona Supercomputing Center
Copyright (c) 2003-2020 Eelco Dolstra and the Nixpkgs/NixOS contributors
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

9
README.md Normal file

@ -0,0 +1,9 @@
# Jungle
This repository provides two components that can be used independently:
- A Nix overlay with packages used at BSC (formerly known as bscpkgs). Access
them directly with `nix shell .#<pkgname>`.
- NixOS configurations for jungle machines. Use `nixos-rebuild switch --flake .`
to upgrade the current machine.

19
default.nix Normal file

@ -0,0 +1,19 @@
let
bscOverlay = import ./overlay.nix;
# read flake.lock and determine revision from there
lock = builtins.fromJSON (builtins.readFile ./flake.lock);
inherit (lock.nodes.nixpkgs.locked) rev narHash;
fetchedNixpkgs = builtins.fetchTarball {
url = "https://github.com/NixOS/nixpkgs/archive/${rev}.tar.gz";
sha256 = narHash;
};
in
{ overlays ? [ ]
, nixpkgs ? fetchedNixpkgs
, ...
}@attrs:
import nixpkgs (
(builtins.removeAttrs attrs [ "overlays" "nixpkgs" ]) //
{ overlays = [ bscOverlay ] ++ overlays; }
)

176
doc/install.md Normal file

@ -0,0 +1,176 @@
# Installing NixOS in a new node
This article shows the steps to install NixOS in a node following the
configuration of the repo.
## Enable the serial console
By default, the nodes have the serial console disabled in GRUB, and Linux also
boots without the serial console enabled.
To enable the serial console in GRUB, set the following lines in
`/etc/default/grub`:
```
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
```
To boot Linux with the serial console enabled, so you can see the boot log and
log in over serial, set:
```
GRUB_CMDLINE_LINUX="console=ttyS0,115200n8 console=tty0"
```
Then update the grub config:
```
# grub2-mkconfig -o /boot/grub2/grub.cfg
```
And reboot.
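After the reboot, one way to confirm that the kernel picked up the serial console is to look for `console=ttyS0` in the kernel command line. The following is a minimal sketch, not part of the original procedure; the helper name `has_serial_console` is hypothetical and it only inspects the string it is given.

```shell
#!/bin/sh
# has_serial_console CMDLINE: succeeds if the given kernel command line
# contains a console=ttyS0 argument (hypothetical helper, for illustration).
has_serial_console() {
    case " $1 " in
        *" console=ttyS0"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on the rebooted node:
#   has_serial_console "$(cat /proc/cmdline)" && echo "serial console enabled"
```

The check is deliberately loose about the speed and parity suffix (`,115200n8`), since only the presence of the `ttyS0` console matters here.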
## Prepare the disk
Create a main partition and label it `nixos` following [the manual][1].
[1]: https://nixos.org/manual/nixos/stable/index.html#sec-installation-manual-partitioning
```
# disk=/dev/sdX
# parted $disk -- mklabel msdos
# parted $disk -- mkpart primary 1MB -8GB
# parted $disk -- mkpart primary linux-swap -8GB 100%
# parted $disk -- set 1 boot on
```
Then create an ext4 filesystem labeled `nixos`, where the system will be
installed. **Ensure that no other partition has the same label.**
```
# mkfs.ext4 -L nixos "${disk}1"
# mkswap -L swap "${disk}2"
# mount ${disk}1 /mnt
# lsblk -f $disk
NAME FSTYPE LABEL UUID MOUNTPOINT
sdX
`-sdX1 ext4 nixos 10d73b75-809c-4fa3-b99d-4fab2f0d0d8e /mnt
```
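The warning about duplicate labels can be checked mechanically, since the GRUB entry used later searches by label. A minimal sketch, assuming `lsblk` from util-linux is available on the node; the helper name `count_label` is hypothetical:

```shell
#!/bin/sh
# count_label LABEL: reads filesystem labels on stdin (one per line) and
# prints how many of them are exactly LABEL (hypothetical helper).
count_label() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -cxF -- "$1" || true
}

# Usage on a real machine:
#   lsblk -rno LABEL | count_label nixos
# Any result other than 1 means the label is missing or duplicated.
```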
## Prepare nix and nixos-install
Mount the nix store from the hut node read-only at `/nix`.
```
# mkdir /nix
# mount -o ro hut:/nix /nix
```
Get the nix binary and nixos-install tool from hut:
```
# ssh hut 'readlink -f $(which nix)'
/nix/store/0sxbaj71c4c4n43qhdxm31f56gjalksw-nix-2.13.3/bin/nix
# ssh hut 'readlink -f $(which nixos-install)'
/nix/store/9yq8ps06ysr2pfiwiij39ny56yk3pdcs-nixos-install/bin/nixos-install
```
And add them to the PATH:
```
# export PATH=$PATH:/nix/store/0sxbaj71c4c4n43qhdxm31f56gjalksw-nix-2.13.3/bin
# export PATH=$PATH:/nix/store/9yq8ps06ysr2pfiwiij39ny56yk3pdcs-nixos-install/bin/
# nix --version
nix (Nix) 2.13.3
```
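The store paths above will differ between deployments, so copying them by hand is error-prone. As a sketch under the same assumptions (ssh access to hut, `/nix` already mounted), the directory can be derived from the `readlink` output instead; `bin_dir` is a hypothetical helper name:

```shell
#!/bin/sh
# bin_dir PATH: prints the directory containing the given binary
# (hypothetical helper, a thin wrapper around dirname).
bin_dir() {
    dirname "$1"
}

# Usage, mirroring the manual steps above:
#   nix_bin=$(ssh hut 'readlink -f $(which nix)')
#   export PATH="$PATH:$(bin_dir "$nix_bin")"
```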
## Adapt owl configuration
Clone the owl repo:
```
$ git clone git@bscpm03.bsc.es:rarias/owl.git
$ cd owl
```
Edit the configuration to your needs.
## Install from another Linux OS
Install NixOS onto the storage drive.
```
# nixos-install --root /mnt --flake .#xeon0X
```
At this point, the NixOS GRUB has been installed on the nixos device, which
is not the default boot device. To keep both the old Linux and NixOS GRUBs, add
an entry to the old Linux GRUB that jumps into the new one.
```
# echo "
menuentry 'NixOS' {
insmod chain
search --no-floppy --label nixos --set root
configfile /boot/grub/grub.cfg
} " >> /etc/grub.d/40_custom
```
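Running the `echo ... >>` command above a second time would append a duplicate entry. A minimal sketch of an idempotent variant follows; the function name `add_nixos_entry` is hypothetical, and the entry body is the same as the one shown above.

```shell
#!/bin/sh
# add_nixos_entry FILE: appends the NixOS chainload entry to FILE unless a
# 'NixOS' menuentry is already present (hypothetical helper).
add_nixos_entry() {
    if grep -q "^menuentry 'NixOS'" "$1" 2>/dev/null; then
        return 0
    fi
    cat >> "$1" <<'EOF'
menuentry 'NixOS' {
    insmod chain
    search --no-floppy --label nixos --set root
    configfile /boot/grub/grub.cfg
}
EOF
}

# Usage (as root): add_nixos_entry /etc/grub.d/40_custom
```

The quoted heredoc delimiter keeps the entry text literal, so nothing inside it is expanded by the shell.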
Rebuild grub config.
```
# grub2-mkconfig -o /boot/grub/grub.cfg
```
To boot into NixOS manually, reboot and select NixOS in the GRUB menu.
To temporarily boot into NixOS only on the next reboot, run:
```
# grub2-reboot 'NixOS'
```
To permanently boot into NixOS as the default boot OS, edit `/etc/default/grub`:
```
GRUB_DEFAULT='NixOS'
```
And update grub.
```
# grub2-mkconfig -o /boot/grub/grub.cfg
```
## Build the nixos kexec image
```
# nix build .#nixosConfigurations.xeon02.config.system.build.kexecTree -v
```
## Chain NixOS in same disk with other systems
To install NixOS on a partition alongside another system that controls GRUB,
first disable the GRUB device so that NixOS does not install GRUB on the disk
(only the /boot files will be generated):
```
boot.loader.grub.device = "nodev";
```
Then add the following entry to the old GRUB configuration:
```
menuentry 'NixOS' {
insmod chain
search --no-floppy --label nixos --set root
configfile /boot/grub/grub.cfg
}
```
The partition with NixOS must have the label "nixos" for it to be found. New
system configuration entries will be stored in the GRUB configuration managed
by NixOS, so there is no need to change the old GRUB settings.

30
doc/maintainers.md Normal file

@ -0,0 +1,30 @@
# Maintainers
## Role of a maintainer
The responsibilities of maintainers are quite lax, and similar in spirit to
[nixpkgs' maintainers][1]:
The main responsibility of a maintainer is to keep the packages they
maintain in a functioning state, and keep up with updates. In order to do
that, they are empowered to make decisions over the packages they maintain.
That being said, the maintainer is not alone in proposing changes to the
packages. Anybody (both bots and humans) can send PRs to bump or tweak the
package.
In practice, this means that when updating or proposing changes to a package,
we will notify maintainers by mentioning them in Gitea so they can test changes
and give feedback.
Since we do bi-yearly release cycles, there is no expectation from maintainers
to update packages at each upstream release. Nevertheless, on each release cycle
we may request help from maintainers when updating or testing their packages.
## Becoming a maintainer
You'll have to add yourself in the `maintainers.nix` list; your username should
match your `bsc.es` email. Then you can add yourself to the `meta.maintainers`
of any package you are interested in maintaining.
[1]: https://github.com/NixOS/nixpkgs/tree/nixos-25.05/maintainers

46
doc/trim.sh Executable file

@ -0,0 +1,46 @@
#!/bin/sh
# Trims the jungle repository by moving the website to its own repository and
# removing it from jungle. It also removes big pdf files and kernel
# configurations so the jungle repository is small.
set -e
if [ -e oldjungle -o -e newjungle -o -e website ]; then
echo "remove oldjungle/, newjungle/ and website/ first"
exit 1
fi
# Clone the old jungle repo
git clone gitea@tent:rarias/jungle.git oldjungle
# First split the website into a new repository
mkdir website && git -C website init -b master
git-filter-repo \
--path web \
--subdirectory-filter web \
--source oldjungle \
--target website
# Then remove the website, pdf files and big kernel configs
mkdir newjungle && git -C newjungle init -b master
git-filter-repo \
--invert-paths \
--path web \
--path-glob 'doc*.pdf' \
--path-glob '**/kernel/configs/lockdep' \
--path-glob '**/kernel/configs/defconfig' \
--source oldjungle \
--target newjungle
set -x
du -sh oldjungle newjungle website
# 57M oldjungle
# 2,3M newjungle
# 6,4M website
du -sh --exclude=.git oldjungle newjungle website
# 30M oldjungle
# 700K newjungle
# 3,5M website

27
flake.lock generated Normal file

@ -0,0 +1,27 @@
{
"nodes": {
"nixpkgs": {
"locked": {
"lastModified": 1752436162,
"narHash": "sha256-Kt1UIPi7kZqkSc5HVj6UY5YLHHEzPBkgpNUByuyxtlw=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "dfcd5b901dbab46c9c6e80b265648481aafb01f8",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixos-25.05",
"repo": "nixpkgs",
"type": "github"
}
},
"root": {
"inputs": {
"nixpkgs": "nixpkgs"
}
}
},
"root": "root",
"version": 7
}

52
flake.nix Normal file

@ -0,0 +1,52 @@
{
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.05";
};
outputs = { self, nixpkgs, ... }:
let
mkConf = name: nixpkgs.lib.nixosSystem {
system = "x86_64-linux";
specialArgs = { inherit nixpkgs; theFlake = self; };
modules = [ "${self.outPath}/m/${name}/configuration.nix" ];
};
# For now we only support x86
system = "x86_64-linux";
pkgs = import nixpkgs {
inherit system;
overlays = [ self.overlays.default ];
config.allowUnfree = true;
};
in
{
nixosConfigurations = {
hut = mkConf "hut";
tent = mkConf "tent";
owl1 = mkConf "owl1";
owl2 = mkConf "owl2";
eudy = mkConf "eudy";
koro = mkConf "koro";
bay = mkConf "bay";
lake2 = mkConf "lake2";
raccoon = mkConf "raccoon";
fox = mkConf "fox";
apex = mkConf "apex";
weasel = mkConf "weasel";
};
bscOverlay = import ./overlay.nix;
overlays.default = self.bscOverlay;
# full nixpkgs with our overlay applied
legacyPackages.${system} = pkgs;
hydraJobs = self.legacyPackages.${system}.bsc.hydraJobs;
# propagate nixpkgs lib, so we can do bscpkgs.lib
lib = nixpkgs.lib // {
maintainers = nixpkgs.lib.maintainers // {
bsc = import ./pkgs/maintainers.nix;
};
};
};
}

37
keys.nix Normal file

@ -0,0 +1,37 @@
# As agenix needs to parse the secrets from a standalone .nix file, we describe
# here all the public keys
rec {
hosts = {
hut = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICO7jIp6JRnRWTMDsTB/aiaICJCl4x8qmKMPSs4lCqP1 hut";
owl1 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMqMEXO0ApVsBA6yjmb0xP2kWyoPDIWxBB0Q3+QbHVhv owl1";
owl2 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHurEYpQzNHqWYF6B9Pd7W8UPgF3BxEg0BvSbsA7BAdK owl2";
eudy = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIL+WYPRRvZupqLAG0USKmd/juEPmisyyJaP8hAgYwXsG eudy";
koro = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIImiTFDbxyUYPumvm8C4mEnHfuvtBY1H8undtd6oDd67 koro";
bay = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICvGBzpRQKuQYHdlUQeAk6jmdbkrhmdLwTBqf3el7IgU bay";
lake2 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINo66//S1yatpQHE/BuYD/Gfq64TY7ZN5XOGXmNchiO0 lake2";
fox = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDwItIk5uOJcQEVPoy/CVGRzfmE1ojrdDcI06FrU4NFT fox";
tent = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFAtTpHtdYoelbknD/IcfBlThwLKJv/dSmylOgpg3FRM tent";
apex = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBvUFjSfoxXnKwXhEFXx5ckRKJ0oewJ82mRitSMNMKjh apex";
weasel = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFLJrQ8BF6KcweQV8pLkSbFT+tbDxSG9qxrdQE65zJZp weasel";
raccoon = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGNQttFvL0dNEyy7klIhLoK4xXOeM2/K9R7lPMTG3qvK raccoon";
};
hostGroup = with hosts; rec {
compute = [ owl1 owl2 fox raccoon ];
playground = [ eudy koro weasel ];
storage = [ bay lake2 ];
monitor = [ hut ];
login = [ apex ];
system = storage ++ monitor ++ login;
safe = system ++ compute;
all = safe ++ playground;
};
admins = {
"rarias@hut" = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIE1oZTPtlEXdGt0Ak+upeCIiBdaDQtcmuWoTUCVuSVIR rarias@hut";
"rarias@tent" = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIwlWSBTZi74WTz5xn6gBvTmCoVltmtIAeM3RMmkh4QZ rarias@tent";
"rarias@fox" = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDSbw3REAKECV7E2c/e2XJITudJQWq2qDSe2N1JHqHZd rarias@fox";
root = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIII/1TNArcwA6D47mgW4TArwlxQRpwmIGiZDysah40Gb root@hut";
};
}

69
m/apex/configuration.nix Normal file

@ -0,0 +1,69 @@
{ lib, config, pkgs, ... }:
{
imports = [
../common/xeon.nix
../common/ssf/hosts.nix
../module/ceph.nix
../module/hut-substituter.nix
../module/slurm-server.nix
./nfs.nix
./wireguard.nix
];
# Don't install grub MBR for now
boot.loader.grub.device = "nodev";
boot.initrd.kernelModules = [
"megaraid_sas" # For HW RAID
];
environment.systemPackages = with pkgs; [
storcli # To manage HW RAID
];
fileSystems."/home" = {
device = "/dev/disk/by-label/home";
fsType = "ext4";
};
# No swap, there is plenty of RAM
swapDevices = lib.mkForce [];
networking = {
hostName = "apex";
defaultGateway = "84.88.53.233";
nameservers = [ "8.8.8.8" ];
# Public facing interface
interfaces.eno1.ipv4.addresses = [ {
address = "84.88.53.236";
prefixLength = 29;
} ];
# Internal LAN to our Ethernet switch
interfaces.eno2.ipv4.addresses = [ {
address = "10.0.40.30";
prefixLength = 24;
} ];
# Infiniband over Omnipath switch (disconnected for now)
# interfaces.ibp5s0 = {};
nat = {
enable = true;
internalInterfaces = [ "eno2" ];
externalInterface = "eno1";
};
};
networking.firewall = {
extraCommands = ''
# Blackhole BSC vulnerability scanner (OpenVAS) as it is spamming our
# logs. Insert as first position so we also protect SSH.
iptables -I nixos-fw 1 -p tcp -s 192.168.8.16 -j nixos-fw-refuse
# Same with opsmonweb01.bsc.es which seems to be trying to access via SSH
iptables -I nixos-fw 2 -p tcp -s 84.88.52.176 -j nixos-fw-refuse
'';
};
}

48
m/apex/nfs.nix Normal file

@ -0,0 +1,48 @@
{ ... }:
{
services.nfs.server = {
enable = true;
lockdPort = 4001;
mountdPort = 4002;
statdPort = 4000;
exports = ''
/home 10.0.40.0/24(rw,async,no_subtree_check,no_root_squash)
/home 10.106.0.0/24(rw,async,no_subtree_check,no_root_squash)
'';
};
networking.firewall = {
# Check with `rpcinfo -p`
extraCommands = ''
# Accept NFS traffic from compute nodes but not from the outside
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 111 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 2049 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 4000 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 4001 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 4002 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 20048 -j nixos-fw-accept
# Same but UDP
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 111 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 2049 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 4000 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 4001 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 4002 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 20048 -j nixos-fw-accept
# Accept NFS traffic from wg0
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.0/24 --dport 111 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.0/24 --dport 2049 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.0/24 --dport 4000 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.0/24 --dport 4001 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.0/24 --dport 4002 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.0/24 --dport 20048 -j nixos-fw-accept
# Same but UDP
iptables -A nixos-fw -p udp -i wg0 -s 10.106.0.0/24 --dport 111 -j nixos-fw-accept
iptables -A nixos-fw -p udp -i wg0 -s 10.106.0.0/24 --dport 2049 -j nixos-fw-accept
iptables -A nixos-fw -p udp -i wg0 -s 10.106.0.0/24 --dport 4000 -j nixos-fw-accept
iptables -A nixos-fw -p udp -i wg0 -s 10.106.0.0/24 --dport 4001 -j nixos-fw-accept
iptables -A nixos-fw -p udp -i wg0 -s 10.106.0.0/24 --dport 4002 -j nixos-fw-accept
iptables -A nixos-fw -p udp -i wg0 -s 10.106.0.0/24 --dport 20048 -j nixos-fw-accept
'';
};
}
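# Sketch only, not part of this commit: the 24 near-identical rules above
# differ only in protocol, source match and port, so they could be generated
# from lists. The names `nfsPorts`, `protos` and `mkRules` are hypothetical;
# `lib.concatMapStrings` is the standard nixpkgs helper. The rule order is
# the same as above (all TCP ports first, then all UDP ports).
#
# { lib, ... }:
# let
#   nfsPorts = [ 111 2049 4000 4001 4002 20048 ];
#   protos = [ "tcp" "udp" ];
#   # Emit one accept rule per (protocol, port) pair for a given source match
#   mkRules = match: lib.concatMapStrings (proto:
#     lib.concatMapStrings (port:
#       "iptables -A nixos-fw -p ${proto} ${match} --dport ${toString port} -j nixos-fw-accept\n"
#     ) nfsPorts
#   ) protos;
# in {
#   networking.firewall.extraCommands =
#     mkRules "-s 10.0.40.0/24" + mkRules "-i wg0 -s 10.106.0.0/24";
# }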

42
m/apex/wireguard.nix Normal file

@ -0,0 +1,42 @@
{ config, ... }:
{
networking.firewall = {
allowedUDPPorts = [ 666 ];
};
age.secrets.wgApex.file = ../../secrets/wg-apex.age;
# Enable WireGuard
networking.wireguard.enable = true;
networking.wireguard.interfaces = {
# "wg0" is the network interface name. You can name the interface arbitrarily.
wg0 = {
ips = [ "10.106.0.30/24" ];
listenPort = 666;
privateKeyFile = config.age.secrets.wgApex.path;
# Public key: VwhcN8vSOzdJEotQTpmPHBC52x3Hbv1lkFIyKubrnUA=
peers = [
{
name = "fox";
publicKey = "VfMPBQLQTKeyXJSwv8wBhc6OV0j2qAxUpX3kLHunK2Y=";
allowedIPs = [ "10.106.0.1/32" ];
endpoint = "fox.ac.upc.edu:666";
# Send keepalives every 25 seconds. Important to keep NAT tables alive.
persistentKeepalive = 25;
}
{
name = "raccoon";
publicKey = "QUfnGXSMEgu2bviglsaSdCjidB51oEDBFpnSFcKGfDI=";
allowedIPs = [ "10.106.0.236/32" "192.168.0.0/16" "10.0.44.0/24" ];
}
];
};
};
networking.hosts = {
"10.106.0.1" = [ "fox" ];
"10.106.0.236" = [ "raccoon" ];
"10.0.44.4" = [ "tent" ];
};
}

108
m/bay/configuration.nix Normal file

@ -0,0 +1,108 @@
{ config, pkgs, lib, ... }:
{
imports = [
../common/ssf.nix
../module/hut-substituter.nix
../module/monitoring.nix
];
# Select this using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d53562d";
boot.kernel.sysctl = {
"kernel.yama.ptrace_scope" = lib.mkForce "1";
};
environment.systemPackages = with pkgs; [
ceph
];
networking = {
hostName = "bay";
interfaces.eno1.ipv4.addresses = [ {
address = "10.0.40.40";
prefixLength = 24;
} ];
interfaces.ibp5s0.ipv4.addresses = [ {
address = "10.0.42.40";
prefixLength = 24;
} ];
firewall = {
extraCommands = ''
# Accept all incoming TCP traffic from lake2
iptables -A nixos-fw -p tcp -s lake2 -j nixos-fw-accept
# Accept monitoring requests from hut
iptables -A nixos-fw -p tcp -s hut -m multiport --dport 9283,9002 -j nixos-fw-accept
# Accept all Ceph traffic from the local network
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 -m multiport --dport 3300,6789,6800:7568 -j nixos-fw-accept
'';
};
};
services.ceph = {
enable = true;
global = {
fsid = "9c8d06e0-485f-4aaf-b16b-06d6daf1232b";
monHost = "10.0.40.40";
monInitialMembers = "bay";
clusterNetwork = "10.0.40.40/24"; # Use Ethernet only
};
extraConfig = {
# Only log to stderr so it appears in the journal
"log_file" = "/dev/null";
"mon_cluster_log_file" = "/dev/null";
"log_to_stderr" = "true";
"err_to_stderr" = "true";
"log_to_file" = "false";
};
mds = {
enable = true;
daemons = [ "mds0" "mds1" ];
extraConfig = {
"host" = "bay";
};
};
mgr = {
enable = true;
daemons = [ "bay" ];
};
mon = {
enable = true;
daemons = [ "bay" ];
};
osd = {
enable = true;
# One daemon per NVMe disk
daemons = [ "0" "1" "2" "3" ];
extraConfig = {
"osd crush chooseleaf type" = "0";
"osd journal size" = "10000";
"osd pool default min size" = "2";
"osd pool default pg num" = "200";
"osd pool default pgp num" = "200";
"osd pool default size" = "3";
};
};
};
# Missing service for volumes, see:
# https://www.reddit.com/r/ceph/comments/14otjyo/comment/jrd69vt/
systemd.services.ceph-volume = {
enable = true;
description = "Ceph Volume activation";
unitConfig = {
After = "local-fs.target";
Wants = "local-fs.target";
};
path = [ pkgs.ceph pkgs.util-linux pkgs.lvm2 pkgs.cryptsetup ];
serviceConfig = {
# Type is a [Service] option; systemd does not accept it under [Unit]
Type = "oneshot";
KillMode = "none";
Environment = "CEPH_VOLUME_TIMEOUT=10000";
ExecStart = "/bin/sh -c 'timeout $CEPH_VOLUME_TIMEOUT ${pkgs.ceph}/bin/ceph-volume lvm activate --all --no-systemd'";
TimeoutSec = "0";
};
wantedBy = [ "multi-user.target" ];
};
}

22
m/common/base.nix Normal file

@ -0,0 +1,22 @@
{
# All machines should include this profile.
# Includes the basic configuration for an Intel server.
imports = [
./base/agenix.nix
./base/always-power-on.nix
./base/august-shutdown.nix
./base/boot.nix
./base/env.nix
./base/fs.nix
./base/hw.nix
./base/net.nix
./base/nix.nix
./base/sys-devices.nix
./base/ntp.nix
./base/rev.nix
./base/ssh.nix
./base/users.nix
./base/watchdog.nix
./base/zsh.nix
];
}

8
m/common/base/agenix.nix Normal file

@ -0,0 +1,8 @@
{ pkgs, ... }:
{
imports = [ ../../module/agenix.nix ];
# Add agenix to system packages
environment.systemPackages = [ pkgs.agenix ];
}

@ -0,0 +1,8 @@
{
imports = [
../../module/power-policy.nix
];
# Turn on as soon as we have power
power.policy = "always-on";
}

@ -0,0 +1,14 @@
{
# Shutdown all machines on August 3rd at 22:00, so we can protect the
# hardware from spurious electrical peaks on the yearly electrical cut for
# maintenance that starts on August 4th.
systemd.timers.august-shutdown = {
description = "Shutdown on August 3rd for maintenance";
wantedBy = [ "timers.target" ];
timerConfig = {
OnCalendar = "*-08-03 22:00:00";
RandomizedDelaySec = "10min";
Unit = "systemd-poweroff.service";
};
};
}

37
m/common/base/boot.nix Normal file

@ -0,0 +1,37 @@
{ lib, pkgs, ... }:
{
# Use the GRUB 2 boot loader.
boot.loader.grub.enable = true;
# Enable GRUB2 serial console
boot.loader.grub.extraConfig = ''
serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1
terminal_input --append serial
terminal_output --append serial
'';
boot.kernel.sysctl = {
"kernel.perf_event_paranoid" = lib.mkDefault "-1";
# Allow ptracing (i.e. attach with GDB) any process of the same user, see:
# https://www.kernel.org/doc/Documentation/security/Yama.txt
"kernel.yama.ptrace_scope" = "0";
};
boot.kernelPackages = pkgs.linuxPackages_latest;
#boot.kernelPatches = lib.singleton {
# name = "osnoise-tracer";
# patch = null;
# extraStructuredConfig = with lib.kernel; {
# OSNOISE_TRACER = yes;
# HWLAT_TRACER = yes;
# };
#};
boot.initrd.availableKernelModules = [ "ahci" "xhci_pci" "ehci_pci" "nvme" "usbhid" "sd_mod" ];
boot.initrd.kernelModules = [ ];
boot.kernelModules = [ "kvm-intel" ];
boot.extraModulePackages = [ ];
}

37
m/common/base/env.nix Normal file

@ -0,0 +1,37 @@
{ pkgs, config, ... }:
{
environment.systemPackages = with pkgs; [
vim wget git htop tmux pciutils tcpdump ripgrep nix-index nixos-option
nix-diff ipmitool freeipmi ethtool lm_sensors cmake gnumake file tree
ncdu config.boot.kernelPackages.perf ldns pv
# From bscpkgs overlay
osumb
];
programs.direnv.enable = true;
# Increase limits
security.pam.loginLimits = [
{
domain = "*";
type = "-";
item = "memlock";
value = "1048576"; # 1 GiB of mem locked
}
];
environment.enableAllTerminfo = true;
environment.variables = {
EDITOR = "vim";
VISUAL = "vim";
};
programs.bash.promptInit = ''
PS1="\h\\$ "
'';
time.timeZone = "Europe/Madrid";
i18n.defaultLocale = "en_DK.UTF-8";
}

24
m/common/base/fs.nix Normal file

@ -0,0 +1,24 @@
{ ... }:
{
fileSystems."/" =
{ device = "/dev/disk/by-label/nixos";
fsType = "ext4";
};
# Trim unused blocks weekly
services.fstrim.enable = true;
swapDevices =
[ { device = "/dev/disk/by-label/swap"; }
];
# Tracing
fileSystems."/sys/kernel/tracing" = {
device = "none";
fsType = "tracefs";
};
# Mount a tmpfs into /tmp
boot.tmp.useTmpfs = true;
}

14
m/common/base/hw.nix Normal file

@ -0,0 +1,14 @@
# Do not modify this file! It was generated by nixos-generate-config
# and may be overwritten by future invocations. Please make changes
# to /etc/nixos/configuration.nix instead.
{ config, lib, pkgs, modulesPath, ... }:
{
imports =
[ (modulesPath + "/installer/scan/not-detected.nix")
];
nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
powerManagement.cpuFreqGovernor = lib.mkDefault "powersave";
hardware.cpu.intel.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;
}

23
m/common/base/net.nix Normal file

@ -0,0 +1,23 @@
{ pkgs, lib, ... }:
{
networking = {
enableIPv6 = false;
useDHCP = false;
firewall = {
enable = true;
allowedTCPPorts = [ 22 ];
};
# Make sure we use iptables
nftables.enable = lib.mkForce false;
hosts = {
"84.88.53.236" = [ "ssfhead.bsc.es" "ssfhead" ];
"84.88.51.142" = [ "raccoon-ipmi" ];
"192.168.11.12" = [ "bscpm04.bsc.es" ];
"192.168.11.15" = [ "gitlab-internal.bsc.es" ];
};
};
}

59
m/common/base/nix.nix Normal file

@ -0,0 +1,59 @@
{ pkgs, nixpkgs, theFlake, ... }:
{
nixpkgs.overlays = [
(import ../../../overlay.nix)
];
nixpkgs.config.allowUnfree = true;
nix = {
nixPath = [
"nixpkgs=${nixpkgs}"
"jungle=${theFlake.outPath}"
];
registry = {
nixpkgs.flake = nixpkgs;
jungle.flake = theFlake;
};
settings = {
experimental-features = [ "nix-command" "flakes" ];
sandbox = "relaxed";
trusted-users = [ "@wheel" ];
flake-registry = pkgs.writeText "global-registry.json"
''{"flakes":[],"version":2}'';
keep-outputs = true;
};
gc = {
automatic = true;
dates = "weekly";
options = "--delete-older-than 30d";
};
};
# The nix-gc.service can begin its execution *before* /home is mounted,
# causing it to remove all gcroots considering them as stale, as it cannot
# access the symlink. To prevent this problem, we force the service to wait
# until /home is mounted as well as other remote FS like /ceph.
systemd.services.nix-gc = {
# Start remote-fs.target if not already being started and fail if it fails
# to start. It will also be stopped if the remote-fs.target fails after
# starting successfully.
bindsTo = [ "remote-fs.target" ];
# Wait until remote-fs.target fully starts before starting this one.
after = [ "remote-fs.target" ];
# Ensure we can access a remote path inside /home
unitConfig.ConditionPathExists = "/home/Computational";
};
# This value determines the NixOS release from which the default
# settings for stateful data, like file locations and database versions
# on your system were taken. It's perfectly fine and recommended to leave
# this value at the release version of the first install of this system.
# Before changing this value read the documentation for this option
# (e.g. man configuration.nix or on https://nixos.org/nixos/options.html).
system.stateVersion = "22.11"; # Did you read the comment?
}

9
m/common/base/ntp.nix Normal file

@ -0,0 +1,9 @@
{ pkgs, ... }:
{
services.ntp.enable = true;
# Use the NTP server at BSC, as we don't have direct access
# to the outside world
networking.timeServers = [ "84.88.52.36" ];
}

21
m/common/base/rev.nix Normal file

@ -0,0 +1,21 @@
{ theFlake, ... }:
let
# Prevent building a configuration without revision
rev = if theFlake ? rev then theFlake.rev
else throw ("Refusing to build from a dirty Git tree!");
in {
# Save the commit of the config in /etc/configrev
environment.etc.configrev.text = rev + "\n";
# Keep a log with the config over time
system.activationScripts.configRevLog.text = ''
BOOTED=$(cat /run/booted-system/etc/configrev 2>/dev/null || echo unknown)
CURRENT=$(cat /run/current-system/etc/configrev 2>/dev/null || echo unknown)
NEXT=${rev}
DATENOW=$(date --iso-8601=seconds)
echo "$DATENOW booted=$BOOTED current=$CURRENT next=$NEXT" >> /var/configrev.log
'';
system.configurationRevision = rev;
}

18
m/common/base/ssh.nix Normal file

@ -0,0 +1,18 @@
{ lib, ... }:
let
keys = import ../../../keys.nix;
hostsKeys = lib.mapAttrs (name: value: { publicKey = value; }) keys.hosts;
in
{
# Enable the OpenSSH daemon.
services.openssh.enable = true;
programs.ssh.knownHosts = hostsKeys // {
"gitlab-internal.bsc.es".publicKey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF9arsAOSRB06hdy71oTvJHG2Mg8zfebADxpvc37lZo3";
"bscpm03.bsc.es".publicKey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIM2NuSUPsEhqz1j5b4Gqd+MWFnRqyqY57+xMvBUqHYUS";
"bscpm04.bsc.es".publicKey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPx4mC0etyyjYUT2Ztc/bs4ZXSbVMrogs1ZTP924PDgT";
"glogin1.bsc.es".publicKey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFsHsZGCrzpd4QDVn5xoDOtrNBkb0ylxKGlyBt6l9qCz";
"glogin2.bsc.es".publicKey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFsHsZGCrzpd4QDVn5xoDOtrNBkb0ylxKGlyBt6l9qCz";
};
}
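# Worked example of the `lib.mapAttrs` call above, assuming `keys.hosts` in
# keys.nix maps each host name to its public key string (the key shown here
# is a made-up placeholder, not a real one):
#
#   lib.mapAttrs (name: value: { publicKey = value; })
#     { hut = "ssh-ed25519 AAAA...placeholder"; }
#
# evaluates to
#
#   { hut = { publicKey = "ssh-ed25519 AAAA...placeholder"; }; }
#
# which is the shape `programs.ssh.knownHosts` expects, so the result can be
# merged with the hand-written entries using `//`.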

@ -0,0 +1,9 @@
{
nix.settings.system-features = [ "sys-devices" ];
programs.nix-required-mounts.enable = true;
programs.nix-required-mounts.allowedPatterns.sys-devices.paths = [
"/sys/devices/system/cpu"
"/sys/devices/system/node"
];
}

203
m/common/base/users.nix Normal file

@ -0,0 +1,203 @@
{ pkgs, ... }:
{
imports = [
../../module/jungle-users.nix
];
users = {
mutableUsers = false;
users = {
# Generate hashedPassword with `mkpasswd -m sha-512`
root.openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKBOf4r4lzQfyO0bx5BaREePREw8Zw5+xYgZhXwOZoBO ram@hop"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINa0tvnNgwkc5xOwd6xTtaIdFi5jv0j2FrE7jl5MTLoE ram@mio"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF3zeB5KSimMBAjvzsp1GCkepVaquVZGPYwRIzyzaCba aleix@bsc"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIII/1TNArcwA6D47mgW4TArwlxQRpwmIGiZDysah40Gb root@hut"
];
rarias = {
uid = 1880;
isNormalUser = true;
linger = true;
home = "/home/Computational/rarias";
description = "Rodrigo Arias";
group = "Computational";
extraGroups = [ "wheel" ];
hashedPassword = "$6$u06tkCy13enReBsb$xiI.twRvvTfH4jdS3s68NZ7U9PSbGKs5.LXU/UgoawSwNWhZo2hRAjNL5qG0/lAckzcho2LjD0r3NfVPvthY6/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKBOf4r4lzQfyO0bx5BaREePREw8Zw5+xYgZhXwOZoBO ram@hop"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINa0tvnNgwkc5xOwd6xTtaIdFi5jv0j2FrE7jl5MTLoE ram@mio"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGYcXIxe0poOEGLpk8NjiRozls7fMRX0N3j3Ar94U+Gl rarias@hal"
];
shell = pkgs.zsh;
};
arocanon = {
uid = 1042;
isNormalUser = true;
home = "/home/Computational/arocanon";
description = "Aleix Roca";
group = "Computational";
extraGroups = [ "wheel" "tracing" ];
hashedPassword = "$6$hliZiW4tULC/tH7p$pqZarwJkNZ7vS0G5llWQKx08UFG9DxDYgad7jplMD8WkZh5k58i4dfPoWtnEShfjTO6JHiIin05ny5lmSXzGM/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF3zeB5KSimMBAjvzsp1GCkepVaquVZGPYwRIzyzaCba aleix@bsc"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGdphWxLAEekicZ/WBrvP7phMyxKSSuLAZBovNX+hZXQ aleix@kerneland"
];
};
};
jungleUsers = {
rpenacob = {
uid = 2761;
isNormalUser = true;
home = "/home/Computational/rpenacob";
description = "Raúl Peñacoba";
group = "Computational";
hosts = [ "apex" "owl1" "owl2" "hut" "tent" "fox" ];
hashedPassword = "$6$TZm3bDIFyPrMhj1E$uEDXoYYd1z2Wd5mMPfh3DZAjP7ztVjJ4ezIcn82C0ImqafPA.AnTmcVftHEzLB3tbe2O4SxDyPSDEQgJ4GOtj/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFYfXg37mauGeurqsLpedgA2XQ9d4Nm0ZGo/hI1f7wwH rpenacob@bsc"
];
};
anavarro = {
uid = 1037;
isNormalUser = true;
home = "/home/Computational/anavarro";
description = "Antoni Navarro";
group = "Computational";
hosts = [ "apex" "hut" "tent" "raccoon" "fox" "weasel" ];
hashedPassword = "$6$EgturvVYXlKgP43g$gTN78LLHIhaF8hsrCXD.O6mKnZSASWSJmCyndTX8QBWT6wTlUhcWVAKz65lFJPXjlJA4u7G1ydYQ0GG6Wk07b1";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMsbM21uepnJwPrRe6jYFz8zrZ6AYMtSEvvt4c9spmFP toni@delltoni"
];
};
abonerib = {
uid = 4541;
isNormalUser = true;
home = "/home/Computational/abonerib";
description = "Aleix Boné";
group = "Computational";
hosts = [ "apex" "owl1" "owl2" "hut" "tent" "raccoon" "fox" "weasel" ];
hashedPassword = "$6$V1EQWJr474whv7XJ$OfJ0wueM2l.dgiJiiah0Tip9ITcJ7S7qDvtSycsiQ43QBFyP4lU0e0HaXWps85nqB4TypttYR4hNLoz3bz662/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIIFiqXqt88VuUfyANkZyLJNiuroIITaGlOOTMhVDKjf abonerib@bsc"
];
};
vlopez = {
uid = 4334;
isNormalUser = true;
home = "/home/Computational/vlopez";
description = "Victor López";
group = "Computational";
hosts = [ "apex" "koro" ];
hashedPassword = "$6$0ZBkgIYE/renVqtt$1uWlJsb0FEezRVNoETTzZMx4X2SvWiOsKvi0ppWCRqI66S6TqMBXBdP4fcQyvRRBt0e4Z7opZIvvITBsEtO0f0";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGMwlUZRf9jfG666Qa5Sb+KtEhXqkiMlBV2su3x/dXHq victor@arch"
];
};
dbautist = {
uid = 5649;
isNormalUser = true;
home = "/home/Computational/dbautist";
description = "Dylan Bautista Cases";
group = "Computational";
hosts = [ "apex" "hut" "tent" "raccoon" ];
hashedPassword = "$6$a2lpzMRVkG9nSgIm$12G6.ka0sFX1YimqJkBAjbvhRKZ.Hl090B27pdbnQOW0wzyxVWySWhyDDCILjQELky.HKYl9gqOeVXW49nW7q/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAb+EQBoS98zrCwnGKkHKwMLdYABMTqv7q9E0+T0QmkS dbautist@bsc-848818791"
];
};
dalvare1 = {
uid = 2758;
isNormalUser = true;
home = "/home/Computational/dalvare1";
description = "David Álvarez";
group = "Computational";
hosts = [ "apex" "hut" "tent" "fox" ];
hashedPassword = "$6$mpyIsV3mdq.rK8$FvfZdRH5OcEkUt5PnIUijWyUYZvB1SgeqxpJ2p91TTe.3eQIDTcLEQ5rxeg.e5IEXAZHHQ/aMsR5kPEujEghx0";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGEfy6F4rF80r4Cpo2H5xaWqhuUZzUsVsILSKGJzt5jF dalvare1@ssfhead"
];
};
varcila = {
uid = 5650;
isNormalUser = true;
home = "/home/Computational/varcila";
description = "Vincent Arcila";
group = "Computational";
hosts = [ "apex" "hut" "tent" "fox" ];
hashedPassword = "$6$oB0Tcn99DcM4Ch$Vn1A0ulLTn/8B2oFPi9wWl/NOsJzaFAWjqekwcuC9sMC7cgxEVb.Nk5XSzQ2xzYcNe5MLtmzkVYnRS1CqP39Y0";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKGt0ESYxekBiHJQowmKpfdouw0hVm3N7tUMtAaeLejK vincent@varch"
];
};
pmartin1 = {
# Arbitrary UID but large so it doesn't collide with other users on ssfhead.
uid = 9652;
isNormalUser = true;
home = "/home/Computational/pmartin1";
description = "Pedro J. Martinez-Ferrer";
group = "Computational";
hosts = [ "fox" ];
hashedPassword = "$6$nIgDMGnt4YIZl3G.$.JQ2jXLtDPRKsbsJfJAXdSvjDIzRrg7tNNjPkLPq3KJQhMjfDXRUvzagUHUU2TrE2hHM8/6uq8ex0UdxQ0ysl.";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIV5LEAII5rfe1hYqDYIIrhb1gOw7RcS1p2mhOTqG+zc pedro@pedro-ThinkPad-P14s-Gen-2a"
];
};
csiringo = {
uid = 9653;
isNormalUser = true;
home = "/home/Computational/csiringo";
description = "Cesare Siringo";
group = "Computational";
hosts = [ ];
hashedPassword = "$6$0IsZlju8jFukLlAw$VKm0FUXbS.mVmPm3rcJeizTNU4IM5Nmmy21BvzFL.cQwvlGwFI1YWRQm6gsbd4nbg47mPDvYkr/ar0SlgF6GO1";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHA65zvvG50iuFEMf+guRwZB65jlGXfGLF4HO+THFaed csiringo@bsc.es"
];
};
acinca = {
uid = 9654;
isNormalUser = true;
home = "/home/Computational/acinca";
description = "Arnau Cinca";
group = "Computational";
hosts = [ "apex" "hut" "fox" "owl1" "owl2" ];
hashedPassword = "$6$S6PUeRpdzYlidxzI$szyvWejQ4hEN76yBYhp1diVO5ew1FFg.cz4lKiXt2Idy4XdpifwrFTCIzLTs5dvYlR62m7ekA5MrhcVxR5F/q/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFmMqKqPg4uocNOr3O41kLbZMOMJn3m2ZdN1JvTR96z3 bsccns@arnau-bsc"
];
};
aaguirre = {
uid = 9655;
isNormalUser = true;
home = "/home/Computational/aaguirre";
description = "Alejandro Aguirre";
group = "Computational";
hosts = [ "apex" "hut" ];
hashedPassword = "$6$TXRXQT6jjBvxkxU6$E.sh5KspAm1qeG5Ct7OPHpo8REmbGDwjFGvqeGgTVz3GASGOAnPL7UMZsMAsAKBoahOw.v8LNno6XGrTEPzZH1";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOlRX7ZCnqtUJYCxKgWmgSrFCYuA2LHY96rVwqxXPl86 aaguirre@BSC-8488184117"
];
};
};
groups = {
Computational = { gid = 564; };
tracing = { };
};
};
}

@ -0,0 +1,9 @@
{ ... }:
{
# The boards have a BMC watchdog controlled by IPMI
boot.kernelModules = [ "ipmi_watchdog" ];
# Enable systemd watchdog with 30 s interval
systemd.watchdog.runtimeTime = "30s";
}

91
m/common/base/zsh.nix Normal file

@ -0,0 +1,91 @@
{ pkgs, ... }:
{
environment.systemPackages = with pkgs; [
zsh-completions
nix-zsh-completions
];
programs.zsh = {
enable = true;
histSize = 1000000;
shellInit = ''
# Disable new user prompt
if [ ! -e ~/.zshrc ]; then
touch ~/.zshrc
fi
'';
promptInit = ''
# Note that to manually override this in ~/.zshrc you should run `prompt off`
# before setting your PS1, etc. Otherwise this will likely interact with
# your ~/.zshrc configuration in unexpected ways, as the default prompt sets
# a lot of different prompt variables.
autoload -U promptinit && promptinit && prompt default && setopt prompt_sp
'';
# Taken from Ulli Kehrle config:
# https://git.hrnz.li/Ulli/nixos/src/commit/2e203b8d8d671f4e3ced0f1744a51d5c6ee19846/profiles/shell.nix#L199-L205
interactiveShellInit = ''
source "${pkgs.zsh-history-substring-search}/share/zsh-history-substring-search/zsh-history-substring-search.zsh"
# Save history immediately, but only load it when the shell starts
setopt inc_append_history
# dircolors doesn't support alacritty:
# https://lists.gnu.org/archive/html/bug-coreutils/2019-05/msg00029.html
export LS_COLORS='rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=00:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.avif=01;35:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:*~=00;90:*#=00;90:*.bak=00;90:*.old=00;90:*.orig=00;90:*.part=00;90:*.rej=00;90:*.swp=00;90:*.tmp=00;90:*.dpkg-dist=00;90:*.dpkg-old=00;90:*.ucf-dist=00;90:*.ucf-new=00;90:*.ucf-old=00;90:*.rpmnew=00;90:*.rpmorig=00;90:*.rpmsave=00;90:';
# From Arch Linux and GRML
bindkey "^R" history-incremental-pattern-search-backward
bindkey "^S" history-incremental-pattern-search-forward
# Auto rehash for new binaries
zstyle ':completion:*' rehash true
# show a nice menu with the matches
zstyle ':completion:*' menu yes select
bindkey '^[OA' history-substring-search-up # Up
bindkey '^[[A' history-substring-search-up # Up
bindkey '^[OB' history-substring-search-down # Down
bindkey '^[[B' history-substring-search-down # Down
bindkey '\e[1~' beginning-of-line # Home
bindkey '\e[7~' beginning-of-line # Home
bindkey '\e[H' beginning-of-line # Home
bindkey '\eOH' beginning-of-line # Home
bindkey '\e[4~' end-of-line # End
bindkey '\e[8~' end-of-line # End
bindkey '\e[F' end-of-line # End
bindkey '\eOF' end-of-line # End
bindkey '^?' backward-delete-char # Backspace
bindkey '\e[3~' delete-char # Del
# bindkey '\e[3;5~' delete-char # sometimes Del, sometimes C-Del
bindkey '\e[2~' overwrite-mode # Ins
bindkey '^H' backward-kill-word # C-Backspace
bindkey '5~' kill-word # C-Del
bindkey '^[[3;5~' kill-word # C-Del
bindkey '^[[3^' kill-word # C-Del
bindkey "^[[1;5H" backward-kill-line # C-Home
bindkey "^[[7^" backward-kill-line # C-Home
bindkey "^[[1;5F" kill-line # C-End
bindkey "^[[8^" kill-line # C-End
bindkey '^[[1;5C' forward-word # C-Right
bindkey '^[0c' forward-word # C-Right
bindkey '^[[5C' forward-word # C-Right
bindkey '^[[1;5D' backward-word # C-Left
bindkey '^[0d' backward-word # C-Left
bindkey '^[[5D' backward-word # C-Left
'';
};
}

10
m/common/ssf.nix Normal file

@ -0,0 +1,10 @@
{
# Provides the base system for a xeon node in the SSF rack.
imports = [
./xeon.nix
./ssf/fs.nix
./ssf/hosts.nix
./ssf/hosts-remote.nix
./ssf/net.nix
];
}

@ -1,5 +1,3 @@
-{ ... }:
 {
 # Mount the home via NFS
 fileSystems."/home" = {
@ -7,10 +5,4 @@
 fsType = "nfs";
 options = [ "nfsvers=3" "rsize=1024" "wsize=1024" "cto" "nofail" ];
 };
-# Tracing
-fileSystems."/sys/kernel/tracing" = {
-device = "none";
-fsType = "tracefs";
-};
 }

@ -0,0 +1,9 @@
{ pkgs, ... }:
{
networking.hosts = {
# Remote hosts visible from compute nodes
"10.106.0.236" = [ "raccoon" ];
"10.0.44.4" = [ "tent" ];
};
}

23
m/common/ssf/hosts.nix Normal file

@ -0,0 +1,23 @@
{ pkgs, ... }:
{
networking.hosts = {
# Login
"10.0.40.30" = [ "apex" ];
# Storage
"10.0.40.40" = [ "bay" ]; "10.0.42.40" = [ "bay-ib" ]; "10.0.40.141" = [ "bay-ipmi" ];
"10.0.40.41" = [ "oss01" ]; "10.0.42.41" = [ "oss01-ib0" ]; "10.0.40.142" = [ "oss01-ipmi" ];
"10.0.40.42" = [ "lake2" ]; "10.0.42.42" = [ "lake2-ib" ]; "10.0.40.143" = [ "lake2-ipmi" ];
# Xeon compute
"10.0.40.1" = [ "owl1" ]; "10.0.42.1" = [ "owl1-ib" ]; "10.0.40.101" = [ "owl1-ipmi" ];
"10.0.40.2" = [ "owl2" ]; "10.0.42.2" = [ "owl2-ib" ]; "10.0.40.102" = [ "owl2-ipmi" ];
"10.0.40.3" = [ "xeon03" ]; "10.0.42.3" = [ "xeon03-ib" ]; "10.0.40.103" = [ "xeon03-ipmi" ];
#"10.0.40.4" = [ "tent" ]; "10.0.42.4" = [ "tent-ib" ]; "10.0.40.104" = [ "tent-ipmi" ];
"10.0.40.5" = [ "koro" ]; "10.0.42.5" = [ "koro-ib" ]; "10.0.40.105" = [ "koro-ipmi" ];
"10.0.40.6" = [ "weasel" ]; "10.0.42.6" = [ "weasel-ib" ]; "10.0.40.106" = [ "weasel-ipmi" ];
"10.0.40.7" = [ "hut" ]; "10.0.42.7" = [ "hut-ib" ]; "10.0.40.107" = [ "hut-ipmi" ];
"10.0.40.8" = [ "eudy" ]; "10.0.42.8" = [ "eudy-ib" ]; "10.0.40.108" = [ "eudy-ipmi" ];
};
}
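# Sketch only, not part of this commit: the Xeon compute entries above follow
# a fixed pattern (Ethernet 10.0.40.N, Infiniband 10.0.42.N, IPMI
# 10.0.40.(100+N)), so they could be derived from the node number. The helper
# name `mkNode` is hypothetical. The storage nodes do not follow this pattern
# for their IPMI addresses and would stay hand-written.
#
# let
#   # Build the three host entries for compute node number `n`
#   mkNode = n: name: {
#     "10.0.40.${toString n}" = [ name ];
#     "10.0.42.${toString n}" = [ "${name}-ib" ];
#     "10.0.40.${toString (100 + n)}" = [ "${name}-ipmi" ];
#   };
# in {
#   networking.hosts = mkNode 1 "owl1" // mkNode 2 "owl2" // mkNode 5 "koro";
# }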

23
m/common/ssf/net.nix Normal file

@ -0,0 +1,23 @@
{ pkgs, ... }:
{
# Infiniband (IPoIB)
environment.systemPackages = [ pkgs.rdma-core ];
boot.kernelModules = [ "ib_umad" "ib_ipoib" ];
networking = {
defaultGateway = "10.0.40.30";
nameservers = ["8.8.8.8"];
firewall = {
extraCommands = ''
# Prevent ssfhead from contacting our slurmd daemon
iptables -A nixos-fw -p tcp -s ssfhead --dport 6817:6819 -j nixos-fw-refuse
# But accept traffic to slurm ports from any other node in the subnet
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 6817:6819 -j nixos-fw-accept
# We also need to open the srun port range
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 60000:61000 -j nixos-fw-accept
'';
};
};
}

7
m/common/xeon.nix Normal file

@ -0,0 +1,7 @@
{
# Provides the base system for a xeon node, not necessarily in the SSF rack.
imports = [
./base.nix
./xeon/console.nix
];
}

14
m/common/xeon/console.nix Normal file

@ -0,0 +1,14 @@
{
# Restart the serial console
systemd.services."serial-getty@ttyS0" = {
enable = true;
wantedBy = [ "getty.target" ];
serviceConfig.Restart = "always";
};
# Enable serial console
boot.kernelParams = [
"console=tty1"
"console=ttyS0,115200"
];
}

38
m/eudy/configuration.nix Normal file

@ -0,0 +1,38 @@
{ config, pkgs, lib, modulesPath, ... }:
{
imports = [
../common/ssf.nix
#(modulesPath + "/installer/netboot/netboot-minimal.nix")
./kernel/kernel.nix
./cpufreq.nix
./fs.nix
./users.nix
../module/hut-substituter.nix
../module/debuginfod.nix
];
# Select this using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d53564b";
# disable automatic garbage collector
nix.gc.automatic = lib.mkForce false;
# members of the tracing group can use the lttng-provided kernel events
# without root permissions
users.groups.tracing.members = [ "arocanon" ];
# set up both ethernet and infiniband ips
networking = {
hostName = "eudy";
interfaces.eno1.ipv4.addresses = [ {
address = "10.0.40.8";
prefixLength = 24;
} ];
interfaces.ibp5s0.ipv4.addresses = [ {
address = "10.0.42.8";
prefixLength = 24;
} ];
};
}

40
m/eudy/cpufreq.nix Normal file

@ -0,0 +1,40 @@
{ lib, ... }:
{
# Disable frequency boost by default. Use the intel_pstate driver instead of
# acpi_cpufreq driver because the acpi_cpufreq driver does not read the
# complete range of P-States [1]. Use the intel_pstate passive mode [2] to
# disable HWP, which allows a core to "select P-states by itself". Also, this
# disables intel governors, which confusingly, have the same names as the
# generic ones but behave differently [3].
# Essentially, we use the generic governors, but use the intel driver to read
# the P-state list.
# [1] - https://www.kernel.org/doc/html/latest/admin-guide/pm/intel_pstate.html#intel-pstate-vs-acpi-cpufreq
# [2] - https://www.kernel.org/doc/html/latest/admin-guide/pm/intel_pstate.html#passive-mode
# [3] - https://www.kernel.org/doc/html/latest/admin-guide/pm/intel_pstate.html#active-mode
# https://www.kernel.org/doc/html/latest/admin-guide/pm/cpufreq.html
# set intel_pstate to passive mode
boot.kernelParams = [
"intel_pstate=passive"
];
# Disable frequency boost
system.activationScripts = {
disableFrequencyBoost.text = ''
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
'';
};
## disable intel_pstate
#boot.kernelParams = [
# "intel_pstate=disable"
#];
## Disable frequency boost
#system.activationScripts = {
# disableFrequencyBoost.text = ''
# echo 0 > /sys/devices/system/cpu/cpufreq/boost
# '';
#};
}

13
m/eudy/fs.nix Normal file

@ -0,0 +1,13 @@
{ ... }:
{
fileSystems."/nix" = {
device = "/dev/disk/by-label/optane";
fsType = "ext4";
neededForBoot = true;
};
fileSystems."/mnt/data" = {
device = "/dev/disk/by-label/data";
fsType = "ext4";
};
}

70
m/eudy/kernel/kernel.nix Normal file

@ -0,0 +1,70 @@
{ pkgs, lib, ... }:
let
#fcs-devel = pkgs.linuxPackages_custom {
# version = "6.2.8";
# src = /mnt/data/kernel/fcs/kernel/src;
# configfile = /mnt/data/kernel/fcs/kernel/configs/defconfig;
#};
#fcsv1 = fcs-kernel "bc11660676d3d68ce2459b9fb5d5e654e3f413be" false;
#fcsv2 = fcs-kernel "db0f2eca0cd57a58bf456d7d2c7d5d8fdb25dfb1" false;
#fcsv1-lockdep = fcs-kernel "bc11660676d3d68ce2459b9fb5d5e654e3f413be" true;
#fcsv2-lockdep = fcs-kernel "db0f2eca0cd57a58bf456d7d2c7d5d8fdb25dfb1" true;
#fcs-kernel = gitCommit: lockdep: pkgs.linuxPackages_custom {
# version = "6.2.8";
# src = builtins.fetchGit {
# url = "git@bscpm03.bsc.es:ompss-kernel/linux.git";
# rev = gitCommit;
# ref = "fcs";
# };
# configfile = if lockdep then ./configs/lockdep else ./configs/defconfig;
#};
kernel = nixos-fcs;
nixos-fcs-kernel = lib.makeOverridable ({gitCommit, lockStat ? false, preempt ? false, branch ? "fcs"}: pkgs.linuxPackagesFor (pkgs.buildLinux rec {
version = "6.2.8";
src = builtins.fetchGit {
url = "git@bscpm03.bsc.es:ompss-kernel/linux.git";
rev = gitCommit;
ref = branch;
};
structuredExtraConfig = with lib.kernel; {
# add general custom kernel options here
} // lib.optionalAttrs lockStat {
LOCK_STAT = yes;
} // lib.optionalAttrs preempt {
PREEMPT = lib.mkForce yes;
PREEMPT_VOLUNTARY = lib.mkForce no;
};
kernelPatches = [];
extraMeta.branch = lib.versions.majorMinor version;
}));
nixos-fcs = nixos-fcs-kernel {gitCommit = "8a09822dfcc8f0626b209d6d2aec8b5da459dfee";};
nixos-fcs-lockstat = nixos-fcs.override {
lockStat = true;
};
nixos-fcs-lockstat-preempt = nixos-fcs.override {
lockStat = true;
preempt = true;
};
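# Sketch of how another variant could be built with the same overridable
# function. The name, commit and branch below are placeholders for
# illustration, not real refs in the repository:
#nixos-fcs-example = nixos-fcs-kernel {
#  gitCommit = "<40-character commit hash>";
#  branch = "<branch name>";
#  preempt = true;
#};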
latest = pkgs.linuxPackages_latest;
in {
imports = [
./lttng.nix
./perf.nix
];
boot.kernelPackages = lib.mkForce kernel;
# disable all cpu mitigations
boot.kernelParams = [
"mitigations=off"
];
# enable memory overcommit, needed to build a taglibc system using nix after
# increasing the openblas memory footprint
boot.kernel.sysctl."vm.overcommit_memory" = 1;
}

43
m/eudy/kernel/lttng.nix Normal file

@ -0,0 +1,43 @@
{ config, pkgs, lib, ... }:
let
# The lttng btrfs probe fails to compile because of an undefined function.
# This disables the btrfs tracepoints to avoid the issue.
# Also enable the lockdep tracepoints. They are disabled by default because
# they do not work well on architectures other than x86_64 (ARM, I think), as
# I was told on the mailing list.
lttng-modules-fixed = config.boot.kernelPackages.lttng-modules.overrideAttrs (finalAttrs: previousAttrs: {
patchPhase = (lib.optionalString (previousAttrs ? patchPhase) previousAttrs.patchPhase) + ''
# disable btrfs
substituteInPlace src/probes/Kbuild \
--replace " obj-\$(CONFIG_LTTNG) += lttng-probe-btrfs.o" " #obj-\$(CONFIG_LTTNG) += lttng-probe-btrfs.o"
# enable lockdep tracepoints
substituteInPlace src/probes/Kbuild \
--replace "#ifneq (\$(CONFIG_LOCKDEP),)" "ifneq (\$(CONFIG_LOCKDEP),)" \
--replace "# obj-\$(CONFIG_LTTNG) += lttng-probe-lock.o" " obj-\$(CONFIG_LTTNG) += lttng-probe-lock.o" \
--replace "#endif # CONFIG_LOCKDEP" "endif # CONFIG_LOCKDEP"
'';
});
in {
# add the lttng tools and modules to the system environment
boot.extraModulePackages = [ lttng-modules-fixed ];
environment.systemPackages = with pkgs; [
lttng-tools lttng-ust babeltrace
];
# start the lttng root daemon to manage kernel events
systemd.services.lttng-sessiond = {
wantedBy = [ "multi-user.target" ];
description = "LTTng session daemon for the root user";
serviceConfig = {
User = "root";
ExecStart = ''
${pkgs.lttng-tools}/bin/lttng-sessiond
'';
};
};
}

22
m/eudy/kernel/perf.nix Normal file

@ -0,0 +1,22 @@
{ config, pkgs, lib, ... }:
{
# add the perf tool
environment.systemPackages = with pkgs; [
config.boot.kernelPackages.perf
];
# allow non-root users to read tracing data from the kernel
boot.kernel.sysctl."kernel.perf_event_paranoid" = -2;
boot.kernel.sysctl."kernel.kptr_restrict" = 0;
# specify additional options for the tracefs directory to allow members of
# the tracing group to access tracefs.
fileSystems."/sys/kernel/tracing" = {
options = [
"mode=755"
"gid=tracing"
];
};
}

11
m/eudy/users.nix Normal file

@ -0,0 +1,11 @@
{ ... }:
{
security.sudo.extraRules = [{
users = [ "arocanon" ];
commands = [{
command = "ALL";
options = [ "NOPASSWD" ]; # Adding "SETENV" here could be a good idea
}];
}];
}

112
m/fox/configuration.nix Normal file

@ -0,0 +1,112 @@
{ lib, config, pkgs, ... }:
{
imports = [
../common/base.nix
../common/xeon/console.nix
../module/amd-uprof.nix
../module/emulation.nix
../module/nvidia.nix
../module/slurm-client.nix
../module/hut-substituter.nix
./wireguard.nix
];
# Don't shut down in August, as UPC has different dates.
# Fox copes fine with power cuts.
systemd.timers.august-shutdown.enable = false;
# Select the disk using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x500a07514b0c1103";
# No swap, there is plenty of RAM
swapDevices = lib.mkForce [];
boot.initrd.availableKernelModules = [ "xhci_pci" "ahci" "nvme" "usbhid" "usb_storage" "sd_mod" ];
boot.kernelModules = [ "kvm-amd" "amd_uncore" "amd_hsmp" ];
hardware.cpu.amd.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;
hardware.cpu.intel.updateMicrocode = lib.mkForce false;
# Use performance for benchmarks
powerManagement.cpuFreqGovernor = "performance";
services.amd-uprof.enable = true;
# Disable NUMA balancing
boot.kernel.sysctl."kernel.numa_balancing" = 0;
# Expose kernel addresses
boot.kernel.sysctl."kernel.kptr_restrict" = 0;
# Disable NMI watchdog to save one hw counter (for AMD uProf)
boot.kernel.sysctl."kernel.nmi_watchdog" = 0;
services.openssh.settings.X11Forwarding = true;
services.fail2ban.enable = true;
networking = {
timeServers = [ "ntp1.upc.edu" "ntp2.upc.edu" ];
hostName = "fox";
# UPC network (may change over time, use DHCP)
# Public IP configuration:
# - Hostname: fox.ac.upc.edu
# - IP: 147.83.30.141
# - Gateway: 147.83.30.130
# - NetMask: 255.255.255.192
# Private IP configuration for BMC:
# - Hostname: fox-ipmi.ac.upc.edu
# - IP: 147.83.35.27
# - Gateway: 147.83.35.2
# - NetMask: 255.255.255.0
interfaces.enp1s0f0np0.useDHCP = true;
};
# Recommended for new graphics cards
hardware.nvidia.open = true;
# Mount NVME disks
fileSystems."/nvme0" = { device = "/dev/disk/by-label/nvme0"; fsType = "ext4"; };
fileSystems."/nvme1" = { device = "/dev/disk/by-label/nvme1"; fsType = "ext4"; };
# Mount the NFS home
fileSystems."/nfs/home" = {
device = "10.106.0.30:/home";
fsType = "nfs";
options = [ "nfsvers=3" "rsize=1024" "wsize=1024" "cto" "nofail" ];
};
# Make a /nvme{0,1}/$USER directory for each user.
systemd.services.create-nvme-dirs = let
# Take only normal users in fox
users = lib.filterAttrs (_: v: v.isNormalUser) config.users.users;
commands = lib.concatLists (lib.mapAttrsToList
(_: user: [
"install -d -o ${user.name} -g ${user.group} -m 0755 /nvme{0,1}/${user.name}"
]) users);
script = pkgs.writeShellScript "create-nvme-dirs.sh" (lib.concatLines commands);
in {
enable = true;
wants = [ "local-fs.target" ];
after = [ "local-fs.target" ];
wantedBy = [ "multi-user.target" ];
serviceConfig.ExecStart = script;
};
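# For illustration only, with a hypothetical normal user "alice" whose group
# is "users", the generated create-nvme-dirs.sh would contain a line like:
#   install -d -o alice -g users -m 0755 /nvme{0,1}/alice
# pkgs.writeShellScript runs the script with bash, so the {0,1} brace
# expansion creates the directory on both disks.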
# Only allow SSH connections from users who have a SLURM allocation
# See: https://slurm.schedmd.com/pam_slurm_adopt.html
security.pam.services.sshd.rules.account.slurm = {
control = "required";
enable = true;
modulePath = "${pkgs.slurm}/lib/security/pam_slurm_adopt.so";
args = [ "log_level=debug5" ];
order = 999999; # Make it the last one
};
# Disable the systemd session (pam_systemd.so), as it conflicts with the
# pam_slurm_adopt.so module. Otherwise the shell is first adopted into the
# slurmstepd task and then moved into the systemd session, where it would
# linger even after all jobs are gone.
security.pam.services.sshd.startSession = lib.mkForce false;
}

54
m/fox/wireguard.nix Normal file

@ -0,0 +1,54 @@
{ config, ... }:
{
networking.firewall = {
allowedUDPPorts = [ 666 ];
};
age.secrets.wgFox.file = ../../secrets/wg-fox.age;
networking.wireguard.enable = true;
networking.wireguard.interfaces = {
# "wg0" is the network interface name. You can name the interface arbitrarily.
wg0 = {
# Determines the IP address and subnet of the server's end of the tunnel interface.
ips = [ "10.106.0.1/24" ];
# The port that WireGuard listens to. Must be accessible by the client.
listenPort = 666;
# Path to the private key file.
privateKeyFile = config.age.secrets.wgFox.path;
# Public key: VfMPBQLQTKeyXJSwv8wBhc6OV0j2qAxUpX3kLHunK2Y=
peers = [
# List of allowed peers.
{
name = "apex";
publicKey = "VwhcN8vSOzdJEotQTpmPHBC52x3Hbv1lkFIyKubrnUA=";
# List of IPs assigned to this peer within the tunnel subnet. Used to configure routing.
allowedIPs = [ "10.106.0.30/32" "10.0.40.7/32" ];
}
{
name = "raccoon";
publicKey = "QUfnGXSMEgu2bviglsaSdCjidB51oEDBFpnSFcKGfDI=";
allowedIPs = [ "10.106.0.236/32" "192.168.0.0/16" "10.0.44.0/24" ];
}
];
};
};
networking.hosts = {
"10.106.0.30" = [ "apex" ];
"10.0.40.7" = [ "hut" ];
"10.106.0.236" = [ "raccoon" ];
"10.0.44.4" = [ "tent" ];
};
networking.firewall = {
extraCommands = ''
# Accept slurm connections to slurmd from apex (via wireguard)
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.30/32 -d 10.106.0.1/32 --dport 6818 -j nixos-fw-accept
'';
};
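# Sketch of what the matching peer entry on the apex side could look like.
# This is an assumption (the real apex config is not part of this file); the
# public key, address and port are the ones listed above for fox:
#   peers = [ {
#     name = "fox";
#     publicKey = "VfMPBQLQTKeyXJSwv8wBhc6OV0j2qAxUpX3kLHunK2Y=";
#     allowedIPs = [ "10.106.0.1/32" ];
#     endpoint = "fox.ac.upc.edu:666";
#     persistentKeepalive = 25;
#   } ];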
}

14
m/hut/blackbox.yml Normal file

@ -0,0 +1,14 @@
modules:
http_2xx:
prober: http
timeout: 5s
http:
follow_redirects: true
preferred_ip_protocol: "ip4"
valid_status_codes: [] # Defaults to 2xx
method: GET
icmp:
prober: icmp
timeout: 5s
icmp:
preferred_ip_protocol: "ip4"

67
m/hut/configuration.nix Normal file

@ -0,0 +1,67 @@
{ config, pkgs, lib, ... }:
{
imports = [
../common/ssf.nix
../module/ceph.nix
../module/debuginfod.nix
../module/emulation.nix
./gitlab-runner.nix
./monitoring.nix
./nfs.nix
./nix-serve.nix
./public-inbox.nix
./gitea.nix
./msmtp.nix
./postgresql.nix
./nginx.nix
./p.nix
#./pxe.nix
];
# Select the disk using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d53567f";
fileSystems = {
"/" = lib.mkForce {
device = "/dev/disk/by-label/nvme";
fsType = "ext4";
neededForBoot = true;
options = [ "noatime" ];
};
"/boot" = lib.mkForce {
device = "/dev/disk/by-label/nixos-boot";
fsType = "ext4";
neededForBoot = true;
};
};
networking = {
hostName = "hut";
interfaces.eno1.ipv4.addresses = [ {
address = "10.0.40.7";
prefixLength = 24;
} ];
interfaces.ibp5s0.ipv4.addresses = [ {
address = "10.0.42.7";
prefixLength = 24;
} ];
firewall = {
extraCommands = ''
# Accept all proxy traffic from the compute nodes, but not from the login node
iptables -A nixos-fw -p tcp -s 10.0.40.30 --dport 23080 -j nixos-fw-log-refuse
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 23080 -j nixos-fw-accept
'';
# Flush all rules and chains on stop so it won't break on start
extraStopCommands = ''
iptables -F
iptables -X
'';
};
};
# Allow proxy to bind to the ethernet interface
services.openssh.settings.GatewayPorts = "clientspecified";
}

63
m/hut/gitea.nix Normal file

@ -0,0 +1,63 @@
{ config, lib, ... }:
{
age.secrets.giteaRunnerToken.file = ../../secrets/gitea-runner-token.age;
services.gitea = {
enable = true;
appName = "Gitea in the jungle";
settings = {
server = {
ROOT_URL = "https://jungle.bsc.es/git/";
LOCAL_ROOT_URL = "https://jungle.bsc.es/git/";
LANDING_PAGE = "explore";
};
metrics.ENABLED = true;
service = {
REGISTER_MANUAL_CONFIRM = true;
ENABLE_NOTIFY_MAIL = true;
};
log.LEVEL = "Warn";
mailer = {
ENABLED = true;
FROM = "jungle-robot@bsc.es";
PROTOCOL = "sendmail";
SENDMAIL_PATH = "/run/wrappers/bin/sendmail";
SENDMAIL_ARGS = "--";
};
};
};
services.gitea-actions-runner.instances = {
runrun = {
enable = true;
name = "runrun";
url = "https://jungle.bsc.es/git/";
tokenFile = config.age.secrets.giteaRunnerToken.path;
labels = [ "native:host" ];
settings.runner.capacity = 8;
};
};
systemd.services.gitea-runner-runrun = {
path = [ "/run/current-system/sw" ];
serviceConfig = {
# DynamicUser doesn't work well with SSH
DynamicUser = lib.mkForce false;
User = "gitea-runner";
Group = "gitea-runner";
};
};
users.users.gitea-runner = {
isSystemUser = true;
home = "/var/lib/gitea-runner";
description = "Gitea Runner";
group = "gitea-runner";
extraGroups = [ "docker" ];
createHome = true;
};
users.groups.gitea-runner = {};
}

126
m/hut/gitlab-runner.nix Normal file

@ -0,0 +1,126 @@
{ pkgs, lib, config, ... }:
{
age.secrets.gitlab-pm-shell.file = ../../secrets/gitlab-runner-shell-token.age;
age.secrets.gitlab-pm-docker.file = ../../secrets/gitlab-runner-docker-token.age;
age.secrets.gitlab-bsc-docker.file = ../../secrets/gitlab-bsc-docker-token.age;
services.gitlab-runner = {
enable = true;
settings.concurrent = 5;
services = let
common-shell = {
executor = "shell";
environmentVariables = {
SHELL = "${pkgs.bash}/bin/bash";
};
};
common-docker = {
executor = "docker";
dockerImage = "debian:stable";
registrationFlags = [
"--docker-network-mode host"
];
environmentVariables = {
https_proxy = "http://hut:23080";
http_proxy = "http://hut:23080";
};
};
in {
# For pm.bsc.es/gitlab
gitlab-pm-shell = common-shell // {
authenticationTokenConfigFile = config.age.secrets.gitlab-pm-shell.path;
};
gitlab-pm-docker = common-docker // {
authenticationTokenConfigFile = config.age.secrets.gitlab-pm-docker.path;
};
gitlab-bsc-docker = {
# gitlab.bsc.es still uses the old token mechanism
registrationConfigFile = config.age.secrets.gitlab-bsc-docker.path;
tagList = [ "docker" "hut" ];
environmentVariables = {
# We cannot access the hut local interface from docker, so we connect
# to hut directly via the ethernet one.
https_proxy = "http://hut:23080";
http_proxy = "http://hut:23080";
};
executor = "docker";
dockerImage = "alpine";
dockerVolumes = [
"/nix/store:/nix/store:ro"
"/nix/var/nix/db:/nix/var/nix/db:ro"
"/nix/var/nix/daemon-socket:/nix/var/nix/daemon-socket:ro"
];
dockerExtraHosts = [
# Required to pass the proxy via hut
"hut:10.0.40.7"
];
dockerDisableCache = true;
registrationFlags = [
# Increase build log length to 64 MiB
"--output-limit 65536"
];
preBuildScript = pkgs.writeScript "setup-container" ''
mkdir -p -m 0755 /nix/var/log/nix/drvs
mkdir -p -m 0755 /nix/var/nix/gcroots
mkdir -p -m 0755 /nix/var/nix/profiles
mkdir -p -m 0755 /nix/var/nix/temproots
mkdir -p -m 0755 /nix/var/nix/userpool
mkdir -p -m 1777 /nix/var/nix/gcroots/per-user
mkdir -p -m 1777 /nix/var/nix/profiles/per-user
mkdir -p -m 0755 /nix/var/nix/profiles/per-user/root
mkdir -p -m 0700 "$HOME/.nix-defexpr"
mkdir -p -m 0700 "$HOME/.ssh"
cat > "$HOME/.ssh/config" << EOF
Host bscpm04.bsc.es gitlab-internal.bsc.es
User git
ProxyCommand nc -X connect -x hut:23080 %h %p
Host amdlogin1.bsc.es armlogin1.bsc.es hualogin1.bsc.es glogin1.bsc.es glogin2.bsc.es fpgalogin1.bsc.es
ProxyCommand nc -X connect -x hut:23080 %h %p
EOF
cat >> "$HOME/.ssh/known_hosts" << EOF
bscpm04.bsc.es ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPx4mC0etyyjYUT2Ztc/bs4ZXSbVMrogs1ZTP924PDgT
gitlab-internal.bsc.es ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF9arsAOSRB06hdy71oTvJHG2Mg8zfebADxpvc37lZo3
EOF
. ${pkgs.nix}/etc/profile.d/nix-daemon.sh
# Required to load SSL certificate paths
. ${pkgs.cacert}/nix-support/setup-hook
'';
environmentVariables = {
ENV = "/etc/profile";
USER = "root";
NIX_REMOTE = "daemon";
PATH = "${config.system.path}/bin:/bin:/sbin:/usr/bin:/usr/sbin";
};
};
};
};
# DOCKER* chains are useless, override at FORWARD and nixos-fw
networking.firewall.extraCommands = ''
# Don't forward any traffic from docker
iptables -I FORWARD 1 -p all -i docker0 -j nixos-fw-log-refuse
# Allow incoming traffic from docker to 23080
iptables -A nixos-fw -p tcp -i docker0 -d hut --dport 23080 -j ACCEPT
'';
#systemd.services.gitlab-runner.serviceConfig.Shell = "${pkgs.bash}/bin/bash";
systemd.services.gitlab-runner.serviceConfig.DynamicUser = lib.mkForce false;
systemd.services.gitlab-runner.serviceConfig.User = "gitlab-runner";
systemd.services.gitlab-runner.serviceConfig.Group = "gitlab-runner";
systemd.services.gitlab-runner.serviceConfig.ExecStart = lib.mkForce
''${pkgs.gitlab-runner}/bin/gitlab-runner run --config ''${HOME}/.gitlab-runner/config.toml --listen-address "127.0.0.1:9252" --working-directory ''${HOME}'';
users.users.gitlab-runner = {
uid = config.ids.uids.gitlab-runner;
#isNormalUser = true;
home = "/var/lib/gitlab-runner";
description = "Gitlab Runner";
group = "gitlab-runner";
extraGroups = [ "docker" ];
createHome = true;
};
users.groups.gitlab-runner.gid = config.ids.gids.gitlab-runner;
}

31
m/hut/gpfs-probe.nix Normal file

@ -0,0 +1,31 @@
{ pkgs, config, lib, ... }:
let
gpfs-probe-script = pkgs.runCommand "gpfs-probe.sh" { }
''
cp ${./gpfs-probe.sh} $out;
chmod +x $out
''
;
in
{
# Use a new user to handle the SSH keys
users.groups.ssh-robot = { };
users.users.ssh-robot = {
description = "SSH Robot";
isNormalUser = true;
home = "/var/lib/ssh-robot";
};
systemd.services.gpfs-probe = {
description = "Daemon to report GPFS latency via SSH";
path = [ pkgs.openssh pkgs.netcat ];
after = [ "network.target" ];
wantedBy = [ "default.target" ];
serviceConfig = {
Type = "simple";
ExecStart = "${pkgs.socat}/bin/socat TCP4-LISTEN:9966,fork EXEC:${gpfs-probe-script}";
User = "ssh-robot";
Group = "ssh-robot";
};
};
}

18
m/hut/gpfs-probe.sh Executable file

@ -0,0 +1,18 @@
#!/bin/sh
N=500
t=$(timeout 5 ssh bsc015557@glogin2.bsc.es "timeout 3 command time -f %e touch /gpfs/projects/bsc15/bsc015557/gpfs.{1..$N} 2>&1; rm -f /gpfs/projects/bsc15/bsc015557/gpfs.{1..$N}")
if [ -z "$t" ]; then
t="5.00"
fi
cat <<EOF
HTTP/1.1 200 OK
Content-Type: text/plain; version=0.0.4; charset=utf-8; escaping=values

# HELP gpfs_touch_latency Time to create $N files.
# TYPE gpfs_touch_latency gauge
gpfs_touch_latency $t
EOF
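# Usage sketch: assuming the socat service from gpfs-probe.nix is listening
# on port 9966, the metric can be fetched by hand with:
#   curl -s http://127.0.0.1:9966/
# The script ignores the request and always replies with the gauge above.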

272
m/hut/monitoring.nix Normal file
View File

@ -0,0 +1,272 @@
{ config, lib, ... }:
{
imports = [
../module/slurm-exporter.nix
../module/meteocat-exporter.nix
../module/upc-qaire-exporter.nix
./gpfs-probe.nix
../module/nix-daemon-exporter.nix
];
age.secrets.grafanaJungleRobotPassword = {
file = ../../secrets/jungle-robot-password.age;
owner = "grafana";
mode = "400";
};
age.secrets.ipmiYml.file = ../../secrets/ipmi.yml.age;
services.grafana = {
enable = true;
settings = {
server = {
domain = "jungle.bsc.es";
root_url = "%(protocol)s://%(domain)s/grafana";
serve_from_sub_path = true;
http_port = 2342;
http_addr = "127.0.0.1";
};
smtp = {
enabled = true;
from_address = "jungle-robot@bsc.es";
user = "jungle-robot";
# Read the password from a file, which is only readable by grafana user
# https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#file-provider
password = "$__file{${config.age.secrets.grafanaJungleRobotPassword.path}}";
host = "mail.bsc.es:465";
startTLS_policy = "NoStartTLS";
};
feature_toggles.publicDashboards = true;
"auth.anonymous".enabled = true;
log.level = "warn";
};
};
# Make grafana alerts also use the proxy
systemd.services.grafana.environment = config.networking.proxy.envVars;
services.prometheus = {
enable = true;
port = 9001;
retentionTime = "5y";
listenAddress = "127.0.0.1";
};
systemd.services.prometheus-ipmi-exporter.serviceConfig.DynamicUser = lib.mkForce false;
systemd.services.prometheus-ipmi-exporter.serviceConfig.PrivateDevices = lib.mkForce false;
# We need access to the devices to monitor the disk space
systemd.services.prometheus-node-exporter.serviceConfig.PrivateDevices = lib.mkForce false;
systemd.services.prometheus-node-exporter.serviceConfig.ProtectHome = lib.mkForce "read-only";
virtualisation.docker.daemon.settings = {
metrics-addr = "127.0.0.1:9323";
};
# Required to allow the smartctl exporter to read the nvme0 character device,
# see the commit message on:
# https://github.com/NixOS/nixpkgs/commit/12c26aca1fd55ab99f831bedc865a626eee39f80
services.udev.extraRules = ''
SUBSYSTEM=="nvme", KERNEL=="nvme[0-9]*", GROUP="disk"
'';
services.prometheus = {
exporters = {
ipmi = {
enable = true;
group = "root";
user = "root";
configFile = config.age.secrets.ipmiYml.path;
# extraFlags = [ "--log.level=debug" ];
listenAddress = "127.0.0.1";
};
node = {
enable = true;
enabledCollectors = [ "systemd" "logind" ];
port = 9002;
listenAddress = "127.0.0.1";
};
smartctl = {
enable = true;
listenAddress = "127.0.0.1";
};
blackbox = {
enable = true;
listenAddress = "127.0.0.1";
configFile = ./blackbox.yml;
};
};
scrapeConfigs = [
{
job_name = "xeon07";
static_configs = [{
targets = [
"127.0.0.1:${toString config.services.prometheus.exporters.node.port}"
"127.0.0.1:${toString config.services.prometheus.exporters.ipmi.port}"
"127.0.0.1:9323"
"127.0.0.1:9252"
"127.0.0.1:${toString config.services.prometheus.exporters.smartctl.port}"
"127.0.0.1:9341" # Slurm exporter
"127.0.0.1:9966" # GPFS custom exporter
"127.0.0.1:9999" # Nix-daemon custom exporter
"127.0.0.1:9929" # Meteocat custom exporter
"127.0.0.1:9928" # UPC Qaire custom exporter
"127.0.0.1:${toString config.services.prometheus.exporters.blackbox.port}"
];
}];
}
{
job_name = "ceph";
static_configs = [{
targets = [
"10.0.40.40:9283" # Ceph statistics
"10.0.40.40:9002" # Node exporter
"10.0.40.42:9002" # Node exporter
];
}];
}
{
job_name = "blackbox-http";
metrics_path = "/probe";
params = { module = [ "http_2xx" ]; };
static_configs = [{
targets = [
"https://www.google.com/robots.txt"
"https://pm.bsc.es/"
"https://pm.bsc.es/gitlab/"
"https://jungle.bsc.es/"
"https://gitlab.bsc.es/"
];
}];
relabel_configs = [
{
# Takes the address and sets it in the "target=<xyz>" URL parameter
source_labels = [ "__address__" ];
target_label = "__param_target";
}
{
# Sets the "instance" label with the remote host we are querying
source_labels = [ "__param_target" ];
target_label = "instance";
}
{
# Sends the scrape request itself to the local blackbox exporter
target_label = "__address__";
replacement = "127.0.0.1:${toString config.services.prometheus.exporters.blackbox.port}";
}
];
}
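# Worked example of the three relabel rules above, as a sketch: the target
# "https://pm.bsc.es/" keeps that URL as its "instance" label and is scraped
# as shown below (9115 is the default blackbox exporter port, assumed here
# since the port is not set explicitly in this file):
#   http://127.0.0.1:9115/probe?module=http_2xx&target=https://pm.bsc.es/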
{
job_name = "blackbox-icmp";
metrics_path = "/probe";
params = { module = [ "icmp" ]; };
static_configs = [{
targets = [
"1.1.1.1"
"8.8.8.8"
"ssfhead"
"anella-bsc.cesca.cat"
"upc-anella.cesca.cat"
"fox.ac.upc.edu"
"arenys5.ac.upc.edu"
];
}];
relabel_configs = [
{
# Takes the address and sets it in the "target=<xyz>" URL parameter
source_labels = [ "__address__" ];
target_label = "__param_target";
}
{
# Sets the "instance" label with the remote host we are querying
source_labels = [ "__param_target" ];
target_label = "instance";
}
{
# Sends the scrape request itself to the local blackbox exporter
target_label = "__address__";
replacement = "127.0.0.1:${toString config.services.prometheus.exporters.blackbox.port}";
}
];
}
{
job_name = "gitea";
static_configs = [{ targets = [ "127.0.0.1:3000" ]; }];
}
{
# Scrape the IPMI info of the hosts remotely via LAN
job_name = "ipmi-lan";
scrape_interval = "1m";
scrape_timeout = "30s";
metrics_path = "/ipmi";
scheme = "http";
relabel_configs = [
{
# Takes the address and sets it in the "target=<xyz>" URL parameter
source_labels = [ "__address__" ];
separator = ";";
regex = "(.*)(:80)?";
target_label = "__param_target";
replacement = "\${1}";
action = "replace";
}
{
# Sets the "instance" label with the remote host we are querying
source_labels = [ "__param_target" ];
separator = ";";
regex = "(.*)-ipmi"; # Remove "-ipmi" at the end
target_label = "instance";
replacement = "\${1}";
action = "replace";
}
{
# Sets the fixed "module=lan" URL param
separator = ";";
regex = "(.*)";
target_label = "__param_module";
replacement = "lan";
action = "replace";
}
{
# Sets the target to query as the localhost IPMI exporter
separator = ";";
regex = ".*";
target_label = "__address__";
replacement = "127.0.0.1:9290";
action = "replace";
}
];
# Load the list of targets from another file
file_sd_configs = [
{
files = [ "${./targets.yml}" ];
refresh_interval = "30s";
}
];
}
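# Worked example of the relabel rules above, as a sketch: the entry
# "owl1-ipmi" from targets.yml gets the label instance="owl1" and is scraped
# through the local IPMI exporter as:
#   http://127.0.0.1:9290/ipmi?target=owl1-ipmi&module=lan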
{
job_name = "ipmi-raccoon";
metrics_path = "/ipmi";
static_configs = [
{ targets = [ "127.0.0.1:9291" ]; }
];
params = {
target = [ "84.88.51.142" ];
module = [ "raccoon" ];
};
}
{
job_name = "raccoon";
static_configs = [
{
targets = [ "127.0.0.1:19002" ]; # Node exporter
}
];
}
];
};
}

24
m/hut/msmtp.nix Normal file

@ -0,0 +1,24 @@
{ config, lib, ... }:
{
age.secrets.jungleRobotPassword = {
file = ../../secrets/jungle-robot-password.age;
group = "gitea";
mode = "440";
};
programs.msmtp = {
enable = true;
accounts = {
default = {
auth = true;
tls = true;
tls_starttls = false;
port = 465;
host = "mail.bsc.es";
user = "jungle-robot";
passwordeval = "cat ${config.age.secrets.jungleRobotPassword.path}";
from = "jungle-robot@bsc.es";
};
};
};
}

76
m/hut/nginx.nix Normal file

@ -0,0 +1,76 @@
{ theFlake, pkgs, ... }:
let
website = pkgs.stdenv.mkDerivation {
name = "jungle-web";
src = pkgs.fetchgit {
url = "https://jungle.bsc.es/git/rarias/jungle-website.git";
rev = "52abaf4d71652a9ef77a0b098db14ca33bffff4c";
hash = "sha256-/ul9GazbOrOkmlvSgDz/+2W+V+ir5725Y7mVLc3rb0M=";
};
buildInputs = [ pkgs.hugo ];
buildPhase = ''
rm -rf public/
hugo
'';
installPhase = ''
cp -r public $out
'';
# Don't mess with doc/
dontFixup = true;
};
in
{
networking.firewall.allowedTCPPorts = [ 80 ];
services.nginx = {
enable = true;
virtualHosts."jungle.bsc.es" = {
root = "${website}";
listen = [
{
addr = "0.0.0.0";
port = 80;
}
];
extraConfig = ''
set_real_ip_from 127.0.0.1;
set_real_ip_from 84.88.52.107;
real_ip_recursive on;
real_ip_header X-Forwarded-For;
location /git {
rewrite ^/git$ / break;
rewrite ^/git/(.*) /$1 break;
proxy_pass http://127.0.0.1:3000;
proxy_redirect http:// $scheme://;
}
location /cache {
rewrite ^/cache/(.*) /$1 break;
proxy_pass http://127.0.0.1:5000;
proxy_redirect http:// $scheme://;
}
location /lists {
proxy_pass http://127.0.0.1:8081;
proxy_redirect http:// $scheme://;
}
location /grafana {
proxy_pass http://127.0.0.1:2342;
proxy_redirect http:// $scheme://;
proxy_set_header Host $host;
# Websockets
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
location ~ ^/~(.+?)(/.*)?$ {
alias /ceph/home/$1/public_html$2;
index index.html index.htm;
autoindex on;
absolute_redirect off;
}
location /p/ {
alias /ceph/p/;
}
'';
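# Example of the "~user" location above, using a hypothetical user "alice":
# a request for /~alice/notes/a.html captures $1 = "alice" and
# $2 = "/notes/a.html", so nginx serves the file
#   /ceph/home/alice/public_html/notes/a.html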
};
};
}

16
m/hut/nix-serve.nix Normal file

@ -0,0 +1,16 @@
{ config, ... }:
{
age.secrets.nixServe.file = ../../secrets/nix-serve.age;
services.nix-serve = {
enable = true;
# Only listen locally, as we serve it via ssh
bindAddress = "127.0.0.1";
port = 5000;
secretKeyFile = config.age.secrets.nixServe.path;
# Public key:
# jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=
};
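# Client-side sketch, assuming clients reach this cache through the /cache
# location proxied in nginx.nix (the actual client settings are not part of
# this file):
#   nix.settings.substituters = [ "https://jungle.bsc.es/cache" ];
#   nix.settings.trusted-public-keys = [
#     "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0="
#   ];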
}

43
m/hut/p.nix Normal file

@ -0,0 +1,43 @@
{ pkgs, lib, config, ... }:
let
p = pkgs.writeShellScriptBin "p" ''
set -e
cd /ceph
pastedir="p/$USER"
mkdir -p "$pastedir"
ext="txt"
if [ -n "$1" ]; then
ext="$1"
fi
out=$(mktemp "$pastedir/XXXXXXXX.$ext")
cat > "$out"
chmod go+r "$out"
echo "https://jungle.bsc.es/$out"
'';
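# Usage sketch of the "p" script above. The optional argument is the file
# extension; the file name is random, so it is shown as a placeholder:
#   $ dmesg | p log
#   https://jungle.bsc.es/p/<user>/<random>.log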
in
{
environment.systemPackages = with pkgs; [ p ];
# Make sure we have a directory per user. We cannot use the nice
# systemd-tmpfiles-setup.service unit because this is a remote FS, and it
# may not be mounted when that unit runs.
systemd.services.create-paste-dirs = let
# Take only normal users in hut
users = lib.filterAttrs (_: v: v.isNormalUser) config.users.users;
commands = lib.concatLists (lib.mapAttrsToList
(_: user: [
"install -d -o ${user.name} -g ${user.group} -m 0755 /ceph/p/${user.name}"
]) users);
script = pkgs.writeShellScript "create-paste-dirs.sh" (lib.concatLines commands);
in {
enable = true;
wants = [ "remote-fs.target" ];
after = [ "remote-fs.target" ];
wantedBy = [ "multi-user.target" ];
serviceConfig.ExecStart = script;
};
}

19
m/hut/postgresql.nix Normal file

@ -0,0 +1,19 @@
{ lib, ... }:
{
services.postgresql = {
enable = true;
ensureDatabases = [ "perftestsdb" ];
ensureUsers = [
{ name = "anavarro"; ensureClauses.superuser = true; }
{ name = "rarias"; ensureClauses.superuser = true; }
{ name = "grafana"; }
];
authentication = ''
#type database DBuser auth-method
local perftestsdb rarias trust
local perftestsdb anavarro trust
local perftestsdb grafana trust
'';
};
}

79
m/hut/public-inbox.css Normal file

@ -0,0 +1,79 @@
/*
* CC0-1.0 <https://creativecommons.org/publicdomain/zero/1.0/legalcode>
* Dark color scheme using 216 web-safe colors, inspired
* somewhat by the default color scheme in mutt.
* It reduces eyestrain for me, and energy usage for all:
* https://en.wikipedia.org/wiki/Light-on-dark_color_scheme
*/
* {
font-size: 14px;
font-family: monospace;
}
pre {
white-space: pre-wrap;
padding: 10px;
background: #f5f5f5;
}
hr {
margin: 30px 0;
}
body {
max-width: 120ex; /* 120 columns wide */
margin: 50px auto;
}
/*
* Underlined links add visual noise, which makes them hard to read.
* Use colors to make them stand out, instead.
*/
a:link {
color: #007;
text-decoration: none;
}
a:visited {
color:#504;
}
a:hover {
text-decoration: underline;
}
/* quoted text in emails gets a different color */
*.q { color:gray }
/*
* these may be used with cgit <https://git.zx2c4.com/cgit/>, too.
* (cgit uses <div>, public-inbox uses <span>)
*/
*.add { color:darkgreen } /* diff post-image lines */
*.del { color:darkred } /* diff pre-image lines */
*.head { color:black } /* diff header (metainformation) */
*.hunk { color:gray } /* diff hunk-header */
/*
* highlight 3.x colors (tested 3.18) for displaying blobs.
* This doesn't use most of the colors available, as I find too
* many colors overwhelming, so the default is commented out.
*/
.hl.num { color:#f30 } /* number */
.hl.esc { color:#f0f } /* escape character */
.hl.str { color:#f30 } /* string */
.hl.ppc { color:#f0f } /* preprocessor */
.hl.pps { color:#f30 } /* preprocessor string */
.hl.slc { color:#09f } /* single-line comment */
.hl.com { color:#09f } /* multi-line comment */
/* .hl.opt { color:#ccc } */ /* operator */
/* .hl.ipl { color:#ccc } */ /* interpolation */
/* keyword groups kw[a-z] */
.hl.kwa { color:#ff0 }
.hl.kwb { color:#0f0 }
.hl.kwc { color:#ff0 }
/* .hl.kwd { color:#ccc } */
/* line-number (unused by public-inbox) */
/* .hl.lin { color:#ccc } */

47
m/hut/public-inbox.nix Normal file

@ -0,0 +1,47 @@
{ lib, ... }:
{
services.public-inbox = {
enable = true;
http = {
enable = true;
port = 8081;
mounts = [ "/lists" ];
};
settings.publicinbox = {
css = [ "${./public-inbox.css}" ];
wwwlisting = "all";
};
inboxes = {
bscpkgs = {
url = "https://jungle.bsc.es/lists/bscpkgs";
address = [ "~rodarima/bscpkgs@lists.sr.ht" ];
watch = [ "imaps://jungle-robot%40gmx.com@imap.gmx.com/INBOX" ];
description = "Patches for bscpkgs";
listid = "~rodarima/bscpkgs.lists.sr.ht";
};
jungle = {
url = "https://jungle.bsc.es/lists/jungle";
address = [ "~rodarima/jungle@lists.sr.ht" ];
watch = [ "imaps://jungle-robot%40gmx.com@imap.gmx.com/INBOX" ];
description = "Patches for jungle";
listid = "~rodarima/jungle.lists.sr.ht";
};
};
};
# We need access to the network for the watch service, as we will fetch the
# emails directly from the IMAP server.
systemd.services.public-inbox-watch.serviceConfig = {
PrivateNetwork = lib.mkForce false;
RestrictAddressFamilies = lib.mkForce [ "AF_UNIX" "AF_INET" "AF_INET6" ];
KillSignal = "SIGKILL"; # Avoid slow shutdown
# Required for chmod(..., 02750) on directories by git, from
# systemd.exec(8):
# > Note that this restricts marking of any type of file system object with
# > these bits, including both regular files and directories (where the SGID
# > is a different meaning than for files, see documentation).
RestrictSUIDSGID = lib.mkForce false;
};
}

35
m/hut/pxe.nix Normal file

@ -0,0 +1,35 @@
{ theFlake, pkgs, ... }:
# This module describes a script that can launch the pixiecore daemon to serve a
# NixOS image via PXE to a node to directly boot from there, without requiring a
# working disk.
let
# The host config must have the netboot-minimal.nix module too
host = theFlake.nixosConfigurations.lake2;
sys = host.config.system;
build = sys.build;
kernel = "${build.kernel}/bzImage";
initrd = "${build.netbootRamdisk}/initrd";
init = "${build.toplevel}/init";
script = pkgs.writeShellScriptBin "pixiecore-helper" ''
#!/usr/bin/env bash -x
${pkgs.pixiecore}/bin/pixiecore \
boot ${kernel} ${initrd} --cmdline "init=${init} loglevel=4" \
--debug --dhcp-no-bind --port 64172 --status-port 64172 "$@"
'';
in
{
## We need a DHCP server to provide the IP
#services.dnsmasq = {
# enable = true;
# settings = {
# domain-needed = true;
# dhcp-range = [ "192.168.0.2,192.168.0.254" ];
# };
#};
environment.systemPackages = [ script ];
}

15
m/hut/targets.yml Normal file

@ -0,0 +1,15 @@
- targets:
- owl1-ipmi
- owl2-ipmi
- xeon03-ipmi
- xeon04-ipmi
- koro-ipmi
- weasel-ipmi
- hut-ipmi
- eudy-ipmi
# Storage
- bay-ipmi
- oss01-ipmi
- lake2-ipmi
labels:
job: ipmi-lan

35
m/koro/configuration.nix Normal file

@ -0,0 +1,35 @@
{ config, pkgs, lib, modulesPath, ... }:
{
imports = [
../common/ssf.nix
#(modulesPath + "/installer/netboot/netboot-minimal.nix")
../eudy/cpufreq.nix
../eudy/users.nix
./kernel.nix
];
# Select this using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d5376d2";
# disable automatic garbage collector
nix.gc.automatic = lib.mkForce false;
# members of the tracing group can use the lttng-provided kernel events
# without root permissions
users.groups.tracing.members = [ "arocanon" "vlopez" ];
# set up both ethernet and infiniband ips
networking = {
hostName = "koro";
interfaces.eno1.ipv4.addresses = [ {
address = "10.0.40.5";
prefixLength = 24;
} ];
interfaces.ibp5s0.ipv4.addresses = [ {
address = "10.0.42.5";
prefixLength = 24;
} ];
};
}

70
m/koro/kernel.nix Normal file

@ -0,0 +1,70 @@
{ pkgs, lib, ... }:
let
#fcs-devel = pkgs.linuxPackages_custom {
# version = "6.2.8";
# src = /mnt/data/kernel/fcs/kernel/src;
# configfile = /mnt/data/kernel/fcs/kernel/configs/defconfig;
#};
#fcsv1 = fcs-kernel "bc11660676d3d68ce2459b9fb5d5e654e3f413be" false;
#fcsv2 = fcs-kernel "db0f2eca0cd57a58bf456d7d2c7d5d8fdb25dfb1" false;
#fcsv1-lockdep = fcs-kernel "bc11660676d3d68ce2459b9fb5d5e654e3f413be" true;
#fcsv2-lockdep = fcs-kernel "db0f2eca0cd57a58bf456d7d2c7d5d8fdb25dfb1" true;
#fcs-kernel = gitCommit: lockdep: pkgs.linuxPackages_custom {
# version = "6.2.8";
# src = builtins.fetchGit {
# url = "git@bscpm03.bsc.es:ompss-kernel/linux.git";
# rev = gitCommit;
# ref = "fcs";
# };
# configfile = if lockdep then ./configs/lockdep else ./configs/defconfig;
#};
kernel = nixos-fcs;
nixos-fcs-kernel = lib.makeOverridable ({gitCommit, lockStat ? false, preempt ? false, branch ? "fcs"}: pkgs.linuxPackagesFor (pkgs.buildLinux rec {
version = "6.2.8";
src = builtins.fetchGit {
url = "git@bscpm03.bsc.es:ompss-kernel/linux.git";
rev = gitCommit;
ref = branch;
};
structuredExtraConfig = with lib.kernel; {
# add general custom kernel options here
} // lib.optionalAttrs lockStat {
LOCK_STAT = yes;
} // lib.optionalAttrs preempt {
PREEMPT = lib.mkForce yes;
PREEMPT_VOLUNTARY = lib.mkForce no;
};
kernelPatches = [];
extraMeta.branch = lib.versions.majorMinor version;
}));
nixos-fcs = nixos-fcs-kernel {gitCommit = "8a09822dfcc8f0626b209d6d2aec8b5da459dfee";};
nixos-fcs-lockstat = nixos-fcs.override {
lockStat = true;
};
nixos-fcs-lockstat-preempt = nixos-fcs.override {
lockStat = true;
preempt = true;
};
latest = pkgs.linuxPackages_latest;
in {
imports = [
../eudy/kernel/lttng.nix
../eudy/kernel/perf.nix
];
boot.kernelPackages = lib.mkForce kernel;
# disable all cpu mitigations
boot.kernelParams = [
"mitigations=off"
];
# enable memory overcommit, needed to build a taglibc system using nix after
# increasing the openblas memory footprint
boot.kernel.sysctl."vm.overcommit_memory" = 1;
}

84
m/lake2/configuration.nix Normal file

@ -0,0 +1,84 @@
{ config, pkgs, lib, modulesPath, ... }:
{
imports = [
../common/ssf.nix
../module/monitoring.nix
../module/hut-substituter.nix
];
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d53563a";
boot.kernel.sysctl = {
"kernel.yama.ptrace_scope" = lib.mkForce "1";
};
environment.systemPackages = with pkgs; [
ceph
];
services.ceph = {
enable = true;
global = {
fsid = "9c8d06e0-485f-4aaf-b16b-06d6daf1232b";
monHost = "10.0.40.40";
monInitialMembers = "bay";
clusterNetwork = "10.0.40.40/24"; # Use Ethernet only
};
osd = {
enable = true;
# One daemon per NVME disk
daemons = [ "4" "5" "6" "7" ];
extraConfig = {
"osd crush chooseleaf type" = "0";
"osd journal size" = "10000";
"osd pool default min size" = "2";
"osd pool default pg num" = "200";
"osd pool default pgp num" = "200";
"osd pool default size" = "3";
};
};
};
networking = {
hostName = "lake2";
interfaces.eno1.ipv4.addresses = [ {
address = "10.0.40.42";
prefixLength = 24;
} ];
interfaces.ibp5s0.ipv4.addresses = [ {
address = "10.0.42.42";
prefixLength = 24;
} ];
firewall = {
extraCommands = ''
# Accept all incoming TCP traffic from bay
iptables -A nixos-fw -p tcp -s bay -j nixos-fw-accept
# Accept monitoring requests from hut
iptables -A nixos-fw -p tcp -s hut --dport 9002 -j nixos-fw-accept
# Accept all Ceph traffic from the local network
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 -m multiport --dport 3300,6789,6800:7568 -j nixos-fw-accept
'';
};
};
# Missing service for volumes, see:
# https://www.reddit.com/r/ceph/comments/14otjyo/comment/jrd69vt/
systemd.services.ceph-volume = {
enable = true;
description = "Ceph Volume activation";
unitConfig = {
Type = "oneshot";
After = "local-fs.target";
Wants = "local-fs.target";
};
path = [ pkgs.ceph pkgs.util-linux pkgs.lvm2 pkgs.cryptsetup ];
serviceConfig = {
KillMode = "none";
Environment = "CEPH_VOLUME_TIMEOUT=10000";
ExecStart = "/bin/sh -c 'timeout $CEPH_VOLUME_TIMEOUT ${pkgs.ceph}/bin/ceph-volume lvm activate --all --no-systemd'";
TimeoutSec = "0";
};
wantedBy = [ "multi-user.target" ];
};
}

70
m/map.nix Normal file

@ -0,0 +1,70 @@
{
# In physical order from top to bottom (see note below)
ssf = {
# Switches for Ethernet and OmniPath
switch-C6-S1A-05 = { pos=42; size=1; model="Dell S3048-ON"; };
switch-opa = { pos=41; size=1; };
# SSF login
apex = { pos=39; size=2; label="SSFHEAD"; board="R2208WTTYSR"; contact="rodrigo.arias@bsc.es"; };
# Storage
bay = { pos=38; size=1; label="MDS01"; board="S2600WT2R"; sn="BQWL64850303"; contact="rodrigo.arias@bsc.es"; };
lake1 = { pos=37; size=1; label="OSS01"; board="S2600WT2R"; sn="BQWL64850234"; contact="rodrigo.arias@bsc.es"; };
lake2 = { pos=36; size=1; label="OSS02"; board="S2600WT2R"; sn="BQWL64850266"; contact="rodrigo.arias@bsc.es"; };
# Compute xeon
owl1 = { pos=35; size=1; label="SSF-XEON01"; board="S2600WTTR"; sn="BQWL64954172"; contact="rodrigo.arias@bsc.es"; };
owl2 = { pos=34; size=1; label="SSF-XEON02"; board="S2600WTTR"; sn="BQWL64756560"; contact="rodrigo.arias@bsc.es"; };
xeon03 = { pos=33; size=1; label="SSF-XEON03"; board="S2600WTTR"; sn="BQWL64750826"; contact="rodrigo.arias@bsc.es"; };
# Slot 32 empty
koro = { pos=31; size=1; label="SSF-XEON05"; board="S2600WTTR"; sn="BQWL64954293"; contact="rodrigo.arias@bsc.es"; };
weasel = { pos=30; size=1; label="SSF-XEON06"; board="S2600WTTR"; sn="BQWL64750846"; contact="antoni.navarro@bsc.es"; };
hut = { pos=29; size=1; label="SSF-XEON07"; board="S2600WTTR"; sn="BQWL64751184"; contact="rodrigo.arias@bsc.es"; };
eudy = { pos=28; size=1; label="SSF-XEON08"; board="S2600WTTR"; sn="BQWL64756586"; contact="aleix.rocanonell@bsc.es"; };
# 16 KNL nodes, 4 per chassis
knl01_04 = { pos=26; size=2; label="KNL01..KNL04"; board="HNS7200APX"; };
knl05_08 = { pos=24; size=2; label="KNL05..KNL08"; board="HNS7200APX"; };
knl09_12 = { pos=22; size=2; label="KNL09..KNL12"; board="HNS7200APX"; };
knl13_16 = { pos=20; size=2; label="KNL13..KNL16"; board="HNS7200APX"; };
# Slot 19 empty
# EPI (hw team, guessed order)
epi01 = { pos=18; size=1; contact="joan.cabre@bsc.es"; };
epi02 = { pos=17; size=1; contact="joan.cabre@bsc.es"; };
epi03 = { pos=16; size=1; contact="joan.cabre@bsc.es"; };
anon = { pos=14; size=2; }; # Unlabeled machine. Operative
# These are old and decommissioned (off)
power8 = { pos=12; size=2; label="BSCPOWER8N3"; decommissioned=true; };
powern1 = { pos=8; size=4; label="BSCPOWERN1"; decommissioned=true; };
gustafson = { pos=7; size=1; label="gustafson"; decommissioned=true; };
odap01 = { pos=3; size=4; label="ODAP01"; decommissioned=true; };
amhdal = { pos=2; size=1; label="AMHDAL"; decommissioned=true; }; # sic
moore = { pos=1; size=1; label="moore (earth)"; decommissioned=true; };
};
bsc2218 = {
raccoon = { board="W2600CR"; sn="QSIP22500829"; contact="rodrigo.arias@bsc.es"; };
tent = { label="SSF-XEON04"; board="S2600WTTR"; sn="BQWL64751229"; contact="rodrigo.arias@bsc.es"; };
};
upc = {
fox = { board="H13DSG-O-CPU"; sn="UM24CS600392"; prod="AS-4125GS-TNRT"; prod_sn="E508839X5103339"; contact="rodrigo.arias@bsc.es"; };
};
# NOTE: Position is specified in "U" units (44.45 mm) and starts at 1 from the
# bottom. Example:
#
# | ... | - [pos+size] <--- Label in chassis
# +--------+
# | node | - [pos+1]
# | 2U | - [pos]
# +--------+
# | ... | - [pos-1]
#
# NOTE: The board and sn fields refer to the FRU information (Board Product
# and Board Serial) as reported by `ipmitool fru print 0`.
}
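
As an illustration of the position convention described in the note, the rack units occupied by a node can be derived from `pos` and `size`. This is only a sketch: the `./map.nix` import path and the `slots` helper are assumptions introduced here, not part of the repository.

```nix
let
  # Assumed path: adjust to wherever map.nix lives relative to this file.
  rack = (import ./map.nix).ssf;
  # Hypothetical helper: list the U positions a node occupies, bottom-up.
  slots = node: builtins.genList (i: node.pos + i) node.size;
in
{
  apex = slots rack.apex;       # pos=39, size=2
  powern1 = slots rack.powern1; # pos=8, size=4
}
```

Evaluated strictly (for example with `nix-instantiate --eval --strict`), `apex` should come out as the list 39, 40 and `powern1` as 8, 9, 10, 11, so the chassis label sits at `pos+size`, just above the last occupied unit.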

357
m/module/agenix.nix Normal file

@ -0,0 +1,357 @@
{
config,
options,
lib,
pkgs,
...
}:
with lib;
let
cfg = config.age;
isDarwin = lib.attrsets.hasAttrByPath [ "environment" "darwinConfig" ] options;
ageBin = config.age.ageBin;
users = config.users.users;
sysusersEnabled =
if isDarwin then
false
else
options.systemd ? sysusers && (config.systemd.sysusers.enable || config.services.userborn.enable);
mountCommand =
if isDarwin then
''
if ! diskutil info "${cfg.secretsMountPoint}" &> /dev/null; then
num_sectors=1048576
dev=$(hdiutil attach -nomount ram://"$num_sectors" | sed 's/[[:space:]]*$//')
newfs_hfs -v agenix "$dev"
mount -t hfs -o nobrowse,nodev,nosuid,-m=0751 "$dev" "${cfg.secretsMountPoint}"
fi
''
else
''
grep -q "${cfg.secretsMountPoint} ramfs" /proc/mounts ||
mount -t ramfs none "${cfg.secretsMountPoint}" -o nodev,nosuid,mode=0751
'';
newGeneration = ''
_agenix_generation="$(basename "$(readlink ${cfg.secretsDir})" || echo 0)"
(( ++_agenix_generation ))
echo "[agenix] creating new generation in ${cfg.secretsMountPoint}/$_agenix_generation"
mkdir -p "${cfg.secretsMountPoint}"
chmod 0751 "${cfg.secretsMountPoint}"
${mountCommand}
mkdir -p "${cfg.secretsMountPoint}/$_agenix_generation"
chmod 0751 "${cfg.secretsMountPoint}/$_agenix_generation"
'';
chownGroup = if isDarwin then "admin" else "keys";
# chown the secrets mountpoint and the current generation to the keys group
# instead of leaving it root:root.
chownMountPoint = ''
chown :${chownGroup} "${cfg.secretsMountPoint}" "${cfg.secretsMountPoint}/$_agenix_generation"
'';
setTruePath = secretType: ''
${
if secretType.symlink then
''
_truePath="${cfg.secretsMountPoint}/$_agenix_generation/${secretType.name}"
''
else
''
_truePath="${secretType.path}"
''
}
'';
installSecret = secretType: ''
${setTruePath secretType}
echo "decrypting '${secretType.file}' to '$_truePath'..."
TMP_FILE="$_truePath.tmp"
IDENTITIES=()
for identity in ${toString cfg.identityPaths}; do
test -r "$identity" || continue
test -s "$identity" || continue
IDENTITIES+=(-i)
IDENTITIES+=("$identity")
done
test "''${#IDENTITIES[@]}" -eq 0 && echo "[agenix] WARNING: no readable identities found!"
mkdir -p "$(dirname "$_truePath")"
[ "${secretType.path}" != "${cfg.secretsDir}/${secretType.name}" ] && mkdir -p "$(dirname "${secretType.path}")"
(
umask u=r,g=,o=
test -f "${secretType.file}" || echo '[agenix] WARNING: encrypted file ${secretType.file} does not exist!'
test -d "$(dirname "$TMP_FILE")" || echo "[agenix] WARNING: $(dirname "$TMP_FILE") does not exist!"
LANG=${
config.i18n.defaultLocale or "C"
} ${ageBin} --decrypt "''${IDENTITIES[@]}" -o "$TMP_FILE" "${secretType.file}"
)
chmod ${secretType.mode} "$TMP_FILE"
mv -f "$TMP_FILE" "$_truePath"
${optionalString secretType.symlink ''
[ "${secretType.path}" != "${cfg.secretsDir}/${secretType.name}" ] && ln -sfT "${cfg.secretsDir}/${secretType.name}" "${secretType.path}"
''}
'';
testIdentities = map (path: ''
test -f ${path} || echo '[agenix] WARNING: config.age.identityPaths entry ${path} not present!'
'') cfg.identityPaths;
cleanupAndLink = ''
_agenix_generation="$(basename "$(readlink ${cfg.secretsDir})" || echo 0)"
(( ++_agenix_generation ))
echo "[agenix] symlinking new secrets to ${cfg.secretsDir} (generation $_agenix_generation)..."
ln -sfT "${cfg.secretsMountPoint}/$_agenix_generation" ${cfg.secretsDir}
(( _agenix_generation > 1 )) && {
echo "[agenix] removing old secrets (generation $(( _agenix_generation - 1 )))..."
rm -rf "${cfg.secretsMountPoint}/$(( _agenix_generation - 1 ))"
}
'';
installSecrets = builtins.concatStringsSep "\n" (
[ "echo '[agenix] decrypting secrets...'" ]
++ testIdentities
++ (map installSecret (builtins.attrValues cfg.secrets))
++ [ cleanupAndLink ]
);
chownSecret = secretType: ''
${setTruePath secretType}
chown ${secretType.owner}:${secretType.group} "$_truePath"
'';
chownSecrets = builtins.concatStringsSep "\n" (
[ "echo '[agenix] chowning...'" ]
++ [ chownMountPoint ]
++ (map chownSecret (builtins.attrValues cfg.secrets))
);
secretType = types.submodule (
{ config, ... }:
{
options = {
name = mkOption {
type = types.str;
default = config._module.args.name;
defaultText = literalExpression "config._module.args.name";
description = ''
Name of the file used in {option}`age.secretsDir`
'';
};
file = mkOption {
type = types.path;
description = ''
Age file the secret is loaded from.
'';
};
path = mkOption {
type = types.str;
default = "${cfg.secretsDir}/${config.name}";
defaultText = literalExpression ''
"''${cfg.secretsDir}/''${config.name}"
'';
description = ''
Path where the decrypted secret is installed.
'';
};
mode = mkOption {
type = types.str;
default = "0400";
description = ''
Permissions mode of the decrypted secret in a format understood by chmod.
'';
};
owner = mkOption {
type = types.str;
default = "0";
description = ''
User of the decrypted secret.
'';
};
group = mkOption {
type = types.str;
default = users.${config.owner}.group or "0";
defaultText = literalExpression ''
users.''${config.owner}.group or "0"
'';
description = ''
Group of the decrypted secret.
'';
};
symlink = mkEnableOption "symlinking secrets to their destination" // {
default = true;
};
};
}
);
in
{
imports = [
(mkRenamedOptionModule [ "age" "sshKeyPaths" ] [ "age" "identityPaths" ])
];
options.age = {
ageBin = mkOption {
type = types.str;
default = "${pkgs.age}/bin/age";
defaultText = literalExpression ''
"''${pkgs.age}/bin/age"
'';
description = ''
The age executable to use.
'';
};
secrets = mkOption {
type = types.attrsOf secretType;
default = { };
description = ''
Attrset of secrets.
'';
};
secretsDir = mkOption {
type = types.path;
default = "/run/agenix";
description = ''
Folder where secrets are symlinked to
'';
};
secretsMountPoint = mkOption {
type =
types.addCheck types.str (
s:
(builtins.match "[ \t\n]*" s) == null # non-empty
&& (builtins.match ".+/" s) == null
) # without trailing slash
// {
description = "${types.str.description} (with check: non-empty without trailing slash)";
};
default = "/run/agenix.d";
description = ''
Where secrets are created before they are symlinked to {option}`age.secretsDir`
'';
};
identityPaths = mkOption {
type = types.listOf types.path;
default =
if isDarwin then
[
"/etc/ssh/ssh_host_ed25519_key"
"/etc/ssh/ssh_host_rsa_key"
]
else if (config.services.openssh.enable or false) then
map (e: e.path) (
lib.filter (e: e.type == "rsa" || e.type == "ed25519") config.services.openssh.hostKeys
)
else
[ ];
defaultText = literalExpression ''
if isDarwin
then [
"/etc/ssh/ssh_host_ed25519_key"
"/etc/ssh/ssh_host_rsa_key"
]
else if (config.services.openssh.enable or false)
then map (e: e.path) (lib.filter (e: e.type == "rsa" || e.type == "ed25519") config.services.openssh.hostKeys)
else [];
'';
description = ''
Path to SSH keys to be used as identities in age decryption.
'';
};
};
config = mkIf (cfg.secrets != { }) (mkMerge [
{
assertions = [
{
assertion = cfg.identityPaths != [ ];
message = "age.identityPaths must be set, for example by enabling openssh.";
}
];
}
(optionalAttrs (!isDarwin) {
# When using sysusers we can no longer be started as an activation script
# because those are started in initrd while sysusers is started later.
systemd.services.agenix-install-secrets = mkIf sysusersEnabled {
wantedBy = [ "sysinit.target" ];
after = [ "systemd-sysusers.service" ];
unitConfig.DefaultDependencies = "no";
path = [ pkgs.mount ];
serviceConfig = {
Type = "oneshot";
ExecStart = pkgs.writeShellScript "agenix-install" (concatLines [
newGeneration
installSecrets
chownSecrets
]);
RemainAfterExit = true;
};
};
# Create a new directory full of secrets for symlinking (this helps
# ensure removed secrets are actually removed, or at least become
# invalid symlinks).
system.activationScripts = mkIf (!sysusersEnabled) {
agenixNewGeneration = {
text = newGeneration;
deps = [
"specialfs"
];
};
agenixInstall = {
text = installSecrets;
deps = [
"agenixNewGeneration"
"specialfs"
];
};
# So user passwords can be encrypted.
users.deps = [ "agenixInstall" ];
# Change ownership and group after users and groups are made.
agenixChown = {
text = chownSecrets;
deps = [
"users"
"groups"
];
};
# So other activation scripts can depend on agenix being done.
agenix = {
text = "";
deps = [ "agenixChown" ];
};
};
})
(optionalAttrs isDarwin {
launchd.daemons.activate-agenix = {
script = ''
set -e
set -o pipefail
export PATH="${pkgs.gnugrep}/bin:${pkgs.coreutils}/bin:@out@/sw/bin:/usr/bin:/bin:/usr/sbin:/sbin"
${newGeneration}
${installSecrets}
${chownSecrets}
exit 0
'';
serviceConfig = {
RunAtLoad = true;
KeepAlive.SuccessfulExit = false;
};
};
})
]);
}

49
m/module/amd-uprof.nix Normal file

@ -0,0 +1,49 @@
{ config, lib, pkgs, ... }:
{
options = {
services.amd-uprof = {
enable = lib.mkOption {
type = lib.types.bool;
default = false;
description = "Whether to enable AMD uProf.";
};
};
};
# Only setup amd-uprof if enabled
config = lib.mkIf config.services.amd-uprof.enable {
# First make sure that we add the module to the list of available modules
# in the kernel matching the same kernel version of this configuration.
boot.extraModulePackages = with config.boot.kernelPackages; [ amd-uprof-driver ];
boot.kernelModules = [ "AMDPowerProfiler" ];
# Make the userspace tools available in $PATH.
environment.systemPackages = with pkgs; [ amd-uprof ];
# The AMDPowerProfiler module neither creates the /dev device nor emits
# any uevents, so we cannot use udev rules to automatically create the
# device. Instead, we run a systemd unit that does it after loading the
# modules.
systemd.services.amd-uprof-device = {
description = "Create /dev/AMDPowerProfiler device";
after = [ "systemd-modules-load.service" ];
wantedBy = [ "multi-user.target" ];
unitConfig.ConditionPathExists = [
"/proc/AMDPowerProfiler/device"
"!/dev/AMDPowerProfiler"
];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = true;
ExecStart = pkgs.writeShellScript "add-amd-uprof-dev.sh" ''
mknod /dev/AMDPowerProfiler -m 666 c $(< /proc/AMDPowerProfiler/device) 0
'';
ExecStop = pkgs.writeShellScript "remove-amd-uprof-dev.sh" ''
rm -f /dev/AMDPowerProfiler
'';
};
};
};
}

24
m/module/ceph.nix Normal file

@ -0,0 +1,24 @@
{ config, pkgs, ... }:
# Mounts the /ceph filesystem at boot
{
environment.systemPackages = with pkgs; [
ceph-client
fio # For benchmarks
];
# We need the ceph module loaded as the mount.ceph binary fails to run the
# modprobe command.
boot.kernelModules = [ "ceph" ];
age.secrets.cephUser.file = ../../secrets/ceph-user.age;
fileSystems."/ceph" = {
fsType = "ceph";
device = "user@9c8d06e0-485f-4aaf-b16b-06d6daf1232b.cephfs=/";
options = [
"mon_addr=10.0.40.40"
"secretfile=${config.age.secrets.cephUser.path}"
];
};
}

3
m/module/debuginfod.nix Normal file

@ -0,0 +1,3 @@
{
services.nixseparatedebuginfod.enable = true;
}

3
m/module/emulation.nix Normal file

@ -0,0 +1,3 @@
{
boot.binfmt.emulatedSystems = [ "armv7l-linux" "aarch64-linux" "powerpc64le-linux" "riscv64-linux" ];
}


@ -0,0 +1,13 @@
{ config, ... }:
{
nix.settings =
# Don't add hut as a cache to itself
assert config.networking.hostName != "hut";
{
extra-substituters = [ "http://hut/cache" ];
extra-trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
# Set a low timeout in case hut is down
connect-timeout = 3; # seconds
};
}

24
m/module/jungle-users.nix Normal file
View File

@ -0,0 +1,24 @@
{ config, lib, ... }:
with lib;
{
options = {
users.jungleUsers = mkOption {
type = types.attrsOf (types.anything // { check = (x: x ? "hosts"); });
description = ''
Same as users.users but with the extra `hosts` attribute, which controls
access to the nodes by `networking.hostName`.
'';
};
};
config = let
allowedUser = host: userConf: builtins.elem host userConf.hosts;
filterUsers = host: users: filterAttrs (n: v: allowedUser host v) users;
removeHosts = users: mapAttrs (n: v: builtins.removeAttrs v [ "hosts" ]) users;
currentHost = config.networking.hostName;
in {
users.users = removeHosts (filterUsers currentHost config.users.jungleUsers);
};
}
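
A minimal sketch of how this option might be used, assuming the module above is imported. The user name, uid and host list are made up for illustration; only the `hosts` attribute is specific to `users.jungleUsers`, the rest are regular `users.users` attributes.

```nix
{
  # Hypothetical user: defined once, but only created on the listed hosts.
  users.jungleUsers.alice = {
    isNormalUser = true;
    uid = 2001;              # made-up uid
    home = "/home/alice";
    hosts = [ "hut" "owl1" ]; # matched against networking.hostName
  };
}
```

On a machine whose `networking.hostName` is `hut` or `owl1`, the module copies the entry into `users.users.alice` with `hosts` removed. On any other host the entry is filtered out and the account is not created.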


@ -0,0 +1,17 @@
{ config, lib, pkgs, ... }:
with lib;
{
systemd.services."prometheus-meteocat-exporter" = {
wantedBy = [ "multi-user.target" ];
after = [ "network.target" ];
serviceConfig = {
Restart = mkDefault "always";
PrivateTmp = mkDefault true;
WorkingDirectory = mkDefault "/tmp";
DynamicUser = mkDefault true;
ExecStart = "${pkgs.meteocat-exporter}/bin/meteocat-exporter";
};
};
}

25
m/module/monitoring.nix Normal file

@ -0,0 +1,25 @@
{ config, lib, ... }:
{
# We need access to the devices to monitor the disk space
systemd.services.prometheus-node-exporter.serviceConfig.PrivateDevices = lib.mkForce false;
systemd.services.prometheus-node-exporter.serviceConfig.ProtectHome = lib.mkForce "read-only";
# Required to allow the smartctl exporter to read the nvme0 character device,
# see the commit message on:
# https://github.com/NixOS/nixpkgs/commit/12c26aca1fd55ab99f831bedc865a626eee39f80
services.udev.extraRules = ''
SUBSYSTEM=="nvme", KERNEL=="nvme[0-9]*", GROUP="disk"
'';
services.prometheus = {
exporters = {
node = {
enable = true;
enabledCollectors = [ "systemd" ];
port = 9002;
};
smartctl.enable = true;
};
};
}

26
m/module/nix-daemon-builds.sh Executable file

@ -0,0 +1,26 @@
#!/bin/sh
# Locate nix daemon pid
nd=$(pgrep -o nix-daemon)
# Locate children of nix-daemon
pids1=$(tr ' ' '\n' < "/proc/$nd/task/$nd/children")
# For each child, locate 2nd level children
pids2=$(echo "$pids1" | xargs -I @ /bin/sh -c 'cat /proc/@/task/*/children' | tr ' ' '\n')
cat <<EOF
HTTP/1.1 200 OK
Content-Type: text/plain; version=0.0.4; charset=utf-8; escaping=values
# HELP nix_daemon_build Nix daemon derivation build state.
# TYPE nix_daemon_build gauge
EOF
for pid in $pids2; do
name=$(cat /proc/$pid/environ 2>/dev/null | tr '\0' '\n' | rg "^name=(.+)" - --replace '$1' | tr -dc ' [:alnum:]_\-\.')
user=$(ps -o uname= -p "$pid")
if [ -n "$name" -a -n "$user" ]; then
printf 'nix_daemon_build{user="%s",name="%s"} 1\n' "$user" "$name"
fi
done


@ -0,0 +1,23 @@
{ pkgs, config, lib, ... }:
let
script = pkgs.runCommand "nix-daemon-exporter.sh" { }
''
cp ${./nix-daemon-builds.sh} $out;
chmod +x $out
''
;
in
{
systemd.services.nix-daemon-exporter = {
description = "Daemon to export nix-daemon metrics";
path = [ pkgs.procps pkgs.ripgrep ];
wantedBy = [ "default.target" ];
serviceConfig = {
Type = "simple";
ExecStart = "${pkgs.socat}/bin/socat TCP4-LISTEN:9999,fork EXEC:${script}";
# Needs root to read the environment of other processes, potentially unsafe
User = "root";
Group = "root";
};
};
}

20
m/module/nvidia.nix Normal file

@ -0,0 +1,20 @@
{ lib, config, pkgs, ... }:
{
# Configure Nvidia driver to use with CUDA
hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production;
hardware.nvidia.open = lib.mkDefault (builtins.abort "hardware.nvidia.open not set");
hardware.graphics.enable = true;
nixpkgs.config.nvidia.acceptLicense = true;
services.xserver.videoDrivers = [ "nvidia" ];
# enable support for derivations which require nvidia-gpu to be available
# > requiredSystemFeatures = [ "cuda" ];
programs.nix-required-mounts.enable = true;
programs.nix-required-mounts.presets.nvidia-gpu.enable = true;
# They forgot to add the symlink
programs.nix-required-mounts.allowedPatterns.nvidia-gpu.paths = [
config.systemd.tmpfiles.settings.graphics-driver."/run/opengl-driver"."L+".argument
];
environment.systemPackages = [ pkgs.cudainfo ];
}

68
m/module/p.nix Normal file

@ -0,0 +1,68 @@
{ config, lib, pkgs, ... }:
let
cfg = config.services.p;
in
{
options = {
services.p = {
enable = lib.mkOption {
type = lib.types.bool;
default = false;
description = "Whether to enable the p service.";
};
path = lib.mkOption {
type = lib.types.str;
default = "/var/lib/p";
description = "Where to save the pasted files on disk.";
};
url = lib.mkOption {
type = lib.types.str;
default = "https://jungle.bsc.es/p";
description = "URL prefix for the printed file.";
};
};
};
config = lib.mkIf cfg.enable {
environment.systemPackages = let
p = pkgs.writeShellScriptBin "p" ''
set -e
pastedir="${cfg.path}/$USER"
cd "$pastedir"
ext="txt"
if [ -n "$1" ]; then
ext="$1"
fi
out=$(mktemp "XXXXXXXX.$ext")
cat > "$out"
chmod go+r "$out"
echo "${cfg.url}/$USER/$out"
'';
in [ p ];
systemd.services.p = let
# Take only normal users
users = lib.filterAttrs (_: v: v.isNormalUser) config.users.users;
# Create a directory for each user
commands = lib.concatLists (lib.mapAttrsToList (_: user: [
"install -d -o ${user.name} -g ${user.group} -m 0755 ${cfg.path}/${user.name}"
]) users);
in {
description = "P service setup";
requires = [ "network-online.target" ];
#wants = [ "remote-fs.target" ];
#after = [ "remote-fs.target" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
ExecStart = pkgs.writeShellScript "p-init.sh" (''
install -d -o root -g root -m 0755 ${cfg.path}
'' + (lib.concatLines commands));
};
};
};
}
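
A possible way to enable the service from a host configuration is sketched below. The import path is an assumption, and the `path` and `url` values simply restate the module defaults.

```nix
{
  imports = [ ./p.nix ]; # assumed relative path to the module above

  services.p = {
    enable = true;
    path = "/var/lib/p";             # same as the default
    url = "https://jungle.bsc.es/p"; # same as the default
  };
}
```

With this in place, a user would pipe data into the `p` command: the content is stored under `/var/lib/p/$USER/` with a random name and a `.txt` extension, and the matching URL is printed. Passing an argument (for example `p log`) replaces the extension.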

33
m/module/power-policy.nix Normal file

@ -0,0 +1,33 @@
{ config, lib, pkgs, ... }:
with lib;
let
cfg = config.power.policy;
in
{
options = {
power.policy = mkOption {
type = types.nullOr (types.enum [ "always-on" "previous" "always-off" ]);
default = null;
description = "Set power policy to use via IPMI.";
};
};
config = mkIf (cfg != null) {
systemd.services."power-policy" = {
description = "Set power policy to use via IPMI";
wantedBy = [ "multi-user.target" ];
unitConfig = {
StartLimitBurst = "10";
StartLimitIntervalSec = "10m";
};
serviceConfig = {
ExecStart = "${pkgs.ipmitool}/bin/ipmitool chassis policy ${cfg}";
Type = "oneshot";
Restart = "on-failure";
RestartSec = "5s";
};
};
};
}
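
For reference, a host could select a policy as in the sketch below. The import path is an assumption; the value must be one of the three strings accepted by the enum above.

```nix
{
  imports = [ ./power-policy.nix ]; # assumed relative path to the module above

  # Ask the BMC to power the node back on after mains power is restored.
  power.policy = "always-on";
}
```

The generated `power-policy` unit then runs `ipmitool chassis policy always-on` at boot and retries every 5 seconds on failure, within the start limit configured above. Leaving the option at `null` creates no unit at all.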

24
m/module/slurm-client.nix Normal file

@ -0,0 +1,24 @@
{ lib, ... }:
{
imports = [
./slurm-common.nix
];
systemd.services.slurmd.serviceConfig = {
# Kill all processes in the control group on stop/restart. This will kill
# all the jobs running, so ensure that we only upgrade when the nodes are
# not in use. See:
# https://github.com/NixOS/nixpkgs/commit/ae93ed0f0d4e7be0a286d1fca86446318c0c6ffb
# https://bugs.schedmd.com/show_bug.cgi?id=2095#c24
KillMode = lib.mkForce "control-group";
# If slurmd fails to contact the control server it will fail, causing the
# node to remain out of service until manually restarted. Always try to
# restart it.
Restart = "always";
RestartSec = "30s";
};
services.slurm.client.enable = true;
}

115
m/module/slurm-common.nix Normal file

@ -0,0 +1,115 @@
{ config, pkgs, ... }:
let
suspendProgram = pkgs.writeShellScript "suspend.sh" ''
exec 1>>/var/log/power_save.log 2>>/var/log/power_save.log
set -x
export "PATH=/run/current-system/sw/bin:$PATH"
echo "$(date) Suspend invoked $0 $*" >> /var/log/power_save.log
hosts=$(scontrol show hostnames $1)
for host in $hosts; do
echo Shutting down host: $host
ipmitool -I lanplus -H ''${host}-ipmi -P "" -U "" chassis power off
done
'';
resumeProgram = pkgs.writeShellScript "resume.sh" ''
exec 1>>/var/log/power_save.log 2>>/var/log/power_save.log
set -x
export "PATH=/run/current-system/sw/bin:$PATH"
echo "$(date) Resume invoked $0 $*" >> /var/log/power_save.log
hosts=$(scontrol show hostnames $1)
for host in $hosts; do
echo Starting host: $host
ipmitool -I lanplus -H ''${host}-ipmi -P "" -U "" chassis power on
done
'';
in {
services.slurm = {
controlMachine = "apex";
clusterName = "jungle";
nodeName = [
"owl[1,2] Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 Feature=owl"
"fox Sockets=8 CoresPerSocket=24 ThreadsPerCore=1"
];
partitionName = [
"owl Nodes=owl[1-2] Default=YES DefaultTime=01:00:00 MaxTime=INFINITE State=UP"
"fox Nodes=fox Default=NO DefaultTime=01:00:00 MaxTime=INFINITE State=UP"
];
# See slurm.conf(5) for more details about these options.
extraConfig = ''
# Use PMIx for MPI by default. It works okay with MPICH and OpenMPI, but
# not with Intel MPI. For that use the compatibility shim libpmi.so
# setting I_MPI_PMI_LIBRARY=$pmix/lib/libpmi.so while maintaining the PMIx
# library in SLURM (--mpi=pmix). See more details here:
# https://pm.bsc.es/gitlab/rarias/jungle/-/issues/16
MpiDefault=pmix
# When a node reboots return that node to the slurm queue as soon as it
# becomes operative again.
ReturnToService=2
# Track all processes by using a cgroup
ProctrackType=proctrack/cgroup
# Enable task/affinity to allow the jobs to run in a specified subset of
# the resources. Use the task/cgroup plugin to enable process containment.
TaskPlugin=task/affinity,task/cgroup
# Power off unused nodes until they are requested
SuspendProgram=${suspendProgram}
SuspendTimeout=60
ResumeProgram=${resumeProgram}
ResumeTimeout=300
SuspendExcNodes=fox
# Turn the nodes off after 1 hour of inactivity
SuspendTime=3600
# Reduce port range so we can allow only this range in the firewall
SrunPortRange=60000-61000
# Use cores as consumable resources. In SLURM terms, a core may have
# multiple hardware threads (or CPUs).
SelectType=select/cons_tres
# Ignore memory constraints and only use unused cores to share a node with
# other jobs.
SelectTypeParameters=CR_Core
# Required for pam_slurm_adopt, see https://slurm.schedmd.com/pam_slurm_adopt.html
# This sets up the "extern" step into which ssh-launched processes will be
# adopted. Alloc runs the prolog at job allocation (salloc) rather than
# when a task runs (srun) so we can ssh early.
PrologFlags=Alloc,Contain,X11
# LaunchParameters=ulimit_pam_adopt will set RLIMIT_RSS in processes
# adopted by the external step, similar to tasks running in regular steps
# LaunchParameters=ulimit_pam_adopt
SlurmdDebug=debug5
#DebugFlags=Protocol,Cgroup
'';
extraCgroupConfig = ''
CgroupPlugin=cgroup/v2
#ConstrainCores=yes
'';
};
# Place the slurm config in /etc as this will be required by PAM
environment.etc.slurm.source = config.services.slurm.etcSlurm;
age.secrets.mungeKey = {
file = ../../secrets/munge-key.age;
owner = "munge";
group = "munge";
};
services.munge = {
enable = true;
password = config.age.secrets.mungeKey.path;
};
}


@ -0,0 +1,28 @@
{ config, lib, pkgs, ... }:
# See also: https://github.com/NixOS/nixpkgs/pull/112010
# And: https://github.com/NixOS/nixpkgs/pull/115839
with lib;
{
systemd.services."prometheus-slurm-exporter" = {
wantedBy = [ "multi-user.target" ];
after = [ "network.target" ];
serviceConfig = {
Restart = mkDefault "always";
PrivateTmp = mkDefault true;
WorkingDirectory = mkDefault "/tmp";
DynamicUser = mkDefault true;
ExecStart = ''
${pkgs.prometheus-slurm-exporter}/bin/prometheus-slurm-exporter --listen-address "127.0.0.1:9341"
'';
Environment = [
"PATH=${pkgs.slurm}/bin"
# We need to specify the slurm config to be able to talk to the slurmd
# daemon.
"SLURM_CONF=${config.services.slurm.etcSlurm}/slurm.conf"
];
};
};
}


@ -0,0 +1,8 @@
{ ... }:
{
networking.firewall = {
# Required for PMIx in SLURM, we should find a better way
allowedTCPPortRanges = [ { from=1024; to=65535; } ];
};
}


@ -0,0 +1,19 @@
{ ... }:
{
# Mount the hut nix store via NFS
fileSystems."/mnt/hut-nix-store" = {
device = "hut:/nix/store";
fsType = "nfs";
options = [ "ro" ];
};
systemd.services.slurmd.serviceConfig = {
# When running a job, bind the hut store in /nix/store so the paths are
# available too.
# FIXME: This doesn't keep the programs in /run/current-system/sw/bin
# available in the store. Ideally they should be merged but the overlay FS
# doesn't work when the underlying directories change.
BindReadOnlyPaths = "/mnt/hut-nix-store:/nix/store";
};
}

23
m/module/slurm-server.nix Normal file

@ -0,0 +1,23 @@
{ ... }:
{
imports = [
./slurm-common.nix
];
services.slurm.server.enable = true;
networking.firewall = {
extraCommands = ''
# Accept slurm connections to controller from compute nodes
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 6817 -j nixos-fw-accept
# Accept slurm connections from compute nodes for srun
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 60000:61000 -j nixos-fw-accept
# Accept slurm connections to controller from fox (via wireguard)
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.1/32 --dport 6817 -j nixos-fw-accept
# Accept slurm connections from fox for srun (via wireguard)
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.1/32 --dport 60000:61000 -j nixos-fw-accept
'';
};
}

@ -0,0 +1,17 @@
{ config, lib, pkgs, ... }:
with lib;
{
systemd.services."prometheus-upc-qaire-exporter" = {
wantedBy = [ "multi-user.target" ];
after = [ "network.target" ];
serviceConfig = {
Restart = mkDefault "always";
PrivateTmp = mkDefault true;
WorkingDirectory = mkDefault "/tmp";
DynamicUser = mkDefault true;
ExecStart = "${pkgs.upc-qaire-exporter}/bin/upc-qaire-exporter";
};
};
}

35
m/module/vpn-dac.nix Normal file

@ -0,0 +1,35 @@
{config, ...}:
{
age.secrets.vpn-dac-login.file = ../../secrets/vpn-dac-login.age;
age.secrets.vpn-dac-client-key.file = ../../secrets/vpn-dac-client-key.age;
services.openvpn.servers = {
# systemctl status openvpn-dac.service
dac = {
config = ''
client
dev tun
proto tcp
remote vpn.ac.upc.edu 1194
remote vpn.ac.upc.edu 80
resolv-retry infinite
nobind
persist-key
persist-tun
ca ${./vpn-dac/ca.crt}
cert ${./vpn-dac/client.crt}
# Only key needs to be secret
key ${config.age.secrets.vpn-dac-client-key.path}
remote-cert-tls server
comp-lzo
verb 3
auth-user-pass ${config.age.secrets.vpn-dac-login.path}
reneg-sec 0
# Only route fox-ipmi
pull-filter ignore "route "
route 147.83.35.27 255.255.255.255
'';
};
};
}

31
m/module/vpn-dac/ca.crt Normal file

@ -0,0 +1,31 @@
-----BEGIN CERTIFICATE-----
MIIFUjCCBDqgAwIBAgIJAJH118PApk5hMA0GCSqGSIb3DQEBCwUAMIHLMQswCQYD
VQQGEwJFUzESMBAGA1UECBMJQmFyY2Vsb25hMRIwEAYDVQQHEwlCYXJjZWxvbmEx
LTArBgNVBAoTJFVuaXZlcnNpdGF0IFBvbGl0ZWNuaWNhIGRlIENhdGFsdW55YTEk
MCIGA1UECxMbQXJxdWl0ZWN0dXJhIGRlIENvbXB1dGFkb3JzMRAwDgYDVQQDEwdM
Q0FDIENBMQ0wCwYDVQQpEwRMQ0FDMR4wHAYJKoZIhvcNAQkBFg9sY2FjQGFjLnVw
Yy5lZHUwHhcNMTYwMTEyMTI0NDIxWhcNNDYwMTEyMTI0NDIxWjCByzELMAkGA1UE
BhMCRVMxEjAQBgNVBAgTCUJhcmNlbG9uYTESMBAGA1UEBxMJQmFyY2Vsb25hMS0w
KwYDVQQKEyRVbml2ZXJzaXRhdCBQb2xpdGVjbmljYSBkZSBDYXRhbHVueWExJDAi
BgNVBAsTG0FycXVpdGVjdHVyYSBkZSBDb21wdXRhZG9yczEQMA4GA1UEAxMHTENB
QyBDQTENMAsGA1UEKRMETENBQzEeMBwGCSqGSIb3DQEJARYPbGNhY0BhYy51cGMu
ZWR1MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0CteSeof7Xwi51kC
F0nQ4E9iR5Lq7wtfRuVPn6JJcIxJJ6+F9gr4R/HIHTztW4XAzReE36DYfexupx3D
6UgQIkMLlVyGqRbulNF+RnCx20GosF7Dm4RGBVvOxBP1PGjYq/A+XhaaDAFd0cOF
LMNkzuYP7PF0bnBEaHnxmN8bPmuyDyas7fK9AAc3scyWT2jSBPbOVFvCJwPg8MH9
V/h+hKwL/7hRt1MVfVv2qyIuKwTki8mUt0RcVbP7oJoRY5K1+R52phIz/GL/b4Fx
L6MKXlQxLi8vzP4QZXgCMyV7oFNdU3VqCEXBA11YIRvsOZ4QS19otIk/ZWU5x+HH
LAIJ7wIDAQABo4IBNTCCATEwHQYDVR0OBBYEFNyezX1cH1N4QR14ebBpljqmtE7q
MIIBAAYDVR0jBIH4MIH1gBTcns19XB9TeEEdeHmwaZY6prRO6qGB0aSBzjCByzEL
MAkGA1UEBhMCRVMxEjAQBgNVBAgTCUJhcmNlbG9uYTESMBAGA1UEBxMJQmFyY2Vs
b25hMS0wKwYDVQQKEyRVbml2ZXJzaXRhdCBQb2xpdGVjbmljYSBkZSBDYXRhbHVu
eWExJDAiBgNVBAsTG0FycXVpdGVjdHVyYSBkZSBDb21wdXRhZG9yczEQMA4GA1UE
AxMHTENBQyBDQTENMAsGA1UEKRMETENBQzEeMBwGCSqGSIb3DQEJARYPbGNhY0Bh
Yy51cGMuZWR1ggkAkfXXw8CmTmEwDAYDVR0TBAUwAwEB/zANBgkqhkiG9w0BAQsF
AAOCAQEAUAmOvVXIQrR+aZVO0bOTeugKBHB75eTIZSIHIn2oDUvDbAP5GXIJ56A1
6mZXxemSMY8/9k+pRcwJhfat3IgvAN159XSqf9kRv0NHgc3FWUI1Qv/BsAn0vJO/
oK0dbmbbRWqt86qNrCN+cUfz5aovvxN73jFfnvfDQFBk/8enj9wXxYfokjjLPR1Q
+oTkH8dY68qf71oaUB9MndppPEPSz0K1S6h1XxvJoSu9MVSXOQHiq1cdZdxRazI3
4f7q9sTCL+khwDAuZxAYzlEYxFFa/NN8PWU6xPw6V+t/aDhOiXUPJQB/O/K7mw3Z
TQQx5NqM7B5jjak5fauR3/oRD8XXsA==
-----END CERTIFICATE-----

100
m/module/vpn-dac/client.crt Normal file

@ -0,0 +1,100 @@
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 2 (0x2)
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=ES, ST=Barcelona, L=Barcelona, O=Universitat Politecnica de Catalunya, OU=Arquitectura de Computadors, CN=LCAC CA/name=LCAC/emailAddress=lcac@ac.upc.edu
Validity
Not Before: Jan 12 12:45:41 2016 GMT
Not After : Jan 12 12:45:41 2046 GMT
Subject: C=ES, ST=Barcelona, L=Barcelona, O=Universitat Politecnica de Catalunya, OU=Arquitectura de Computadors, CN=client/name=LCAC/emailAddress=lcac@ac.upc.edu
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
00:97:99:fa:7a:0e:4d:e2:1d:a5:b1:a8:14:18:64:
c7:66:bf:de:99:1d:92:3b:86:82:4d:95:39:f7:a6:
56:49:97:14:4f:e3:37:00:6c:f4:d0:1d:56:79:e7:
19:b5:dd:36:15:8e:1d:57:7b:59:29:d2:11:bf:58:
48:e0:f7:41:3d:16:64:8d:a2:0b:4a:ac:fa:c6:83:
dc:10:2a:2c:d9:97:48:ee:11:2a:bc:4b:60:dd:b9:
2e:8f:45:ca:87:0b:38:65:1c:f8:a2:1d:f9:50:aa:
6e:60:f9:48:df:57:12:23:e1:e7:0c:81:5c:9f:c5:
b2:e6:99:99:95:30:6d:57:36:06:8c:fd:fb:f9:4f:
60:d2:3c:ba:ae:28:56:2f:da:58:5c:e8:c5:7b:ec:
76:d9:28:6e:fb:8c:07:f9:d7:23:c3:72:76:3c:fa:
dc:20:67:8f:cc:16:e0:91:07:d5:68:f9:20:4d:7d:
5c:2d:02:04:16:76:52:f3:53:be:a3:dc:0d:d5:fb:
6b:55:29:f3:52:35:c8:7d:99:d1:4a:94:be:b1:8e:
fd:85:18:25:eb:41:e9:56:da:af:62:84:20:0a:00:
17:94:92:94:91:6a:f8:54:37:17:ee:1e:bb:fb:93:
71:91:d9:e4:e9:b8:3b:18:7d:6d:7d:4c:ce:58:55:
f9:41
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Basic Constraints:
CA:FALSE
Netscape Comment:
Easy-RSA Generated Certificate
X509v3 Subject Key Identifier:
1B:88:06:D5:33:1D:5C:48:46:B5:DE:78:89:36:96:91:3A:74:43:18
X509v3 Authority Key Identifier:
keyid:DC:9E:CD:7D:5C:1F:53:78:41:1D:78:79:B0:69:96:3A:A6:B4:4E:EA
DirName:/C=ES/ST=Barcelona/L=Barcelona/O=Universitat Politecnica de Catalunya/OU=Arquitectura de Computadors/CN=LCAC CA/name=LCAC/emailAddress=lcac@ac.upc.edu
serial:91:F5:D7:C3:C0:A6:4E:61
X509v3 Extended Key Usage:
TLS Web Client Authentication
X509v3 Key Usage:
Digital Signature
X509v3 Subject Alternative Name:
DNS:client
Signature Algorithm: sha256WithRSAEncryption
42:e8:50:b2:e7:88:75:86:0b:bb:29:e3:aa:c6:0e:4c:e8:ea:
3d:0c:02:31:7f:3b:80:0c:3f:80:af:45:d6:62:27:a0:0e:e7:
26:09:12:97:95:f8:d9:9b:89:b5:ef:56:64:f1:de:82:74:e0:
31:0a:cc:90:0a:bd:50:b8:54:95:0a:ae:3b:40:df:76:b6:d1:
01:2e:f3:96:9f:52:d4:e9:14:6d:b7:14:9d:45:99:33:36:2a:
01:0b:15:1a:ed:55:dc:64:83:65:1a:06:42:d9:c7:dc:97:d4:
02:81:c2:58:2b:ea:e4:b7:ae:84:3a:e4:3f:f1:2e:fa:ec:f3:
40:5d:b8:6a:d5:5e:e1:e8:2f:e2:2f:48:a4:38:a1:4f:22:e3:
4f:66:94:aa:02:78:9a:2b:7a:5d:aa:aa:51:a5:e3:d0:91:e9:
1d:f9:08:ed:8b:51:c9:a6:af:46:85:b5:1c:ed:12:a1:28:33:
75:36:00:d8:5c:14:65:96:c0:28:7d:47:50:a4:89:5f:b0:72:
1a:4b:13:17:26:0f:f0:b8:65:3c:e9:96:36:f9:bf:90:59:33:
87:1f:01:03:25:f8:f0:3a:9b:33:02:d0:0a:43:b5:0a:cf:62:
a1:45:38:37:07:9d:9c:94:0b:31:c6:3c:34:b7:fc:5a:0c:e4:
bf:23:f6:7d
-----BEGIN CERTIFICATE-----
MIIFqjCCBJKgAwIBAgIBAjANBgkqhkiG9w0BAQsFADCByzELMAkGA1UEBhMCRVMx
EjAQBgNVBAgTCUJhcmNlbG9uYTESMBAGA1UEBxMJQmFyY2Vsb25hMS0wKwYDVQQK
EyRVbml2ZXJzaXRhdCBQb2xpdGVjbmljYSBkZSBDYXRhbHVueWExJDAiBgNVBAsT
G0FycXVpdGVjdHVyYSBkZSBDb21wdXRhZG9yczEQMA4GA1UEAxMHTENBQyBDQTEN
MAsGA1UEKRMETENBQzEeMBwGCSqGSIb3DQEJARYPbGNhY0BhYy51cGMuZWR1MB4X
DTE2MDExMjEyNDU0MVoXDTQ2MDExMjEyNDU0MVowgcoxCzAJBgNVBAYTAkVTMRIw
EAYDVQQIEwlCYXJjZWxvbmExEjAQBgNVBAcTCUJhcmNlbG9uYTEtMCsGA1UEChMk
VW5pdmVyc2l0YXQgUG9saXRlY25pY2EgZGUgQ2F0YWx1bnlhMSQwIgYDVQQLExtB
cnF1aXRlY3R1cmEgZGUgQ29tcHV0YWRvcnMxDzANBgNVBAMTBmNsaWVudDENMAsG
A1UEKRMETENBQzEeMBwGCSqGSIb3DQEJARYPbGNhY0BhYy51cGMuZWR1MIIBIjAN
BgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAl5n6eg5N4h2lsagUGGTHZr/emR2S
O4aCTZU596ZWSZcUT+M3AGz00B1WeecZtd02FY4dV3tZKdIRv1hI4PdBPRZkjaIL
Sqz6xoPcECos2ZdI7hEqvEtg3bkuj0XKhws4ZRz4oh35UKpuYPlI31cSI+HnDIFc
n8Wy5pmZlTBtVzYGjP37+U9g0jy6rihWL9pYXOjFe+x22Shu+4wH+dcjw3J2PPrc
IGePzBbgkQfVaPkgTX1cLQIEFnZS81O+o9wN1ftrVSnzUjXIfZnRSpS+sY79hRgl
60HpVtqvYoQgCgAXlJKUkWr4VDcX7h67+5Nxkdnk6bg7GH1tfUzOWFX5QQIDAQAB
o4IBljCCAZIwCQYDVR0TBAIwADAtBglghkgBhvhCAQ0EIBYeRWFzeS1SU0EgR2Vu
ZXJhdGVkIENlcnRpZmljYXRlMB0GA1UdDgQWBBQbiAbVMx1cSEa13niJNpaROnRD
GDCCAQAGA1UdIwSB+DCB9YAU3J7NfVwfU3hBHXh5sGmWOqa0TuqhgdGkgc4wgcsx
CzAJBgNVBAYTAkVTMRIwEAYDVQQIEwlCYXJjZWxvbmExEjAQBgNVBAcTCUJhcmNl
bG9uYTEtMCsGA1UEChMkVW5pdmVyc2l0YXQgUG9saXRlY25pY2EgZGUgQ2F0YWx1
bnlhMSQwIgYDVQQLExtBcnF1aXRlY3R1cmEgZGUgQ29tcHV0YWRvcnMxEDAOBgNV
BAMTB0xDQUMgQ0ExDTALBgNVBCkTBExDQUMxHjAcBgkqhkiG9w0BCQEWD2xjYWNA
YWMudXBjLmVkdYIJAJH118PApk5hMBMGA1UdJQQMMAoGCCsGAQUFBwMCMAsGA1Ud
DwQEAwIHgDARBgNVHREECjAIggZjbGllbnQwDQYJKoZIhvcNAQELBQADggEBAELo
ULLniHWGC7sp46rGDkzo6j0MAjF/O4AMP4CvRdZiJ6AO5yYJEpeV+NmbibXvVmTx
3oJ04DEKzJAKvVC4VJUKrjtA33a20QEu85afUtTpFG23FJ1FmTM2KgELFRrtVdxk
g2UaBkLZx9yX1AKBwlgr6uS3roQ65D/xLvrs80BduGrVXuHoL+IvSKQ4oU8i409m
lKoCeJorel2qqlGl49CR6R35CO2LUcmmr0aFtRztEqEoM3U2ANhcFGWWwCh9R1Ck
iV+wchpLExcmD/C4ZTzpljb5v5BZM4cfAQMl+PA6mzMC0ApDtQrPYqFFODcHnZyU
CzHGPDS3/FoM5L8j9n0=
-----END CERTIFICATE-----

28
m/owl1/configuration.nix Normal file

@ -0,0 +1,28 @@
{ config, pkgs, ... }:
{
imports = [
../common/ssf.nix
../module/ceph.nix
../module/emulation.nix
../module/slurm-client.nix
../module/slurm-firewall.nix
../module/debuginfod.nix
../module/hut-substituter.nix
];
# Select the disk using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d53566c";
networking = {
hostName = "owl1";
interfaces.eno1.ipv4.addresses = [ {
address = "10.0.40.1";
prefixLength = 24;
} ];
interfaces.ibp5s0.ipv4.addresses = [ {
address = "10.0.42.1";
prefixLength = 24;
} ];
};
}

29
m/owl2/configuration.nix Normal file

@ -0,0 +1,29 @@
{ config, pkgs, ... }:
{
imports = [
../common/ssf.nix
../module/ceph.nix
../module/emulation.nix
../module/slurm-client.nix
../module/slurm-firewall.nix
../module/debuginfod.nix
../module/hut-substituter.nix
];
# Select the disk using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d535629";
networking = {
hostName = "owl2";
interfaces.eno1.ipv4.addresses = [ {
address = "10.0.40.2";
prefixLength = 24;
} ];
# Watch out! The OmniPath device is not in the same place here:
interfaces.ibp129s0.ipv4.addresses = [ {
address = "10.0.42.2";
prefixLength = 24;
} ];
};
}

@ -0,0 +1,98 @@
{ config, pkgs, lib, modulesPath, ... }:
{
imports = [
../common/base.nix
../common/ssf/hosts.nix
../module/emulation.nix
../module/debuginfod.nix
../module/nvidia.nix
../eudy/kernel/perf.nix
./wireguard.nix
../module/hut-substituter.nix
];
# Don't install Grub on the disk yet
boot.loader.grub.device = "nodev";
# Enable serial console
boot.kernelParams = [
"console=tty1"
"console=ttyS1,115200"
];
networking = {
hostName = "raccoon";
# Only BSC DNSs seem to be reachable from the office VLAN
nameservers = [ "84.88.52.35" "84.88.52.36" ];
defaultGateway = "84.88.51.129";
interfaces.eno0.ipv4.addresses = [ {
address = "84.88.51.152";
prefixLength = 25;
} ];
interfaces.enp5s0f1.ipv4.addresses = [ {
address = "10.0.44.1";
prefixLength = 24;
} ];
nat = {
enable = true;
internalInterfaces = [ "enp5s0f1" ];
externalInterface = "eno0";
};
hosts = {
"10.0.44.4" = [ "tent" ];
"84.88.53.236" = [ "apex" ];
};
};
# Mount the NFS home
fileSystems."/nfs/home" = {
device = "10.106.0.30:/home";
fsType = "nfs";
options = [ "nfsvers=3" "rsize=1024" "wsize=1024" "cto" "nofail" ];
};
# Enable performance governor
powerManagement.cpuFreqGovernor = "performance";
hardware.nvidia.open = false; # Maxwell predates Turing, so the open kernel modules are unsupported
services.openssh.settings.X11Forwarding = true;
services.prometheus.exporters.node = {
enable = true;
enabledCollectors = [ "systemd" ];
port = 9002;
listenAddress = "127.0.0.1";
};
users.motd = ''
DO YOU BRING FEEDS?
'';
}

48
m/raccoon/wireguard.nix Normal file

@ -0,0 +1,48 @@
{ config, pkgs, ... }:
{
networking.nat = {
enable = true;
enableIPv6 = false;
externalInterface = "eno0";
internalInterfaces = [ "wg0" ];
};
networking.firewall = {
allowedUDPPorts = [ 666 ];
};
age.secrets.wgRaccoon.file = ../../secrets/wg-raccoon.age;
# Enable WireGuard
networking.wireguard.enable = true;
networking.wireguard.interfaces = {
wg0 = {
ips = [ "10.106.0.236/24" ];
listenPort = 666;
privateKeyFile = config.age.secrets.wgRaccoon.path;
# Public key: QUfnGXSMEgu2bviglsaSdCjidB51oEDBFpnSFcKGfDI=
peers = [
{
name = "fox";
publicKey = "VfMPBQLQTKeyXJSwv8wBhc6OV0j2qAxUpX3kLHunK2Y=";
allowedIPs = [ "10.106.0.1/32" ];
endpoint = "fox.ac.upc.edu:666";
persistentKeepalive = 25;
}
{
name = "apex";
publicKey = "VwhcN8vSOzdJEotQTpmPHBC52x3Hbv1lkFIyKubrnUA=";
allowedIPs = [ "10.106.0.30/32" "10.0.40.0/24" ];
endpoint = "ssfhead.bsc.es:666";
persistentKeepalive = 25;
}
];
};
};
networking.hosts = {
"10.106.0.1" = [ "fox.wg" ];
"10.106.0.30" = [ "apex.wg" ];
};
}
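
Port 666 appears twice in the file above, once in `allowedUDPPorts` and once as `listenPort`. A hedged sketch of one way to keep the two in sync is to derive the firewall entry from the interface option. This is a possible refactoring, not what the commit does.

```nix
{ config, ... }:
{
  networking.firewall.allowedUDPPorts = [
    # Reuse the port configured on wg0 instead of repeating the literal 666
    config.networking.wireguard.interfaces.wg0.listenPort
  ];
}
```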

14
m/tent/blackbox.yml Normal file

@ -0,0 +1,14 @@
modules:
http_2xx:
prober: http
timeout: 5s
http:
preferred_ip_protocol: "ip4"
follow_redirects: true
valid_status_codes: [] # Defaults to 2xx
method: GET
icmp:
prober: icmp
timeout: 5s
icmp:
preferred_ip_protocol: "ip4"

85
m/tent/configuration.nix Normal file

@ -0,0 +1,85 @@
{ config, pkgs, lib, ... }:
{
imports = [
../common/xeon.nix
../common/ssf/hosts.nix
../module/emulation.nix
../module/debuginfod.nix
./monitoring.nix
./nginx.nix
./nix-serve.nix
./gitlab-runner.nix
./gitea.nix
../hut/public-inbox.nix
../hut/msmtp.nix
../module/p.nix
../module/vpn-dac.nix
../module/hut-substituter.nix
];
# Select the disk using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d537675";
networking = {
hostName = "tent";
interfaces.eno1.ipv4.addresses = [
{
address = "10.0.44.4";
prefixLength = 24;
}
];
# Only BSC DNSs seem to be reachable from the office VLAN
nameservers = [ "84.88.52.35" "84.88.52.36" ];
search = [ "bsc.es" "ac.upc.edu" ];
defaultGateway = "10.0.44.1";
hosts = {
"84.88.53.236" = [ "apex" ];
"10.0.44.1" = [ "raccoon" ];
};
};
services.p.enable = true;
services.prometheus.exporters.node = {
enable = true;
enabledCollectors = [ "systemd" ];
port = 9002;
listenAddress = "127.0.0.1";
};
boot.swraid = {
enable = true;
mdadmConf = ''
DEVICE partitions
ARRAY /dev/md0 metadata=1.2 UUID=496db1e2:056a92aa:a544543f:40db379d
MAILADDR root
'';
};
fileSystems."/vault" = {
device = "/dev/disk/by-label/vault";
fsType = "ext4";
};
# Make a /vault/home/$USER directory for each user.
systemd.services.create-vault-dirs = let
# Take only normal users in tent
users = lib.filterAttrs (_: v: v.isNormalUser) config.users.users;
commands = lib.concatLists (lib.mapAttrsToList
(_: user: [
"install -d -o ${user.name} -g ${user.group} -m 0711 /vault/home/${user.name}"
]) users);
script = pkgs.writeShellScript "create-vault-dirs.sh" (lib.concatLines commands);
in {
enable = true;
wants = [ "local-fs.target" ];
after = [ "local-fs.target" ];
wantedBy = [ "multi-user.target" ];
serviceConfig.ExecStart = script;
};
# Disable automatic garbage collection
nix.gc.automatic = lib.mkForce false;
}
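
The `create-vault-dirs` service above builds its script by filtering the user set and mapping each remaining user to an `install` command. The standalone expression below is a sketch of that same filter-and-map step outside the module system. The `users` attribute set is a hypothetical stand-in for `config.users.users`, and the expression assumes a `<nixpkgs>` channel recent enough to provide `lib.concatLines`.

```nix
# Evaluate with: nix-instantiate --eval --strict example.nix
let
  lib = (import <nixpkgs> { }).lib;

  # Hypothetical stand-in for config.users.users
  users = {
    alice = { name = "alice"; group = "users"; isNormalUser = true; };
    nginx = { name = "nginx"; group = "nginx"; isNormalUser = false; };
  };

  # Keep only normal (human) users, as the service does
  normal = lib.filterAttrs (_: v: v.isNormalUser) users;

  # One install command per remaining user
  commands = lib.mapAttrsToList
    (_: user: "install -d -o ${user.name} -g ${user.group} -m 0711 /vault/home/${user.name}")
    normal;
in
# Join the commands into the body of a shell script
lib.concatLines commands
```

With these inputs only `alice` passes the filter, so the result is a single `install` line for `/vault/home/alice`; the system user `nginx` is skipped.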

Some files were not shown because too many files have changed in this diff