Compare commits

186 Commits

Author SHA1 Message Date
e065cde376 Use NixOS attributes for the install section 2023-09-18 19:27:14 +02:00
3bb0b550aa Add a RequiredBy dependency for remote-fs.target 2023-09-18 19:05:58 +02:00
e4cbcab81c Use a systemd mount directly for the nix store
Allows the LazyUnmount option and avoids the stage1 hack with
/nix//store.
2023-09-18 18:53:40 +02:00
77b41a90e2 Patch nix instead of using an override unit 2023-09-18 18:06:51 +02:00
1fc6891dc6 Remove nix-daemon.socket dependency of /nix/store
The dependency causes a cycle as the nix store will be mounted after the
network is ready, which itself depends on the socket.target which
requires the nix-daemon.socket to be ready too.
2023-09-18 17:28:47 +02:00
8c11c7460a Delay the mount until the network is ready 2023-09-18 16:07:46 +02:00
e6014511f5 Wait for the NFS hut store to be mounted 2023-09-18 15:50:37 +02:00
320c58ce48 Prevent the overlay from being mounted in stage1 2023-09-18 13:57:41 +02:00
d145ee9b2c Mount the overlay in /nix/store 2023-09-18 13:02:32 +02:00
140178d58e Begin the nix store overlay
We need to disable the read-only bind mount, so we can directly bind
mount the overlay.
2023-09-18 11:22:24 +02:00
d48f3b989a Enable direnv integration 2023-09-17 22:27:51 +02:00
653d411b9e Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs is still pointing to a different version than the one
specified in the jungle flake. Instead use jungle#bscpkgs.X to get a
package from bscpkgs.
2023-09-15 12:00:33 +02:00
51c57dbc41 Add bscpkgs and nixpkgs top level attributes
Allows the evaluation of packages of the intermediate overlays.
2023-09-15 12:00:33 +02:00
33cd40160e Use hut packages as the default package set
Allows the user to directly access nixpkgs and bscpkgs from the top
level as `nix build jungle#htop` and `nix build jungle#bsc.ovni`.
2023-09-15 12:00:28 +02:00
a1e8cfea47 Don't fetch registry flakes from the net 2023-09-15 12:00:28 +02:00
5d72ee3da3 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=3a4062ac04be6263c64a481420d8e768c2521b80' (2023-09-14)
2023-09-15 11:50:47 +02:00
fdc6445d47 Revert "Update slurm to 23.02.05.1"
This reverts commit aaefddc44a9073166ac52b8bd56ac96258d3b053.
2023-09-14 15:46:18 +02:00
e88805947e Open ports in firewall of compute nodes 2023-09-14 15:45:43 +02:00
aaefddc44a Update slurm to 23.02.05.1 2023-09-13 17:44:24 +02:00
d9d249411d Monitor storage nodes via IPMI too 2023-09-13 15:57:13 +02:00
c07f75c6bb Specify the space available in /ceph 2023-09-13 14:19:59 +02:00
8d449ba20c Add update post to website 2023-09-12 18:13:38 +02:00
10ca572aec Enable fstrim service 2023-09-12 16:39:45 +02:00
75b0f48715 Serve the nix store from hut 2023-09-12 12:19:43 +02:00
19a451db77 Add encrypted munge key with agenix 2023-09-08 19:05:45 +02:00
ec9be9bb62 Remove unused large port hole in firewall 2023-09-08 18:22:48 +02:00
7ddd1977f3 Make exporters listen in localhost only 2023-09-08 18:13:04 +02:00
7050c505b5 Allow only some ports for srun 2023-09-08 17:51:37 +02:00
033a1fe97b Block ssfhead from reaching our slurm daemon 2023-09-08 17:36:28 +02:00
77cb3c494e Poweroff idle slurm nodes after 1 hour 2023-09-08 16:49:53 +02:00
6db5772ac4 Add IB and IPMI node host names 2023-09-08 13:21:37 +02:00
3e347e673c flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=6122fef92701701e1a0622550ac0fc5c2beb5906' (2023-09-07)
2023-09-07 11:13:45 +02:00
dca274d020 Unlock ovni gitlab runners 2023-09-05 16:59:45 +02:00
c33909f32f Update email contact to jungle mail list 2023-09-05 16:10:58 +02:00
64e856e8b9 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=18d64c352c10f9ce74aabddeba5a5db02b74ec27' (2023-08-31)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs/heads/master&rev=ee24b910a1cb95bd222e253da43238e843816f2f' (2023-09-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/d680ded26da5cf104dd2735a51e88d2d8f487b4d' (2023-08-19)
  → 'github:NixOS/nixpkgs/e56990880811a451abd32515698c712788be5720' (2023-09-02)
2023-09-05 15:03:26 +02:00
02f40a8217 Add agenix to all nodes 2023-09-04 22:10:43 +02:00
77d43b6da9 Add agenix module to ceph 2023-09-04 22:07:07 +02:00
ab55aac5ff Remove old secrets 2023-09-04 22:04:32 +02:00
9b5bfbb7a3 Mount /ceph in owl1 and owl2 2023-09-04 22:00:36 +02:00
a69a71d1b0 Warn about the owl2 omnipath device 2023-09-04 22:00:17 +02:00
98374bd303 Clean owl2 configuration 2023-09-04 21:59:56 +02:00
3b6be8a2fc Move the ceph client config to an external module 2023-09-04 21:59:04 +02:00
2bb366b9ac Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2023-09-04 21:36:31 +02:00
2d16709648 Add anavarro user 2023-09-04 16:00:01 +02:00
9344daa31c Set zsh inc_append_history option 2023-09-03 16:57:53 +02:00
80c98041b5 Set zsh shell for rarias 2023-09-03 16:46:27 +02:00
3418e57907 Enable zsh and fix key bindings 2023-09-03 16:42:04 +02:00
6848b58e39 Keep a log over time with the config commits 2023-09-03 00:02:14 +02:00
13a70411aa Configure bscpkgs.nixpkgs to follow nixpkgs 2023-09-02 23:37:59 +02:00
f9c77b433a Store nixos config in /etc/nixos/config.rev 2023-09-02 23:37:11 +02:00
9d487845f6 Enable binary emulation for other architectures 2023-08-31 17:27:08 +02:00
3c99c2a662 Enable watchdog 2023-08-30 16:32:17 +02:00
7d09108c9f Enable all osd on boot in lake2 2023-08-30 16:32:17 +02:00
0f0a861896 Scrape lake2 too 2023-08-29 12:33:26 +02:00
beb0d5940e Also enable monitoring in lake2 2023-08-29 12:29:41 +02:00
70321ce237 Scrape metrics from bay 2023-08-29 11:58:00 +02:00
5bd1d67333 Add monitoring in the bay node 2023-08-29 11:53:32 +02:00
fad9df61e1 Add fio tool 2023-08-29 11:27:50 +02:00
d2a80c8c18 Add ceph tools in hut too 2023-08-28 17:58:21 +02:00
599613d139 Switch ceph logs to journal 2023-08-28 17:58:08 +02:00
ac4fa9abd4 Update ceph to 18.2.0 in overlay 2023-08-25 18:20:21 +02:00
cb3a7b19f7 Move pkgs overlay to overlay.nix 2023-08-25 18:12:00 +02:00
f5d6bf627b Enable ceph osd daemons in lake2 2023-08-25 14:54:51 +02:00
f1ce815edd Add the lake2 hostname to the hosts 2023-08-25 14:44:35 +02:00
a2075cfd65 Use the sda for lake2 2023-08-25 13:40:10 +02:00
8f1f6f92a8 Remove netboot module 2023-08-25 13:39:01 +02:00
3416416864 Disable pixiecore in hut for now 2023-08-25 13:21:00 +02:00
815888fb07 Add PXE helper 2023-08-25 12:05:33 +02:00
029d9cb1db Enable netboot again for PXE 2023-08-24 19:08:23 +02:00
95fa67ede1 Specify the disk by path 2023-08-24 15:27:37 +02:00
a19347161f Prepare lake2 config after bootstrap
The disk ID is different under NixOS.
2023-08-24 13:54:53 +02:00
58c1cc1f7c Add lake2 bootstrap config 2023-08-24 12:30:46 +02:00
b06399dc70 Add section to enable serial console 2023-08-24 12:29:44 +02:00
077eece6b9 Add agenix to PATH in hut 2023-08-23 17:42:50 +02:00
b3ef53de51 Store ceph secret key in age
This allows a node to mount the ceph FS without any extra ceph
configuration in /etc/ceph.
2023-08-23 17:26:44 +02:00
e0852ee89b Add rarias key for secrets 2023-08-23 17:15:26 +02:00
dfffc0bdce Add ceph metrics to prometheus 2023-08-22 16:33:55 +02:00
8257c245b1 Mount the ceph filesystem in hut 2023-08-22 16:15:46 +02:00
cd5853cf53 Add ceph config in bay 2023-08-22 15:58:48 +02:00
b677b827d4 Add the bay host name 2023-08-22 15:56:09 +02:00
b1d5185cca Remove netboot and fixes 2023-08-22 12:12:15 +02:00
a7e66e2246 Add bay node 2023-08-22 12:12:15 +02:00
480c97e952 Update flake 2023-08-22 11:28:54 +02:00
f8fb5fa4ff Monitor power from other nodes via LAN 2023-08-22 11:28:54 +02:00
acf9b71f04 Increase prometheus retention time to one year 2023-08-22 11:28:54 +02:00
bf692e6e4e Don't set all_proxy 2023-08-22 11:28:54 +02:00
c242b65e47 Update nixpkgs to fix docker problem 2023-07-28 14:24:51 +02:00
55d6c17776 Allow access to devices for node_exporter 2023-07-28 13:55:35 +02:00
14b173f67e GRUB version no longer needed 2023-07-27 17:22:20 +02:00
b9001cdf7d Upgrade flake: nixpkgs, bscpkgs and agenix 2023-07-27 17:19:17 +02:00
f892d43b47 Kill slurmd remaining processes on upgrade 2023-07-27 14:49:20 +02:00
d9e9ee6e3a Add details to request access in the web 2023-07-25 16:07:22 +02:00
79adbe76a8 koro: Add vlopez user 2023-07-21 13:00:43 +02:00
66fb848ba8 Add koro node 2023-07-21 13:00:08 +02:00
40b1a8f0df eudy: Add fcsv3 and intermediate versions for testing 2023-07-21 11:27:51 +02:00
a0b9d10b14 eudy: Enable memory overcommit 2023-07-21 11:27:51 +02:00
4c309dea2f eudy: disable all cpu mitigations 2023-07-21 11:27:51 +02:00
b3a397eee4 Add jungle.bsc.es hugo website 2023-07-21 10:52:23 +02:00
7c1fe1455b Enable NTP using the BSC time server 2023-06-30 14:02:15 +02:00
2d4b178895 Add the ssfhead node as gateway 2023-06-30 14:01:35 +02:00
4dd25f2f89 Use our host names first by default 2023-06-23 16:22:18 +02:00
6dcd9d8144 Add DNS tools to resolve hosts 2023-06-23 16:15:45 +02:00
31be81d2b1 Lower perf_event_paranoid to -1 2023-06-23 16:01:27 +02:00
826cfdf43f Set perf paranoid to 0 by default 2023-06-21 16:24:19 +02:00
a1f258c5ce Add perf to packages 2023-06-21 15:41:06 +02:00
1c1d3f3231 Allow srun to specify the cpu binding
The task/affinity plugin needs to be selected.
2023-06-21 13:16:23 +02:00
623d46c03f Move authorized keys to users.nix 2023-06-20 14:08:34 +02:00
518a4d6af3 Add rpenacob user 2023-06-20 12:54:26 +02:00
60077948d6 Add osumb to the system packages 2023-06-16 19:22:41 +02:00
c76bfa7f86 flake.lock: Update
Flake lock file updates:

• Updated input 'bscpkgs':
    'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs%2fheads%2fmaster&rev=c775ee4d6f76aded05b08ae13924c302f18f9b2c' (2023-04-26)
  → 'git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git?ref=refs%2fheads%2fmaster&rev=cbe9af5d042e9d5585fe2acef65a1347c68b2fbd' (2023-06-16)
2023-06-16 18:33:54 +02:00
6c10933e80 Set mpi to mpich by default in bscpkgs 2023-06-16 18:26:51 +02:00
6402605b1f Add missing parameter to extend 2023-06-16 18:26:51 +02:00
1724535495 Use explicit order in overlays 2023-06-16 18:26:51 +02:00
5b41670f36 Replace mpi inside bsc attribute 2023-06-16 18:26:51 +02:00
ab04855382 Add mpich overlay 2023-06-16 18:26:51 +02:00
684d5e41c5 Add comments in slurm config 2023-06-16 18:26:50 +02:00
316ea18e24 Add eudy host key to known hosts 2023-06-16 17:29:48 +02:00
c916157fcc Rename xeon08 to eudy
From Eudyptula, a little penguin.
2023-06-16 17:16:05 +02:00
4e9409db10 Update rebuild script for all nodes 2023-06-16 12:13:07 +02:00
94320d9256 Add ssh host keys 2023-06-16 12:01:12 +02:00
9f5941c2be Set the name of the slurm cluster to jungle 2023-06-16 12:00:54 +02:00
fba0f7b739 Change owl hostnames 2023-06-16 11:42:39 +02:00
2e95281af5 Add owl and all partition 2023-06-16 11:34:00 +02:00
f4ac9f3186 Simplify flake and expose host pkgs
The configuration of the machines is now moved to m/
2023-06-16 11:31:31 +02:00
f787343f29 Rename xeon07 to hut 2023-06-14 17:28:40 +02:00
70304d26ff Remove profiles older than 30 days with gc 2023-06-14 17:28:39 +02:00
76c10ec22e Add ncdu to system packages 2023-06-14 17:28:39 +02:00
011e8c2bf8 Move arocanon user from xeon08 to common 2023-06-14 16:22:43 +02:00
c1f138a9c1 xeon08: Add config for kernel non-voluntary preemption 2023-06-14 16:17:33 +02:00
1552eeca12 xeon08: Add perf 2023-06-14 15:42:20 +02:00
8769f3d418 xeon08: Enable lttng lockdep tracepoints 2023-06-14 15:42:20 +02:00
a4c254fcd6 xeon08: Add lttng module and tools 2023-06-14 15:42:20 +02:00
24fb1846d2 Serve grafana in https://jungle.bsc.es/grafana 2023-05-31 18:12:14 +02:00
5e77d0b86c Add tree command 2023-05-31 18:11:34 +02:00
494fda126c Add file to system packages 2023-05-31 18:11:34 +02:00
5cfa2f9611 Add gnumake to system packages 2023-05-31 18:11:34 +02:00
9539a24bdb Add cmake to system packages 2023-05-31 18:11:34 +02:00
98c4d924dd Add ix to common packages 2023-05-31 18:11:34 +02:00
7aae967c65 Improve documentation 2023-05-26 11:38:27 +02:00
49f7edddac Add gitignore 2023-05-26 11:38:27 +02:00
2f055d9fc5 Set intel_pstate=passive and disable frequency boost 2023-05-26 11:38:26 +02:00
108abffd2a Add xeon08 basic config 2023-05-26 11:38:26 +02:00
4c19ad66e3 Add nixos-config.nix to easily enable nix repl 2023-05-26 11:29:59 +02:00
19c01aeb1d Automatically resume restarted nodes in SLURM 2023-05-18 12:48:04 +02:00
fc90b40310 Allow public dashboards in grafana 2023-05-09 18:53:31 +02:00
81de0effb1 Add hal ssh key 2023-05-09 18:37:38 +02:00
5ce93ff85a Increase the number of CPUs to 56 for nOS-V docker 2023-05-02 17:47:57 +02:00
c020b9f5d6 Allow 5 concurrent builds in the gitlab-runner 2023-05-02 17:38:10 +02:00
f47734b524 Simplify bash prompt 2023-04-28 18:15:04 +02:00
ca3a7d98f5 Rollback to bash as default shell
Zsh doesn't behave properly, it needs further configuration.
2023-04-28 17:59:19 +02:00
0d5609ecc2 Use pmix by default in slurm 2023-04-28 17:07:48 +02:00
818edccb34 Increase locked memory to 1 GiB 2023-04-28 12:34:51 +02:00
2815f5bcfd Use the latest kernel 2023-04-28 11:51:38 +02:00
c1bbbd7793 Disable osnoise and hwlat tracer for now
Reuse nix cache to avoid rebuilding the kernel.
2023-04-28 11:19:47 +02:00
aa1dd14b62 Update nixpkgs to nixos-unstable 2023-04-28 11:18:37 +02:00
399103a9b4 Update nixpkgs 2023-04-28 11:13:46 +02:00
74639d3ece Update ib interface name in xeon02
It seems to be plugged in another PCI port
2023-04-27 18:29:32 +02:00
613a76ac29 Add steps in install documentation 2023-04-27 17:30:53 +02:00
c3ea8864bb Add minimal netboot module to build kexec image 2023-04-27 16:36:15 +02:00
919f211536 Add xeon02 configuration 2023-04-27 16:28:12 +02:00
141d77e2b6 Refactor slurm configuration into compute/control 2023-04-27 16:27:04 +02:00
44fcb97ec7 Lock flakes and add inputs 2023-04-27 13:52:59 +02:00
543983e9f3 Test flakes 2023-04-26 14:27:02 +02:00
95bbeeb646 Enable slurm in xeon01 2023-04-26 14:10:36 +02:00
de2af79810 Use xeon07 as control machine 2023-04-26 14:10:36 +02:00
b9aff1dba5 Remove xeon07 overlay to load upstream slurm 2023-04-26 14:10:36 +02:00
7da979bed2 Add script to rebuild configuration 2023-04-26 14:09:23 +02:00
cfe37640ea Add configuration for xeon01 2023-04-26 11:44:00 +00:00
096e407571 Load overlays from /config 2023-04-26 11:44:00 +00:00
ae31b546e7 Move net.nix to common 2023-04-26 11:44:00 +00:00
c3a2766bb7 Remove host specific network options from net.nix 2023-04-26 11:44:00 +00:00
b568bb36d4 Move ssh.nix to common 2023-04-26 11:44:00 +00:00
55f784e6b7 Move overlays.nix to common 2023-04-26 11:44:00 +00:00
dfab84b0ba Move users.nix to common 2023-04-26 11:44:00 +00:00
8f66ba824a Move common options from configuration.nix 2023-04-26 11:44:00 +00:00
79bd4398f3 Move the remaining hw config to common 2023-04-26 11:44:00 +00:00
b44afdaaa1 Move boot config to common/boot.nix 2023-04-26 11:44:00 +00:00
9528fab3ef Move filesystems config to common/fs.nix 2023-04-26 11:44:00 +00:00
7e82885d84 Use partition labels for / and swap 2023-04-26 11:44:00 +00:00
57ed0cf319 Move fs.nix to common 2023-04-26 11:44:00 +00:00
b043ee3b1d Move boot.nix to common 2023-04-26 11:44:00 +00:00
9e3bdaabb6 Move disk selection to configuration.nix 2023-04-26 11:44:00 +00:00
77f72ac939 Add common directory 2023-04-26 11:44:00 +00:00
fa25a68571 Add server board documentation 2023-04-24 10:10:08 +02:00
Rodrigo Arias
ea0f406849 Add BSC SSF slides 2023-04-24 09:47:11 +02:00
Rodrigo Arias
9df6be1b6b Add SEL troubleshooting guide 2023-04-21 13:31:11 +02:00
407 changed files with 27159 additions and 55038 deletions

@ -1,20 +0,0 @@
name: CI
on:
push:
branches:
- master
pull_request:
branches:
- master
jobs:
build:all:
runs-on: native
steps:
- uses: https://gitea.com/ScMi1/checkout@v1.4
- run: nix build -L --no-link --print-out-paths .#bsc.ci.all
build:cross:
runs-on: native
steps:
- uses: https://gitea.com/ScMi1/checkout@v1.4
- run: nix build -L --no-link --print-out-paths .#bsc.ci.cross

.gitignore

@ -1,3 +1,2 @@
**.swp
*.swp
/result
/misc

@ -1,6 +0,0 @@
build:bsc-ci.all:
stage: build
tags:
- nix
script:
- nix build -L --no-link --print-out-paths .#bsc-ci.all

COPYING

@ -1,21 +0,0 @@
Copyright (c) 2020-2025 Barcelona Supercomputing Center
Copyright (c) 2003-2020 Eelco Dolstra and the Nixpkgs/NixOS contributors
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

@ -1,9 +0,0 @@
# Jungle
This repository provides two components that can be used independently:
- A Nix overlay with packages used at BSC (formerly known as bscpkgs). Access
them directly with `nix shell .#<pkgname>`.
- NixOS configurations for jungle machines. Use `nixos-rebuild switch --flake .`
to upgrade the current machine.

@ -1,19 +0,0 @@
let
bscOverlay = import ./overlay.nix;
# read flake.lock and determine revision from there
lock = builtins.fromJSON (builtins.readFile ./flake.lock);
inherit (lock.nodes.nixpkgs.locked) rev narHash;
fetchedNixpkgs = builtins.fetchTarball {
url = "https://github.com/NixOS/nixpkgs/archive/${rev}.tar.gz";
sha256 = narHash;
};
in
{ overlays ? [ ]
, nixpkgs ? fetchedNixpkgs
, ...
}@attrs:
import nixpkgs (
(builtins.removeAttrs attrs [ "overlays" "nixpkgs" ]) //
{ overlays = [ bscOverlay ] ++ overlays; }
)

Binary file not shown.

Binary file not shown.

doc/bsc-ssf.pdf

Binary file not shown.

@ -150,27 +150,3 @@ And update grub.
```
# nix build .#nixosConfigurations.xeon02.config.system.build.kexecTree -v
```
## Chain NixOS in same disk with other systems
To install NixOS on a partition along another system which controls the GRUB,
first disable the grub device, so the GRUB is not installed in the disk by
NixOS (only the /boot files will be generated):
```
boot.loader.grub.device = "nodev";
```
Then add the following entry to the old GRUB configuration:
```
menuentry 'NixOS' {
insmod chain
search --no-floppy --label nixos --set root
configfile /boot/grub/grub.cfg
}
```
The partition with NixOS must have the label "nixos" for it to be found. New
system configuration entries will be stored in the GRUB configuration managed
by NixOS, so there is no need to change the old GRUB settings.
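The `search --label` value in the menu entry must match the label of the NixOS partition. As a minimal sketch, assuming a hypothetical helper named `grub_chain_entry` (not part of this repository), the entry can be generated from the label so the two stay in sync:

```shell
# Hypothetical helper: print a GRUB menu entry that loads the grub.cfg
# stored on the partition with the given filesystem label.
grub_chain_entry() {
    label="$1"
    cat <<EOF
menuentry 'NixOS' {
    insmod chain
    search --no-floppy --label $label --set root
    configfile /boot/grub/grub.cfg
}
EOF
}

# Print the entry for a partition labelled "nixos".
grub_chain_entry nixos
```

The output would still need to be pasted into the configuration of the GRUB that owns the disk; where that file lives depends on the other system.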

@ -1,30 +0,0 @@
# Maintainers
## Role of a maintainer
The responsibilities of maintainers are quite lax, and similar in spirit to
[nixpkgs' maintainers][1]:
> The main responsibility of a maintainer is to keep the packages they
> maintain in a functioning state, and keep up with updates. In order to do
> that, they are empowered to make decisions over the packages they maintain.
> That being said, the maintainer is not alone in proposing changes to the
> packages. Anybody (both bots and humans) can send PRs to bump or tweak the
> package.
In practice, this means that when updating or proposing changes to a package,
we will notify maintainers by mentioning them in Gitea so they can test changes
and give feedback.
Since we do bi-yearly release cycles, there is no expectation from maintainers
to update packages at each upstream release. Nevertheless, on each release cycle
we may request help from maintainers when updating or testing their packages.
## Becoming a maintainer
You'll have to add yourself in the `maintainers.nix` list; your username should
match your `bsc.es` email. Then you can add yourself to the `meta.maintainers`
of any package you are interested in maintaining.
[1]: [https://github.com/NixOS/nixpkgs/tree/nixos-25.05/maintainers]

@ -1,46 +0,0 @@
#!/bin/sh
# Trims the jungle repository by moving the website to its own repository and
# removing it from jungle. It also removes big pdf files and kernel
# configurations so the jungle repository is small.
set -e
if [ -e oldjungle -o -e newjungle -o -e website ]; then
echo "remove oldjungle/, newjungle/ and website/ first"
exit 1
fi
# Clone the old jungle repo
git clone gitea@tent:rarias/jungle.git oldjungle
# First split the website into a new repository
mkdir website && git -C website init -b master
git-filter-repo \
--path web \
--subdirectory-filter web \
--source oldjungle \
--target website
# Then remove the website, pdf files and big kernel configs
mkdir newjungle && git -C newjungle init -b master
git-filter-repo \
--invert-paths \
--path web \
--path-glob 'doc*.pdf' \
--path-glob '**/kernel/configs/lockdep' \
--path-glob '**/kernel/configs/defconfig' \
--source oldjungle \
--target newjungle
set -x
du -sh oldjungle newjungle website
# 57M oldjungle
# 2,3M newjungle
# 6,4M website
du -sh --exclude=.git oldjungle newjungle website
# 30M oldjungle
# 700K newjungle
# 3,5M website
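The guard at the top of the script combines three `-e` tests with `-o`, a form that POSIX marks as obsolescent. A minimal sketch of an equivalent loop-based check follows; the function name `require_absent` is an assumption made for illustration and does not appear in the repository:

```shell
# Hypothetical helper: succeed only if none of the given paths exist.
# On the first path that exists, print a hint to stderr and return 1.
require_absent() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "remove $p first" >&2
            return 1
        fi
    done
}
```

In the script it could replace the `if [ ... ]` block with `require_absent oldjungle newjungle website || exit 1`.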

flake.lock

@ -1,23 +1,110 @@
{
"nodes": {
"agenix": {
"inputs": {
"darwin": "darwin",
"home-manager": "home-manager",
"nixpkgs": [
"nixpkgs"
]
},
"locked": {
"lastModified": 1690228878,
"narHash": "sha256-9Xe7JV0krp4RJC9W9W9WutZVlw6BlHTFMiUP/k48LQY=",
"owner": "ryantm",
"repo": "agenix",
"rev": "d8c973fd228949736dedf61b7f8cc1ece3236792",
"type": "github"
},
"original": {
"owner": "ryantm",
"repo": "agenix",
"type": "github"
}
},
"bscpkgs": {
"inputs": {
"nixpkgs": [
"nixpkgs"
]
},
"locked": {
"lastModified": 1694708510,
"narHash": "sha256-72bvRBhq8Q8V6ibsR9lyBE92V2EC6C6Ek3J5cOM79So=",
"ref": "refs/heads/master",
"rev": "3a4062ac04be6263c64a481420d8e768c2521b80",
"revCount": 862,
"type": "git",
"url": "https://pm.bsc.es/gitlab/rarias/bscpkgs.git"
},
"original": {
"type": "git",
"url": "https://pm.bsc.es/gitlab/rarias/bscpkgs.git"
}
},
"darwin": {
"inputs": {
"nixpkgs": [
"agenix",
"nixpkgs"
]
},
"locked": {
"lastModified": 1673295039,
"narHash": "sha256-AsdYgE8/GPwcelGgrntlijMg4t3hLFJFCRF3tL5WVjA=",
"owner": "lnl7",
"repo": "nix-darwin",
"rev": "87b9d090ad39b25b2400029c64825fc2a8868943",
"type": "github"
},
"original": {
"owner": "lnl7",
"ref": "master",
"repo": "nix-darwin",
"type": "github"
}
},
"home-manager": {
"inputs": {
"nixpkgs": [
"agenix",
"nixpkgs"
]
},
"locked": {
"lastModified": 1682203081,
"narHash": "sha256-kRL4ejWDhi0zph/FpebFYhzqlOBrk0Pl3dzGEKSAlEw=",
"owner": "nix-community",
"repo": "home-manager",
"rev": "32d3e39c491e2f91152c84f8ad8b003420eab0a1",
"type": "github"
},
"original": {
"owner": "nix-community",
"repo": "home-manager",
"type": "github"
}
},
"nixpkgs": {
"locked": {
"lastModified": 1752436162,
"narHash": "sha256-Kt1UIPi7kZqkSc5HVj6UY5YLHHEzPBkgpNUByuyxtlw=",
"lastModified": 1693663421,
"narHash": "sha256-ImMIlWE/idjcZAfxKK8sQA7A1Gi/O58u5/CJA+mxvl8=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "dfcd5b901dbab46c9c6e80b265648481aafb01f8",
"rev": "e56990880811a451abd32515698c712788be5720",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixos-25.05",
"ref": "nixos-unstable",
"repo": "nixpkgs",
"type": "github"
}
},
"root": {
"inputs": {
"agenix": "agenix",
"bscpkgs": "bscpkgs",
"nixpkgs": "nixpkgs"
}
}

@ -1,52 +1,34 @@
{
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.05";
nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
agenix.url = "github:ryantm/agenix";
agenix.inputs.nixpkgs.follows = "nixpkgs";
bscpkgs.url = "git+https://pm.bsc.es/gitlab/rarias/bscpkgs.git";
bscpkgs.inputs.nixpkgs.follows = "nixpkgs";
};
outputs = { self, nixpkgs, ... }:
outputs = { self, nixpkgs, agenix, bscpkgs, ... }:
let
mkConf = name: nixpkgs.lib.nixosSystem {
system = "x86_64-linux";
specialArgs = { inherit nixpkgs; theFlake = self; };
specialArgs = { inherit nixpkgs bscpkgs agenix; theFlake = self; };
modules = [ "${self.outPath}/m/${name}/configuration.nix" ];
};
# For now we only support x86
system = "x86_64-linux";
pkgs = import nixpkgs {
inherit system;
overlays = [ self.overlays.default ];
config.allowUnfree = true;
};
in
{
nixosConfigurations = {
hut = mkConf "hut";
tent = mkConf "tent";
owl1 = mkConf "owl1";
owl2 = mkConf "owl2";
eudy = mkConf "eudy";
koro = mkConf "koro";
bay = mkConf "bay";
lake2 = mkConf "lake2";
raccoon = mkConf "raccoon";
fox = mkConf "fox";
apex = mkConf "apex";
weasel = mkConf "weasel";
hut = mkConf "hut";
owl1 = mkConf "owl1";
owl2 = mkConf "owl2";
eudy = mkConf "eudy";
koro = mkConf "koro";
bay = mkConf "bay";
lake2 = mkConf "lake2";
};
bscOverlay = import ./overlay.nix;
overlays.default = self.bscOverlay;
# full nixpkgs with our overlay applied
legacyPackages.${system} = pkgs;
hydraJobs = self.legacyPackages.${system}.bsc.hydraJobs;
# propagate nixpkgs lib, so we can do bscpkgs.lib
lib = nixpkgs.lib // {
maintainers = nixpkgs.lib.maintainers // {
bsc = import ./pkgs/maintainers.nix;
};
packages.x86_64-linux = self.nixosConfigurations.hut.pkgs // {
bscpkgs = bscpkgs.packages.x86_64-linux;
nixpkgs = nixpkgs.legacyPackages.x86_64-linux;
};
};
}

@ -2,36 +2,28 @@
# here all the public keys
rec {
hosts = {
hut = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICO7jIp6JRnRWTMDsTB/aiaICJCl4x8qmKMPSs4lCqP1 hut";
owl1 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMqMEXO0ApVsBA6yjmb0xP2kWyoPDIWxBB0Q3+QbHVhv owl1";
owl2 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHurEYpQzNHqWYF6B9Pd7W8UPgF3BxEg0BvSbsA7BAdK owl2";
eudy = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIL+WYPRRvZupqLAG0USKmd/juEPmisyyJaP8hAgYwXsG eudy";
koro = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIImiTFDbxyUYPumvm8C4mEnHfuvtBY1H8undtd6oDd67 koro";
bay = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICvGBzpRQKuQYHdlUQeAk6jmdbkrhmdLwTBqf3el7IgU bay";
lake2 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINo66//S1yatpQHE/BuYD/Gfq64TY7ZN5XOGXmNchiO0 lake2";
fox = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDwItIk5uOJcQEVPoy/CVGRzfmE1ojrdDcI06FrU4NFT fox";
tent = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFAtTpHtdYoelbknD/IcfBlThwLKJv/dSmylOgpg3FRM tent";
apex = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBvUFjSfoxXnKwXhEFXx5ckRKJ0oewJ82mRitSMNMKjh apex";
weasel = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFLJrQ8BF6KcweQV8pLkSbFT+tbDxSG9qxrdQE65zJZp weasel";
raccoon = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGNQttFvL0dNEyy7klIhLoK4xXOeM2/K9R7lPMTG3qvK raccoon";
hut = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICO7jIp6JRnRWTMDsTB/aiaICJCl4x8qmKMPSs4lCqP1 hut";
owl1 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMqMEXO0ApVsBA6yjmb0xP2kWyoPDIWxBB0Q3+QbHVhv owl1";
owl2 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHurEYpQzNHqWYF6B9Pd7W8UPgF3BxEg0BvSbsA7BAdK owl2";
eudy = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIL+WYPRRvZupqLAG0USKmd/juEPmisyyJaP8hAgYwXsG eudy";
koro = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIImiTFDbxyUYPumvm8C4mEnHfuvtBY1H8undtd6oDd67 koro";
bay = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICvGBzpRQKuQYHdlUQeAk6jmdbkrhmdLwTBqf3el7IgU bay";
lake2 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINo66//S1yatpQHE/BuYD/Gfq64TY7ZN5XOGXmNchiO0 lake2";
};
hostGroup = with hosts; rec {
compute = [ owl1 owl2 fox raccoon ];
playground = [ eudy koro weasel ];
compute = [ owl1 owl2 ];
playground = [ eudy koro ];
storage = [ bay lake2 ];
monitor = [ hut ];
login = [ apex ];
system = storage ++ monitor ++ login;
system = storage ++ monitor;
safe = system ++ compute;
all = safe ++ playground;
};
admins = {
"rarias@hut" = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIE1oZTPtlEXdGt0Ak+upeCIiBdaDQtcmuWoTUCVuSVIR rarias@hut";
"rarias@tent" = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIwlWSBTZi74WTz5xn6gBvTmCoVltmtIAeM3RMmkh4QZ rarias@tent";
"rarias@fox" = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDSbw3REAKECV7E2c/e2XJITudJQWq2qDSe2N1JHqHZd rarias@fox";
root = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIII/1TNArcwA6D47mgW4TArwlxQRpwmIGiZDysah40Gb root@hut";
rarias = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIE1oZTPtlEXdGt0Ak+upeCIiBdaDQtcmuWoTUCVuSVIR rarias@hut";
root = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIII/1TNArcwA6D47mgW4TArwlxQRpwmIGiZDysah40Gb root@hut";
};
}

@ -1,69 +0,0 @@
{ lib, config, pkgs, ... }:
{
imports = [
../common/xeon.nix
../common/ssf/hosts.nix
../module/ceph.nix
../module/hut-substituter.nix
../module/slurm-server.nix
./nfs.nix
./wireguard.nix
];
# Don't install grub MBR for now
boot.loader.grub.device = "nodev";
boot.initrd.kernelModules = [
"megaraid_sas" # For HW RAID
];
environment.systemPackages = with pkgs; [
storcli # To manage HW RAID
];
fileSystems."/home" = {
device = "/dev/disk/by-label/home";
fsType = "ext4";
};
# No swap, there is plenty of RAM
swapDevices = lib.mkForce [];
networking = {
hostName = "apex";
defaultGateway = "84.88.53.233";
nameservers = [ "8.8.8.8" ];
# Public facing interface
interfaces.eno1.ipv4.addresses = [ {
address = "84.88.53.236";
prefixLength = 29;
} ];
# Internal LAN to our Ethernet switch
interfaces.eno2.ipv4.addresses = [ {
address = "10.0.40.30";
prefixLength = 24;
} ];
# Infiniband over Omnipath switch (disconnected for now)
# interfaces.ibp5s0 = {};
nat = {
enable = true;
internalInterfaces = [ "eno2" ];
externalInterface = "eno1";
};
};
networking.firewall = {
extraCommands = ''
# Blackhole BSC vulnerability scanner (OpenVAS) as it is spamming our
# logs. Insert as first position so we also protect SSH.
iptables -I nixos-fw 1 -p tcp -s 192.168.8.16 -j nixos-fw-refuse
# Same with opsmonweb01.bsc.es which seems to be trying to access via SSH
iptables -I nixos-fw 2 -p tcp -s 84.88.52.176 -j nixos-fw-refuse
'';
};
}

@ -1,48 +0,0 @@
{ ... }:
{
services.nfs.server = {
enable = true;
lockdPort = 4001;
mountdPort = 4002;
statdPort = 4000;
exports = ''
/home 10.0.40.0/24(rw,async,no_subtree_check,no_root_squash)
/home 10.106.0.0/24(rw,async,no_subtree_check,no_root_squash)
'';
};
networking.firewall = {
# Check with `rpcinfo -p`
extraCommands = ''
# Accept NFS traffic from compute nodes but not from the outside
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 111 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 2049 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 4000 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 4001 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 4002 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 20048 -j nixos-fw-accept
# Same but UDP
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 111 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 2049 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 4000 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 4001 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 4002 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 20048 -j nixos-fw-accept
# Accept NFS traffic from wg0
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.0/24 --dport 111 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.0/24 --dport 2049 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.0/24 --dport 4000 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.0/24 --dport 4001 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.0/24 --dport 4002 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.0/24 --dport 20048 -j nixos-fw-accept
# Same but UDP
iptables -A nixos-fw -p udp -i wg0 -s 10.106.0.0/24 --dport 111 -j nixos-fw-accept
iptables -A nixos-fw -p udp -i wg0 -s 10.106.0.0/24 --dport 2049 -j nixos-fw-accept
iptables -A nixos-fw -p udp -i wg0 -s 10.106.0.0/24 --dport 4000 -j nixos-fw-accept
iptables -A nixos-fw -p udp -i wg0 -s 10.106.0.0/24 --dport 4001 -j nixos-fw-accept
iptables -A nixos-fw -p udp -i wg0 -s 10.106.0.0/24 --dport 4002 -j nixos-fw-accept
iptables -A nixos-fw -p udp -i wg0 -s 10.106.0.0/24 --dport 20048 -j nixos-fw-accept
'';
};
}
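
Note on the NFS rules above: the 24 accept rules differ only in protocol, source match and port, so they could be generated rather than written out. The following is a minimal sketch, not the module as committed; `ports`, `protos`, `mkRule` and `mkRules` are hypothetical names, and the generated string omits the explanatory shell comments.

```nix
{ lib, ... }:
let
  # Ports taken from the rules above (check with `rpcinfo -p`).
  ports = [ 111 2049 4000 4001 4002 20048 ];
  protos = [ "tcp" "udp" ];

  # One accept rule for a given source match, protocol and port.
  mkRule = match: proto: port:
    "iptables -A nixos-fw -p ${proto} ${match} --dport ${toString port} -j nixos-fw-accept\n";

  # All protocol/port combinations for one source match.
  mkRules = match:
    lib.concatMapStrings
      (proto: lib.concatMapStrings (mkRule match proto) ports)
      protos;
in
{
  networking.firewall.extraCommands =
    # Compute nodes first, then the WireGuard interface.
    mkRules "-s 10.0.40.0/24" + mkRules "-i wg0 -s 10.106.0.0/24";
}
```

Adding a port or a third source network then becomes a one-line change, at the cost of the rules no longer being greppable verbatim in the repository.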


@ -1,42 +0,0 @@
{ config, ... }:
{
networking.firewall = {
allowedUDPPorts = [ 666 ];
};
age.secrets.wgApex.file = ../../secrets/wg-apex.age;
# Enable WireGuard
networking.wireguard.enable = true;
networking.wireguard.interfaces = {
# "wg0" is the network interface name. You can name the interface arbitrarily.
wg0 = {
ips = [ "10.106.0.30/24" ];
listenPort = 666;
privateKeyFile = config.age.secrets.wgApex.path;
# Public key: VwhcN8vSOzdJEotQTpmPHBC52x3Hbv1lkFIyKubrnUA=
peers = [
{
name = "fox";
publicKey = "VfMPBQLQTKeyXJSwv8wBhc6OV0j2qAxUpX3kLHunK2Y=";
allowedIPs = [ "10.106.0.1/32" ];
endpoint = "fox.ac.upc.edu:666";
# Send keepalives every 25 seconds. Important to keep NAT tables alive.
persistentKeepalive = 25;
}
{
name = "raccoon";
publicKey = "QUfnGXSMEgu2bviglsaSdCjidB51oEDBFpnSFcKGfDI=";
allowedIPs = [ "10.106.0.236/32" "192.168.0.0/16" "10.0.44.0/24" ];
}
];
};
};
networking.hosts = {
"10.106.0.1" = [ "fox" ];
"10.106.0.236" = [ "raccoon" ];
"10.0.44.4" = [ "tent" ];
};
}


@ -2,22 +2,21 @@
{
imports = [
../common/ssf.nix
../module/hut-substituter.nix
../module/monitoring.nix
../common/main.nix
../common/monitoring.nix
];
# Select this using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d53562d";
boot.kernel.sysctl = {
"kernel.yama.ptrace_scope" = lib.mkForce "1";
};
environment.systemPackages = with pkgs; [
ceph
];
services.slurm = {
client.enable = lib.mkForce false;
};
networking = {
hostName = "bay";
interfaces.eno1.ipv4.addresses = [ {
@ -28,16 +27,6 @@
address = "10.0.42.40";
prefixLength = 24;
} ];
firewall = {
extraCommands = ''
# Accept all incoming TCP traffic from lake2
iptables -A nixos-fw -p tcp -s lake2 -j nixos-fw-accept
# Accept monitoring requests from hut
iptables -A nixos-fw -p tcp -s hut -m multiport --dport 9283,9002 -j nixos-fw-accept
# Accept all Ceph traffic from the local network
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 -m multiport --dport 3300,6789,6800:7568 -j nixos-fw-accept
'';
};
};
services.ceph = {

9
m/common/agenix.nix Normal file

@ -0,0 +1,9 @@
{ agenix, ... }:
{
imports = [ agenix.nixosModules.default ];
environment.systemPackages = [
agenix.packages.x86_64-linux.default
];
}


@ -1,22 +0,0 @@
{
# All machines should include this profile.
# Includes the basic configuration for an Intel server.
imports = [
./base/agenix.nix
./base/always-power-on.nix
./base/august-shutdown.nix
./base/boot.nix
./base/env.nix
./base/fs.nix
./base/hw.nix
./base/net.nix
./base/nix.nix
./base/sys-devices.nix
./base/ntp.nix
./base/rev.nix
./base/ssh.nix
./base/users.nix
./base/watchdog.nix
./base/zsh.nix
];
}


@ -1,8 +0,0 @@
{ pkgs, ... }:
{
imports = [ ../../module/agenix.nix ];
# Add agenix to system packages
environment.systemPackages = [ pkgs.agenix ];
}


@ -1,8 +0,0 @@
{
imports = [
../../module/power-policy.nix
];
# Turn on as soon as we have power
power.policy = "always-on";
}


@ -1,14 +0,0 @@
{
# Shutdown all machines on August 3rd at 22:00, so we can protect the
# hardware from spurious electrical peaks on the yearly electrical cut for
# maintenance that starts on August 4th.
systemd.timers.august-shutdown = {
description = "Shutdown on August 3rd for maintenance";
wantedBy = [ "timers.target" ];
timerConfig = {
OnCalendar = "*-08-03 22:00:00";
RandomizedDelaySec = "10min";
Unit = "systemd-poweroff.service";
};
};
}


@ -1,37 +0,0 @@
{ pkgs, config, ... }:
{
environment.systemPackages = with pkgs; [
vim wget git htop tmux pciutils tcpdump ripgrep nix-index nixos-option
nix-diff ipmitool freeipmi ethtool lm_sensors cmake gnumake file tree
ncdu config.boot.kernelPackages.perf ldns pv
# From jungle overlay
osumb nixgen
];
programs.direnv.enable = true;
# Increase limits
security.pam.loginLimits = [
{
domain = "*";
type = "-";
item = "memlock";
value = "1048576"; # 1 GiB of mem locked
}
];
environment.enableAllTerminfo = true;
environment.variables = {
EDITOR = "vim";
VISUAL = "vim";
};
programs.bash.promptInit = ''
PS1="\h\\$ "
'';
time.timeZone = "Europe/Madrid";
i18n.defaultLocale = "en_DK.UTF-8";
}


@ -1,23 +0,0 @@
{ pkgs, lib, ... }:
{
networking = {
enableIPv6 = false;
useDHCP = false;
firewall = {
enable = true;
allowedTCPPorts = [ 22 ];
};
# Make sure we use iptables
nftables.enable = lib.mkForce false;
hosts = {
"84.88.53.236" = [ "ssfhead.bsc.es" "ssfhead" ];
"84.88.51.142" = [ "raccoon-ipmi" ];
"192.168.11.12" = [ "bscpm04.bsc.es" ];
"192.168.11.15" = [ "gitlab-internal.bsc.es" ];
};
};
}


@ -1,59 +0,0 @@
{ pkgs, nixpkgs, theFlake, ... }:
{
nixpkgs.overlays = [
(import ../../../overlay.nix)
];
nixpkgs.config.allowUnfree = true;
nix = {
nixPath = [
"nixpkgs=${nixpkgs}"
"jungle=${theFlake.outPath}"
];
registry = {
nixpkgs.flake = nixpkgs;
jungle.flake = theFlake;
};
settings = {
experimental-features = [ "nix-command" "flakes" ];
sandbox = "relaxed";
trusted-users = [ "@wheel" ];
flake-registry = pkgs.writeText "global-registry.json"
''{"flakes":[],"version":2}'';
keep-outputs = true;
};
gc = {
automatic = true;
dates = "weekly";
options = "--delete-older-than 30d";
};
};
# The nix-gc.service can begin its execution *before* /home is mounted,
# causing it to remove all gcroots considering them as stale, as it cannot
# access the symlink. To prevent this problem, we force the service to wait
# until /home is mounted as well as other remote FS like /ceph.
systemd.services.nix-gc = {
# Start remote-fs.target if not already being started and fail if it fails
# to start. It will also be stopped if the remote-fs.target fails after
# starting successfully.
bindsTo = [ "remote-fs.target" ];
# Wait until remote-fs.target fully starts before starting this one.
after = [ "remote-fs.target"];
# Ensure we can access a remote path inside /home
unitConfig.ConditionPathExists = "/home/Computational";
};
# This value determines the NixOS release from which the default
# settings for stateful data, like file locations and database versions
# on your system were taken. It's perfectly fine and recommended to leave
# this value at the release version of the first install of this system.
# Before changing this value read the documentation for this option
# (e.g. man configuration.nix or on https://nixos.org/nixos/options.html).
system.stateVersion = "22.11"; # Did you read the comment?
}


@ -1,9 +0,0 @@
{
nix.settings.system-features = [ "sys-devices" ];
programs.nix-required-mounts.enable = true;
programs.nix-required-mounts.allowedPatterns.sys-devices.paths = [
"/sys/devices/system/cpu"
"/sys/devices/system/node"
];
}


@ -1,203 +0,0 @@
{ pkgs, ... }:
{
imports = [
../../module/jungle-users.nix
];
users = {
mutableUsers = false;
users = {
# Generate hashedPassword with `mkpasswd -m sha-512`
root.openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKBOf4r4lzQfyO0bx5BaREePREw8Zw5+xYgZhXwOZoBO ram@hop"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINa0tvnNgwkc5xOwd6xTtaIdFi5jv0j2FrE7jl5MTLoE ram@mio"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF3zeB5KSimMBAjvzsp1GCkepVaquVZGPYwRIzyzaCba aleix@bsc"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIII/1TNArcwA6D47mgW4TArwlxQRpwmIGiZDysah40Gb root@hut"
];
rarias = {
uid = 1880;
isNormalUser = true;
linger = true;
home = "/home/Computational/rarias";
description = "Rodrigo Arias";
group = "Computational";
extraGroups = [ "wheel" ];
hashedPassword = "$6$u06tkCy13enReBsb$xiI.twRvvTfH4jdS3s68NZ7U9PSbGKs5.LXU/UgoawSwNWhZo2hRAjNL5qG0/lAckzcho2LjD0r3NfVPvthY6/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKBOf4r4lzQfyO0bx5BaREePREw8Zw5+xYgZhXwOZoBO ram@hop"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINa0tvnNgwkc5xOwd6xTtaIdFi5jv0j2FrE7jl5MTLoE ram@mio"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGYcXIxe0poOEGLpk8NjiRozls7fMRX0N3j3Ar94U+Gl rarias@hal"
];
shell = pkgs.zsh;
};
arocanon = {
uid = 1042;
isNormalUser = true;
home = "/home/Computational/arocanon";
description = "Aleix Roca";
group = "Computational";
extraGroups = [ "wheel" "tracing" ];
hashedPassword = "$6$hliZiW4tULC/tH7p$pqZarwJkNZ7vS0G5llWQKx08UFG9DxDYgad7jplMD8WkZh5k58i4dfPoWtnEShfjTO6JHiIin05ny5lmSXzGM/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF3zeB5KSimMBAjvzsp1GCkepVaquVZGPYwRIzyzaCba aleix@bsc"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGdphWxLAEekicZ/WBrvP7phMyxKSSuLAZBovNX+hZXQ aleix@kerneland"
];
};
};
jungleUsers = {
rpenacob = {
uid = 2761;
isNormalUser = true;
home = "/home/Computational/rpenacob";
description = "Raúl Peñacoba";
group = "Computational";
hosts = [ "apex" "owl1" "owl2" "hut" "tent" "fox" ];
hashedPassword = "$6$TZm3bDIFyPrMhj1E$uEDXoYYd1z2Wd5mMPfh3DZAjP7ztVjJ4ezIcn82C0ImqafPA.AnTmcVftHEzLB3tbe2O4SxDyPSDEQgJ4GOtj/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFYfXg37mauGeurqsLpedgA2XQ9d4Nm0ZGo/hI1f7wwH rpenacob@bsc"
];
};
anavarro = {
uid = 1037;
isNormalUser = true;
home = "/home/Computational/anavarro";
description = "Antoni Navarro";
group = "Computational";
hosts = [ "apex" "hut" "tent" "raccoon" "fox" "weasel" ];
hashedPassword = "$6$EgturvVYXlKgP43g$gTN78LLHIhaF8hsrCXD.O6mKnZSASWSJmCyndTX8QBWT6wTlUhcWVAKz65lFJPXjlJA4u7G1ydYQ0GG6Wk07b1";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMsbM21uepnJwPrRe6jYFz8zrZ6AYMtSEvvt4c9spmFP toni@delltoni"
];
};
abonerib = {
uid = 4541;
isNormalUser = true;
home = "/home/Computational/abonerib";
description = "Aleix Boné";
group = "Computational";
hosts = [ "apex" "owl1" "owl2" "hut" "tent" "raccoon" "fox" "weasel" ];
hashedPassword = "$6$V1EQWJr474whv7XJ$OfJ0wueM2l.dgiJiiah0Tip9ITcJ7S7qDvtSycsiQ43QBFyP4lU0e0HaXWps85nqB4TypttYR4hNLoz3bz662/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIIFiqXqt88VuUfyANkZyLJNiuroIITaGlOOTMhVDKjf abonerib@bsc"
];
};
vlopez = {
uid = 4334;
isNormalUser = true;
home = "/home/Computational/vlopez";
description = "Victor López";
group = "Computational";
hosts = [ "apex" "koro" ];
hashedPassword = "$6$0ZBkgIYE/renVqtt$1uWlJsb0FEezRVNoETTzZMx4X2SvWiOsKvi0ppWCRqI66S6TqMBXBdP4fcQyvRRBt0e4Z7opZIvvITBsEtO0f0";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGMwlUZRf9jfG666Qa5Sb+KtEhXqkiMlBV2su3x/dXHq victor@arch"
];
};
dbautist = {
uid = 5649;
isNormalUser = true;
home = "/home/Computational/dbautist";
description = "Dylan Bautista Cases";
group = "Computational";
hosts = [ "apex" "hut" "tent" "raccoon" ];
hashedPassword = "$6$a2lpzMRVkG9nSgIm$12G6.ka0sFX1YimqJkBAjbvhRKZ.Hl090B27pdbnQOW0wzyxVWySWhyDDCILjQELky.HKYl9gqOeVXW49nW7q/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAb+EQBoS98zrCwnGKkHKwMLdYABMTqv7q9E0+T0QmkS dbautist@bsc-848818791"
];
};
dalvare1 = {
uid = 2758;
isNormalUser = true;
home = "/home/Computational/dalvare1";
description = "David Álvarez";
group = "Computational";
hosts = [ "apex" "hut" "tent" "fox" ];
hashedPassword = "$6$mpyIsV3mdq.rK8$FvfZdRH5OcEkUt5PnIUijWyUYZvB1SgeqxpJ2p91TTe.3eQIDTcLEQ5rxeg.e5IEXAZHHQ/aMsR5kPEujEghx0";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGEfy6F4rF80r4Cpo2H5xaWqhuUZzUsVsILSKGJzt5jF dalvare1@ssfhead"
];
};
varcila = {
uid = 5650;
isNormalUser = true;
home = "/home/Computational/varcila";
description = "Vincent Arcila";
group = "Computational";
hosts = [ "apex" "hut" "tent" "fox" ];
hashedPassword = "$6$oB0Tcn99DcM4Ch$Vn1A0ulLTn/8B2oFPi9wWl/NOsJzaFAWjqekwcuC9sMC7cgxEVb.Nk5XSzQ2xzYcNe5MLtmzkVYnRS1CqP39Y0";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKGt0ESYxekBiHJQowmKpfdouw0hVm3N7tUMtAaeLejK vincent@varch"
];
};
pmartin1 = {
# Arbitrary UID but large so it doesn't collide with other users on ssfhead.
uid = 9652;
isNormalUser = true;
home = "/home/Computational/pmartin1";
description = "Pedro J. Martinez-Ferrer";
group = "Computational";
hosts = [ "fox" ];
hashedPassword = "$6$nIgDMGnt4YIZl3G.$.JQ2jXLtDPRKsbsJfJAXdSvjDIzRrg7tNNjPkLPq3KJQhMjfDXRUvzagUHUU2TrE2hHM8/6uq8ex0UdxQ0ysl.";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIV5LEAII5rfe1hYqDYIIrhb1gOw7RcS1p2mhOTqG+zc pedro@pedro-ThinkPad-P14s-Gen-2a"
];
};
csiringo = {
uid = 9653;
isNormalUser = true;
home = "/home/Computational/csiringo";
description = "Cesare Siringo";
group = "Computational";
hosts = [ ];
hashedPassword = "$6$0IsZlju8jFukLlAw$VKm0FUXbS.mVmPm3rcJeizTNU4IM5Nmmy21BvzFL.cQwvlGwFI1YWRQm6gsbd4nbg47mPDvYkr/ar0SlgF6GO1";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHA65zvvG50iuFEMf+guRwZB65jlGXfGLF4HO+THFaed csiringo@bsc.es"
];
};
acinca = {
uid = 9654;
isNormalUser = true;
home = "/home/Computational/acinca";
description = "Arnau Cinca";
group = "Computational";
hosts = [ "apex" "hut" "fox" "owl1" "owl2" ];
hashedPassword = "$6$S6PUeRpdzYlidxzI$szyvWejQ4hEN76yBYhp1diVO5ew1FFg.cz4lKiXt2Idy4XdpifwrFTCIzLTs5dvYlR62m7ekA5MrhcVxR5F/q/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFmMqKqPg4uocNOr3O41kLbZMOMJn3m2ZdN1JvTR96z3 bsccns@arnau-bsc"
];
};
aaguirre = {
uid = 9655;
isNormalUser = true;
home = "/home/Computational/aaguirre";
description = "Alejandro Aguirre";
group = "Computational";
hosts = [ "apex" "hut" ];
hashedPassword = "$6$TXRXQT6jjBvxkxU6$E.sh5KspAm1qeG5Ct7OPHpo8REmbGDwjFGvqeGgTVz3GASGOAnPL7UMZsMAsAKBoahOw.v8LNno6XGrTEPzZH1";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOlRX7ZCnqtUJYCxKgWmgSrFCYuA2LHY96rVwqxXPl86 aaguirre@BSC-8488184117"
];
};
};
groups = {
Computational = { gid = 564; };
tracing = { };
};
};
}


@ -2,7 +2,7 @@
{
# Use the GRUB 2 boot loader.
boot.loader.grub.enable = true;
boot.loader.grub.enable = lib.mkForce true;
# Enable GRUB2 serial console
boot.loader.grub.extraConfig = ''
@ -11,12 +11,14 @@
terminal_output --append serial
'';
# Enable serial console
boot.kernelParams = [
"console=tty1"
"console=ttyS0,115200"
];
boot.kernel.sysctl = {
"kernel.perf_event_paranoid" = lib.mkDefault "-1";
# Allow ptracing (i.e. attach with GDB) any process of the same user, see:
# https://www.kernel.org/doc/Documentation/security/Yama.txt
"kernel.yama.ptrace_scope" = "0";
};
boot.kernelPackages = pkgs.linuxPackages_latest;


@ -13,12 +13,16 @@
[ { device = "/dev/disk/by-label/swap"; }
];
# Mount the home via NFS
fileSystems."/home" = {
device = "10.0.40.30:/home";
fsType = "nfs";
options = [ "nfsvers=3" "rsize=1024" "wsize=1024" "cto" "nofail" ];
};
# Tracing
fileSystems."/sys/kernel/tracing" = {
device = "none";
fsType = "tracefs";
};
# Mount a tmpfs into /tmp
boot.tmp.useTmpfs = true;
}

97
m/common/main.nix Normal file

@ -0,0 +1,97 @@
{ config, pkgs, nixpkgs, bscpkgs, agenix, theFlake, ... }:
{
imports = [
./agenix.nix
./boot.nix
./fs.nix
./hw.nix
./net.nix
./ntp.nix
./slurm.nix
./ssh.nix
./users.nix
./watchdog.nix
./rev.nix
./zsh.nix
];
nixpkgs.overlays = [
bscpkgs.bscOverlay
(import ../../pkgs/overlay.nix)
];
system.configurationRevision =
if theFlake ? rev
then theFlake.rev
else throw ("Refusing to build from a dirty Git tree!");
nix.nixPath = [
"nixpkgs=${nixpkgs}"
"jungle=${theFlake.outPath}"
];
nix.settings.flake-registry =
pkgs.writeText "global-registry.json" ''{"flakes":[],"version":2}'';
nix.registry.nixpkgs.flake = nixpkgs;
nix.registry.jungle.flake = theFlake;
environment.systemPackages = with pkgs; [
vim wget git htop tmux pciutils tcpdump ripgrep nix-index nixos-option
nix-diff ipmitool freeipmi ethtool lm_sensors ix cmake gnumake file tree
ncdu config.boot.kernelPackages.perf ldns
# From bscpkgs overlay
bsc.osumb
];
programs.direnv.enable = true;
systemd.services."serial-getty@ttyS0" = {
enable = true;
wantedBy = [ "getty.target" ];
serviceConfig.Restart = "always";
};
# Increase limits
security.pam.loginLimits = [
{
domain = "*";
type = "-";
item = "memlock";
value = "1048576"; # 1 GiB of mem locked
}
];
time.timeZone = "Europe/Madrid";
i18n.defaultLocale = "en_DK.UTF-8";
environment.variables = {
EDITOR = "vim";
VISUAL = "vim";
};
nix.settings.experimental-features = [ "nix-command" "flakes" ];
nix.settings.sandbox = "relaxed";
nix.settings.trusted-users = [ "@wheel" ];
nix.gc.automatic = true;
nix.gc.dates = "weekly";
nix.gc.options = "--delete-older-than 30d";
programs.bash.promptInit = ''
PS1="\h\\$ "
'';
# Copy the NixOS configuration file and link it from the resulting system
# (/run/current-system/configuration.nix). This is useful in case you
# accidentally delete configuration.nix.
#system.copySystemConfiguration = true;
# This value determines the NixOS release from which the default
# settings for stateful data, like file locations and database versions
# on your system were taken. It's perfectly fine and recommended to leave
# this value at the release version of the first install of this system.
# Before changing this value read the documentation for this option
# (e.g. man configuration.nix or on https://nixos.org/nixos/options.html).
system.stateVersion = "22.11"; # Did you read the comment?
}

94
m/common/net.nix Normal file

@ -0,0 +1,94 @@
{ pkgs, ... }:
{
# Infiniband (IPoIB)
environment.systemPackages = [ pkgs.rdma-core ];
boot.kernelModules = [ "ib_umad" "ib_ipoib" ];
networking = {
enableIPv6 = false;
useDHCP = false;
defaultGateway = "10.0.40.30";
nameservers = ["8.8.8.8"];
proxy = {
default = "http://localhost:23080/";
noProxy = "127.0.0.1,localhost,internal.domain,10.0.40.40";
# Don't set all_proxy as go complains and breaks the gitlab runner, see:
# https://github.com/golang/go/issues/16715
allProxy = null;
};
firewall = {
enable = true;
allowedTCPPorts = [ 22 ];
extraCommands = ''
# Prevent ssfhead from contacting our slurmd daemon
iptables -A nixos-fw -p tcp -s ssfhead --dport 6817:6819 -j nixos-fw-log-refuse
# But accept traffic to slurm ports from any other node in the subnet
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 6817:6819 -j nixos-fw-accept
# We also need to open the srun port range
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 60000:61000 -j nixos-fw-accept
'';
};
extraHosts = ''
10.0.40.30 ssfhead
84.88.53.236 ssfhead.bsc.es ssfhead
# Node Entry for node: mds01 (ID=72)
10.0.40.40 bay mds01 mds01-eth0
10.0.42.40 bay-ib mds01-ib0
10.0.40.141 bay-ipmi mds01-ipmi0
# Node Entry for node: oss01 (ID=73)
10.0.40.41 oss01 oss01-eth0
10.0.42.41 oss01-ib0
10.0.40.142 oss01-ipmi0
# Node Entry for node: oss02 (ID=74)
10.0.40.42 lake2 oss02 oss02-eth0
10.0.42.42 lake2-ib oss02-ib0
10.0.40.143 lake2-ipmi oss02-ipmi0
# Node Entry for node: xeon01 (ID=15)
10.0.40.1 owl1 xeon01 xeon01-eth0
10.0.42.1 owl1-ib xeon01-ib0
10.0.40.101 owl1-ipmi xeon01-ipmi0
# Node Entry for node: xeon02 (ID=16)
10.0.40.2 owl2 xeon02 xeon02-eth0
10.0.42.2 owl2-ib xeon02-ib0
10.0.40.102 owl2-ipmi xeon02-ipmi0
# Node Entry for node: xeon03 (ID=17)
10.0.40.3 xeon03 xeon03-eth0
10.0.42.3 xeon03-ib0
10.0.40.103 xeon03-ipmi0
# Node Entry for node: xeon04 (ID=18)
10.0.40.4 xeon04 xeon04-eth0
10.0.42.4 xeon04-ib0
10.0.40.104 xeon04-ipmi0
# Node Entry for node: xeon05 (ID=19)
10.0.40.5 koro xeon05 xeon05-eth0
10.0.42.5 koro-ib xeon05-ib0
10.0.40.105 koro-ipmi xeon05-ipmi0
# Node Entry for node: xeon06 (ID=20)
10.0.40.6 xeon06 xeon06-eth0
10.0.42.6 xeon06-ib0
10.0.40.106 xeon06-ipmi0
# Node Entry for node: xeon07 (ID=21)
10.0.40.7 hut xeon07 xeon07-eth0
10.0.42.7 hut-ib xeon07-ib0
10.0.40.107 hut-ipmi xeon07-ipmi0
# Node Entry for node: xeon08 (ID=22)
10.0.40.8 eudy xeon08 xeon08-eth0
10.0.42.8 eudy-ib xeon08-ib0
10.0.40.108 eudy-ipmi xeon08-ipmi0
'';
};
}


@ -1,7 +1,6 @@
{ theFlake, ... }:
let
# Prevent building a configuration without revision
rev = if theFlake ? rev then theFlake.rev
else throw ("Refusing to build from a dirty Git tree!");
in {
@ -16,6 +15,4 @@ in {
DATENOW=$(date --iso-8601=seconds)
echo "$DATENOW booted=$BOOTED current=$CURRENT next=$NEXT" >> /var/configrev.log
'';
system.configurationRevision = rev;
}

99
m/common/slurm.nix Normal file

@ -0,0 +1,99 @@
{ config, pkgs, lib, ... }:
let
suspendProgram = pkgs.writeScript "suspend.sh" ''
#!/usr/bin/env bash
exec 1>>/var/log/power_save.log 2>>/var/log/power_save.log
set -x
export "PATH=/run/current-system/sw/bin:$PATH"
echo "$(date) Suspend invoked $0 $*" >> /var/log/power_save.log
hosts=$(scontrol show hostnames $1)
for host in $hosts; do
echo Shutting down host: $host
ipmitool -I lanplus -H ''${host}-ipmi -P "" -U "" chassis power off
done
'';
resumeProgram = pkgs.writeScript "resume.sh" ''
#!/usr/bin/env bash
exec 1>>/var/log/power_save.log 2>>/var/log/power_save.log
set -x
export "PATH=/run/current-system/sw/bin:$PATH"
echo "$(date) Resume invoked $0 $*" >> /var/log/power_save.log
hosts=$(scontrol show hostnames $1)
for host in $hosts; do
echo Starting host: $host
ipmitool -I lanplus -H ''${host}-ipmi -P "" -U "" chassis power on
done
'';
in {
systemd.services.slurmd.serviceConfig = {
# Kill all processes in the control group on stop/restart. This will kill
# all the jobs running, so ensure that we only upgrade when the nodes are
# not in use. See:
# https://github.com/NixOS/nixpkgs/commit/ae93ed0f0d4e7be0a286d1fca86446318c0c6ffb
# https://bugs.schedmd.com/show_bug.cgi?id=2095#c24
KillMode = lib.mkForce "control-group";
};
services.slurm = {
client.enable = true;
controlMachine = "hut";
clusterName = "jungle";
nodeName = [
"owl[1,2] Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 Feature=owl"
"hut Sockets=2 CoresPerSocket=14 ThreadsPerCore=2"
];
partitionName = [
"owl Nodes=owl[1-2] Default=YES MaxTime=INFINITE State=UP"
"all Nodes=owl[1-2],hut Default=NO MaxTime=INFINITE State=UP"
];
# See slurm.conf(5) for more details about these options.
extraConfig = ''
# Use PMIx for MPI by default. It works okay with MPICH and OpenMPI, but
# not with Intel MPI. For that use the compatibility shim libpmi.so
# setting I_MPI_PMI_LIBRARY=$pmix/lib/libpmi.so while maintaining the PMIx
# library in SLURM (--mpi=pmix). See more details here:
# https://pm.bsc.es/gitlab/rarias/jungle/-/issues/16
MpiDefault=pmix
# When a node reboots return that node to the slurm queue as soon as it
# becomes operative again.
ReturnToService=2
# Track all processes by using a cgroup
ProctrackType=proctrack/cgroup
# Enable task/affinity to allow the jobs to run in a specified subset of
# the resources. Use the task/cgroup plugin to enable process containment.
TaskPlugin=task/affinity,task/cgroup
# Power off unused nodes until they are requested
SuspendProgram=${suspendProgram}
SuspendTimeout=60
ResumeProgram=${resumeProgram}
ResumeTimeout=300
SuspendExcNodes=hut
# Turn the nodes off after 1 hour of inactivity
SuspendTime=3600
# Reduce port range so we can allow only this range in the firewall
SrunPortRange=60000-61000
'';
};
age.secrets.mungeKey = {
file = ../../secrets/munge-key.age;
owner = "munge";
group = "munge";
};
services.munge = {
enable = true;
password = config.age.secrets.mungeKey.path;
};
}
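
Note on `suspendProgram` and `resumeProgram` above: the two scripts differ only in their log messages and in the final `chassis power` argument, so a single helper could produce both. This is a sketch under assumptions, not the committed code; `mkPowerScript` and its argument names (`name`, `logTag`, `hostMsg`, `action`) are hypothetical.

```nix
{ pkgs }:
let
  # Build one IPMI power script. `action` is passed to
  # `ipmitool ... chassis power`, e.g. "on" or "off".
  mkPowerScript = { name, logTag, hostMsg, action }:
    pkgs.writeScript "${name}.sh" ''
      #!/usr/bin/env bash
      exec 1>>/var/log/power_save.log 2>>/var/log/power_save.log
      set -x
      export "PATH=/run/current-system/sw/bin:$PATH"
      echo "$(date) ${logTag} invoked $0 $*" >> /var/log/power_save.log
      hosts=$(scontrol show hostnames $1)
      for host in $hosts; do
        echo ${hostMsg}: $host
        ipmitool -I lanplus -H ''${host}-ipmi -P "" -U "" chassis power ${action}
      done
    '';
in
{
  suspendProgram = mkPowerScript {
    name = "suspend";
    logTag = "Suspend";
    hostMsg = "Shutting down host";
    action = "off";
  };
  resumeProgram = mkPowerScript {
    name = "resume";
    logTag = "Resume";
    hostMsg = "Starting host";
    action = "on";
  };
}
```

Keeping the shared body in one place also ensures each script logs its own name, since the log tag is a parameter rather than copied text.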


@ -1,10 +0,0 @@
{
# Provides the base system for a xeon node in the SSF rack.
imports = [
./xeon.nix
./ssf/fs.nix
./ssf/hosts.nix
./ssf/hosts-remote.nix
./ssf/net.nix
];
}


@ -1,8 +0,0 @@
{
# Mount the home via NFS
fileSystems."/home" = {
device = "10.0.40.30:/home";
fsType = "nfs";
options = [ "nfsvers=3" "rsize=1024" "wsize=1024" "cto" "nofail" ];
};
}


@ -1,9 +0,0 @@
{ pkgs, ... }:
{
networking.hosts = {
# Remote hosts visible from compute nodes
"10.106.0.236" = [ "raccoon" ];
"10.0.44.4" = [ "tent" ];
};
}


@ -1,23 +0,0 @@
{ pkgs, ... }:
{
networking.hosts = {
# Login
"10.0.40.30" = [ "apex" ];
# Storage
"10.0.40.40" = [ "bay" ]; "10.0.42.40" = [ "bay-ib" ]; "10.0.40.141" = [ "bay-ipmi" ];
"10.0.40.41" = [ "oss01" ]; "10.0.42.41" = [ "oss01-ib0" ]; "10.0.40.142" = [ "oss01-ipmi" ];
"10.0.40.42" = [ "lake2" ]; "10.0.42.42" = [ "lake2-ib" ]; "10.0.40.143" = [ "lake2-ipmi" ];
# Xeon compute
"10.0.40.1" = [ "owl1" ]; "10.0.42.1" = [ "owl1-ib" ]; "10.0.40.101" = [ "owl1-ipmi" ];
"10.0.40.2" = [ "owl2" ]; "10.0.42.2" = [ "owl2-ib" ]; "10.0.40.102" = [ "owl2-ipmi" ];
"10.0.40.3" = [ "xeon03" ]; "10.0.42.3" = [ "xeon03-ib" ]; "10.0.40.103" = [ "xeon03-ipmi" ];
#"10.0.40.4" = [ "tent" ]; "10.0.42.4" = [ "tent-ib" ]; "10.0.40.104" = [ "tent-ipmi" ];
"10.0.40.5" = [ "koro" ]; "10.0.42.5" = [ "koro-ib" ]; "10.0.40.105" = [ "koro-ipmi" ];
"10.0.40.6" = [ "weasel" ]; "10.0.42.6" = [ "weasel-ib" ]; "10.0.40.106" = [ "weasel-ipmi" ];
"10.0.40.7" = [ "hut" ]; "10.0.42.7" = [ "hut-ib" ]; "10.0.40.107" = [ "hut-ipmi" ];
"10.0.40.8" = [ "eudy" ]; "10.0.42.8" = [ "eudy-ib" ]; "10.0.40.108" = [ "eudy-ipmi" ];
};
}
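
Note on the Xeon compute entries above: each node follows the same addressing pattern (Ethernet at `10.0.40.N`, IPoIB at `10.0.42.N`, IPMI at `10.0.40.(100+N)`), so the table could be derived from a short node list. A minimal sketch, assuming that pattern holds for every listed node; `nodes` and `mkEntries` are hypothetical names, and the storage nodes are left out because their names do not follow the pattern.

```nix
{ lib, ... }:
let
  # Node index and name, as listed in the Xeon compute block above.
  nodes = [
    { id = 1; name = "owl1"; }
    { id = 2; name = "owl2"; }
    { id = 3; name = "xeon03"; }
    { id = 5; name = "koro"; }
    { id = 6; name = "weasel"; }
    { id = 7; name = "hut"; }
    { id = 8; name = "eudy"; }
  ];

  # Three /etc/hosts entries per node: Ethernet, IPoIB and IPMI.
  mkEntries = { id, name }: {
    "10.0.40.${toString id}" = [ name ];
    "10.0.42.${toString id}" = [ "${name}-ib" ];
    "10.0.40.${toString (100 + id)}" = [ "${name}-ipmi" ];
  };
in
{
  # Merge the per-node attribute sets into a single hosts table.
  networking.hosts = lib.foldl' (acc: node: acc // mkEntries node) { } nodes;
}
```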


@ -1,23 +0,0 @@
{ pkgs, ... }:
{
# Infiniband (IPoIB)
environment.systemPackages = [ pkgs.rdma-core ];
boot.kernelModules = [ "ib_umad" "ib_ipoib" ];
networking = {
defaultGateway = "10.0.40.30";
nameservers = ["8.8.8.8"];
firewall = {
extraCommands = ''
# Prevent ssfhead from contacting our slurmd daemon
iptables -A nixos-fw -p tcp -s ssfhead --dport 6817:6819 -j nixos-fw-refuse
# But accept traffic to slurm ports from any other node in the subnet
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 6817:6819 -j nixos-fw-accept
# We also need to open the srun port range
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 60000:61000 -j nixos-fw-accept
'';
};
};
}


@ -1,18 +1,22 @@
{ lib, ... }:
let
keys = import ../../../keys.nix;
keys = import ../../keys.nix;
hostsKeys = lib.mapAttrs (name: value: { publicKey = value; }) keys.hosts;
in
{
# Enable the OpenSSH daemon.
services.openssh.enable = true;
# Connect to intranet git hosts via proxy
programs.ssh.extraConfig = ''
Host bscpm02.bsc.es bscpm03.bsc.es gitlab-internal.bsc.es alya.gitlab.bsc.es
User git
ProxyCommand nc -X connect -x localhost:23080 %h %p
'';
programs.ssh.knownHosts = hostsKeys // {
"gitlab-internal.bsc.es".publicKey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF9arsAOSRB06hdy71oTvJHG2Mg8zfebADxpvc37lZo3";
"bscpm03.bsc.es".publicKey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIM2NuSUPsEhqz1j5b4Gqd+MWFnRqyqY57+xMvBUqHYUS";
"bscpm04.bsc.es".publicKey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPx4mC0etyyjYUT2Ztc/bs4ZXSbVMrogs1ZTP924PDgT";
"glogin1.bsc.es".publicKey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFsHsZGCrzpd4QDVn5xoDOtrNBkb0ylxKGlyBt6l9qCz";
"glogin2.bsc.es".publicKey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFsHsZGCrzpd4QDVn5xoDOtrNBkb0ylxKGlyBt6l9qCz";
};
}
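
Note on `hostsKeys` above: `lib.mapAttrs` wraps each plain key string from `keys.hosts` into the `{ publicKey = ...; }` shape that `programs.ssh.knownHosts` expects, and `//` then adds the external hosts on top. The standalone expression below illustrates the transformation with `builtins.mapAttrs`; the `keys` set and all key strings are placeholders, since the contents of `keys.nix` are not shown in this diff.

```nix
let
  # Hypothetical stand-in for `import ../../keys.nix`.
  keys.hosts = {
    hut = "ssh-ed25519 PLACEHOLDER-hut";
    bay = "ssh-ed25519 PLACEHOLDER-bay";
  };

  # Turn { host = "key"; } into { host = { publicKey = "key"; }; }.
  hostsKeys =
    builtins.mapAttrs (name: value: { publicKey = value; }) keys.hosts;
in
# Right-hand attributes win on name collisions with `//`.
hostsKeys // {
  "example.org".publicKey = "ssh-ed25519 PLACEHOLDER-example";
}
```

Saved to a file, this can be inspected with `nix-instantiate --eval --strict` to see the resulting attribute set.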

75
m/common/users.nix Normal file

@ -0,0 +1,75 @@
{ pkgs, ... }:
{
users = {
mutableUsers = false;
users = {
# Generate hashedPassword with `mkpasswd -m sha-512`
root.openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKBOf4r4lzQfyO0bx5BaREePREw8Zw5+xYgZhXwOZoBO ram@hop"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINa0tvnNgwkc5xOwd6xTtaIdFi5jv0j2FrE7jl5MTLoE ram@mio"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF3zeB5KSimMBAjvzsp1GCkepVaquVZGPYwRIzyzaCba aleix@bsc"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIII/1TNArcwA6D47mgW4TArwlxQRpwmIGiZDysah40Gb root@hut"
];
rarias = {
uid = 1880;
isNormalUser = true;
home = "/home/Computational/rarias";
description = "Rodrigo Arias";
group = "Computational";
extraGroups = [ "wheel" ];
hashedPassword = "$6$u06tkCy13enReBsb$xiI.twRvvTfH4jdS3s68NZ7U9PSbGKs5.LXU/UgoawSwNWhZo2hRAjNL5qG0/lAckzcho2LjD0r3NfVPvthY6/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKBOf4r4lzQfyO0bx5BaREePREw8Zw5+xYgZhXwOZoBO ram@hop"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINa0tvnNgwkc5xOwd6xTtaIdFi5jv0j2FrE7jl5MTLoE ram@mio"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGYcXIxe0poOEGLpk8NjiRozls7fMRX0N3j3Ar94U+Gl rarias@hal"
];
shell = pkgs.zsh;
};
arocanon = {
uid = 1042;
isNormalUser = true;
home = "/home/Computational/arocanon";
description = "Aleix Roca";
group = "Computational";
extraGroups = [ "wheel" ];
hashedPassword = "$6$hliZiW4tULC/tH7p$pqZarwJkNZ7vS0G5llWQKx08UFG9DxDYgad7jplMD8WkZh5k58i4dfPoWtnEShfjTO6JHiIin05ny5lmSXzGM/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF3zeB5KSimMBAjvzsp1GCkepVaquVZGPYwRIzyzaCba aleix@bsc"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGdphWxLAEekicZ/WBrvP7phMyxKSSuLAZBovNX+hZXQ aleix@kerneland"
];
};
rpenacob = {
uid = 2761;
isNormalUser = true;
home = "/home/Computational/rpenacob";
description = "Raúl Peñacoba";
group = "Computational";
hashedPassword = "$6$TZm3bDIFyPrMhj1E$uEDXoYYd1z2Wd5mMPfh3DZAjP7ztVjJ4ezIcn82C0ImqafPA.AnTmcVftHEzLB3tbe2O4SxDyPSDEQgJ4GOtj/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFYfXg37mauGeurqsLpedgA2XQ9d4Nm0ZGo/hI1f7wwH rpenacob@bsc"
];
};
anavarro = {
uid = 1037;
isNormalUser = true;
home = "/home/Computational/anavarro";
description = "Antoni Navarro";
group = "Computational";
hashedPassword = "$6$QdNDsuLehoZTYZlb$CDhCouYDPrhoiB7/seu7RF.Gqg4zMQz0n5sA4U1KDgHaZOxy2as9pbIGeF8tOHJKRoZajk5GiaZv0rZMn7Oq31";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILWjRSlKgzBPZQhIeEtk6Lvws2XNcYwHcwPv4osSgst5 anavarro@ssfhead"
];
};
};
groups = {
Computational = { gid = 564; };
};
};
}


@ -1,7 +0,0 @@
{
# Provides the base system for a xeon node, not necessarily in the SSF rack.
imports = [
./base.nix
./xeon/console.nix
];
}


@ -1,14 +0,0 @@
{
# Restart the serial console
systemd.services."serial-getty@ttyS0" = {
enable = true;
wantedBy = [ "getty.target" ];
serviceConfig.Restart = "always";
};
# Enable serial console
boot.kernelParams = [
"console=tty1"
"console=ttyS0,115200"
];
}


@ -2,15 +2,14 @@
{
imports = [
../common/ssf.nix
../common/main.nix
#(modulesPath + "/installer/netboot/netboot-minimal.nix")
./kernel/kernel.nix
./cpufreq.nix
./fs.nix
./users.nix
../module/hut-substituter.nix
../module/debuginfod.nix
./slurm.nix
];
# Select this using the ID to avoid mismatches

File diff suppressed because it is too large

10333
m/eudy/kernel/configs/lockdep Normal file

File diff suppressed because it is too large


@ -21,9 +21,9 @@ let
# configfile = if lockdep then ./configs/lockdep else ./configs/defconfig;
#};
kernel = nixos-fcs;
kernel = nixos-fcsv3;
nixos-fcs-kernel = lib.makeOverridable ({gitCommit, lockStat ? false, preempt ? false, branch ? "fcs"}: pkgs.linuxPackagesFor (pkgs.buildLinux rec {
nixos-fcs-kernel = {gitCommit, lockStat ? false, preempt ? false, branch ? "fcs"}: pkgs.linuxPackagesFor (pkgs.buildLinux rec {
version = "6.2.8";
src = builtins.fetchGit {
url = "git@bscpm03.bsc.es:ompss-kernel/linux.git";
@ -40,13 +40,35 @@ let
};
kernelPatches = [];
extraMeta.branch = lib.versions.majorMinor version;
}));
});
nixos-fcs = nixos-fcs-kernel {gitCommit = "8a09822dfcc8f0626b209d6d2aec8b5da459dfee";};
nixos-fcs-lockstat = nixos-fcs.override {
nixos-fcsv1 = nixos-fcs-kernel {gitCommit = "bc11660676d3d68ce2459b9fb5d5e654e3f413be";};
nixos-fcsv2 = nixos-fcs-kernel {gitCommit = "db0f2eca0cd57a58bf456d7d2c7d5d8fdb25dfb1";};
nixos-fcsv3 = nixos-fcs-kernel {gitCommit = "6c17394890704c3345ac1a521bb547164b36b154";};
# always use fcs_sched_setaffinity
#nixos-debug = nixos-fcs-kernel {gitCommit = "7d0bf285fca92badc8df3c9907a9ab30db4418aa";};
# remove need_check_cgroup
#nixos-debug = nixos-fcs-kernel {gitCommit = "4cc4efaab5e4a0bfa3089e935215b981c1922919";};
# merge again fcs_wake and fcs_wait
#nixos-debug = nixos-fcs-kernel {gitCommit = "40c6f72f4ae54b0b636b193ac0648fb5730c810d";};
# start from scratch, this is the working version with split fcs_wake and fcs_wait
nixos-debug = nixos-fcs-kernel {gitCommit = "c9a39d6a4ca83845b4e71fcc268fb0a76aff1bdf"; branch = "fcs-test"; };
nixos-fcsv1-lockstat = nixos-fcs-kernel {
gitCommit = "bc11660676d3d68ce2459b9fb5d5e654e3f413be";
lockStat = true;
};
nixos-fcs-lockstat-preempt = nixos-fcs.override {
nixos-fcsv2-lockstat = nixos-fcs-kernel {
gitCommit = "db0f2eca0cd57a58bf456d7d2c7d5d8fdb25dfb1";
lockStat = true;
};
nixos-fcsv3-lockstat = nixos-fcs-kernel {
gitCommit = "6c17394890704c3345ac1a521bb547164b36b154";
lockStat = true;
};
nixos-fcsv3-lockstat-preempt = nixos-fcs-kernel {
gitCommit = "6c17394890704c3345ac1a521bb547164b36b154";
lockStat = true;
preempt = true;
};
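
Review note: in this hunk the `lib.makeOverridable` wrapper and the `.override` variants appear to be replaced by one direct call to `nixos-fcs-kernel` per kernel version. A minimal sketch of that pattern follows. It assumes a hypothetical `mkKernel` stand-in that only echoes its arguments instead of calling `pkgs.buildLinux`, so it can be evaluated without fetching anything.

```nix
# Sketch only: mkKernel is a hypothetical stand-in for nixos-fcs-kernel.
# It returns its (defaulted) arguments instead of building a kernel.
let
  mkKernel = { gitCommit, lockStat ? false, preempt ? false, branch ? "fcs" }:
    { inherit gitCommit lockStat preempt branch; };

  # Commit hash taken from the nixos-fcsv3 line in the hunk above.
  v3 = "6c17394890704c3345ac1a521bb547164b36b154";
in {
  # Each variant is a plain function call; no .override is needed.
  fcsv3 = mkKernel { gitCommit = v3; };
  fcsv3-lockstat = mkKernel { gitCommit = v3; lockStat = true; };
  fcsv3-lockstat-preempt = mkKernel { gitCommit = v3; lockStat = true; preempt = true; };
}
```

Saved to a file, this can be inspected with `nix-instantiate --eval --strict <file>`. The trade-off is that every variant repeats the commit hash, whereas `.override` let a variant inherit it from its base.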

7
m/eudy/slurm.nix Normal file
@ -0,0 +1,7 @@
{ lib, ... }:
{
services.slurm = {
client.enable = lib.mkForce false;
};
}

@ -1,96 +0,0 @@
{ lib, config, pkgs, ... }:
{
imports = [
../common/base.nix
../common/xeon/console.nix
../module/amd-uprof.nix
../module/emulation.nix
../module/nvidia.nix
../module/slurm-client.nix
../module/hut-substituter.nix
./wireguard.nix
];
# Don't turn off in August, as UPC has different dates.
# Fox works fine on power cuts.
systemd.timers.august-shutdown.enable = false;
# Select this using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x500a07514b0c1103";
# No swap, there is plenty of RAM
swapDevices = lib.mkForce [];
boot.initrd.availableKernelModules = [ "xhci_pci" "ahci" "nvme" "usbhid" "usb_storage" "sd_mod" ];
boot.kernelModules = [ "kvm-amd" "amd_uncore" "amd_hsmp" ];
hardware.cpu.amd.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;
hardware.cpu.intel.updateMicrocode = lib.mkForce false;
# Use performance for benchmarks
powerManagement.cpuFreqGovernor = "performance";
services.amd-uprof.enable = true;
# Disable NUMA balancing
boot.kernel.sysctl."kernel.numa_balancing" = 0;
# Expose kernel addresses
boot.kernel.sysctl."kernel.kptr_restrict" = 0;
# Disable NMI watchdog to save one hw counter (for AMD uProf)
boot.kernel.sysctl."kernel.nmi_watchdog" = 0;
services.openssh.settings.X11Forwarding = true;
services.fail2ban.enable = true;
networking = {
timeServers = [ "ntp1.upc.edu" "ntp2.upc.edu" ];
hostName = "fox";
# UPC network (may change over time, use DHCP)
# Public IP configuration:
# - Hostname: fox.ac.upc.edu
# - IP: 147.83.30.141
# - Gateway: 147.83.30.130
# - NetMask: 255.255.255.192
# Private IP configuration for BMC:
# - Hostname: fox-ipmi.ac.upc.edu
# - IP: 147.83.35.27
# - Gateway: 147.83.35.2
# - NetMask: 255.255.255.0
interfaces.enp1s0f0np0.useDHCP = true;
};
# Recommended for new graphics cards
hardware.nvidia.open = true;
# Mount NVME disks
fileSystems."/nvme0" = { device = "/dev/disk/by-label/nvme0"; fsType = "ext4"; };
fileSystems."/nvme1" = { device = "/dev/disk/by-label/nvme1"; fsType = "ext4"; };
# Mount the NFS home
fileSystems."/nfs/home" = {
device = "10.106.0.30:/home";
fsType = "nfs";
options = [ "nfsvers=3" "rsize=1024" "wsize=1024" "cto" "nofail" ];
};
# Make a /nvme{0,1}/$USER directory for each user.
systemd.services.create-nvme-dirs = let
# Take only normal users in fox
users = lib.filterAttrs (_: v: v.isNormalUser) config.users.users;
commands = lib.concatLists (lib.mapAttrsToList
(_: user: [
"install -d -o ${user.name} -g ${user.group} -m 0755 /nvme{0,1}/${user.name}"
]) users);
script = pkgs.writeShellScript "create-nvme-dirs.sh" (lib.concatLines commands);
in {
enable = true;
wants = [ "local-fs.target" ];
after = [ "local-fs.target" ];
wantedBy = [ "multi-user.target" ];
serviceConfig.ExecStart = script;
};
}
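
Review note: the `create-nvme-dirs` service above builds one `install -d` command per normal user and joins them into a script. The sketch below reproduces only that list-building step with plain `builtins` rather than the `lib` helpers used in the file, and with a made-up `users` set (both user entries are hypothetical).

```nix
# Sketch only: the users set is invented for illustration, and builtins
# replace lib.filterAttrs / lib.mapAttrsToList / lib.concatLines.
let
  users = {
    alice = { name = "alice"; group = "users"; isNormalUser = true; };
    nixbld1 = { name = "nixbld1"; group = "nixbld"; isNormalUser = false; };
  };

  # Keep only normal users, as the service does.
  normal = builtins.filter (u: u.isNormalUser) (builtins.attrValues users);

  # One command per user; the shell expands /nvme{0,1} at run time.
  commands = map
    (u: "install -d -o ${u.name} -g ${u.group} -m 0755 /nvme{0,1}/${u.name}")
    normal;
in
  # Join the commands with newlines and terminate the last line.
  builtins.concatStringsSep "\n" commands + "\n"
```

Evaluating this yields a single line for `alice`; the system user is filtered out.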

@ -1,54 +0,0 @@
{ config, ... }:
{
networking.firewall = {
allowedUDPPorts = [ 666 ];
};
age.secrets.wgFox.file = ../../secrets/wg-fox.age;
networking.wireguard.enable = true;
networking.wireguard.interfaces = {
# "wg0" is the network interface name. You can name the interface arbitrarily.
wg0 = {
# Determines the IP address and subnet of the server's end of the tunnel interface.
ips = [ "10.106.0.1/24" ];
# The port that WireGuard listens to. Must be accessible by the client.
listenPort = 666;
# Path to the private key file.
privateKeyFile = config.age.secrets.wgFox.path;
# Public key: VfMPBQLQTKeyXJSwv8wBhc6OV0j2qAxUpX3kLHunK2Y=
peers = [
# List of allowed peers.
{
name = "apex";
publicKey = "VwhcN8vSOzdJEotQTpmPHBC52x3Hbv1lkFIyKubrnUA=";
# List of IPs assigned to this peer within the tunnel subnet. Used to configure routing.
allowedIPs = [ "10.106.0.30/32" "10.0.40.7/32" ];
}
{
name = "raccoon";
publicKey = "QUfnGXSMEgu2bviglsaSdCjidB51oEDBFpnSFcKGfDI=";
allowedIPs = [ "10.106.0.236/32" "192.168.0.0/16" "10.0.44.0/24" ];
}
];
};
};
networking.hosts = {
"10.106.0.30" = [ "apex" ];
"10.0.40.7" = [ "hut" ];
"10.106.0.236" = [ "raccoon" ];
"10.0.44.4" = [ "tent" ];
};
networking.firewall = {
extraCommands = ''
# Accept slurm connections to slurmd from apex (via wireguard)
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.30/32 -d 10.106.0.1/32 --dport 6818 -j nixos-fw-accept
'';
};
}

@ -1,14 +0,0 @@
modules:
http_2xx:
prober: http
timeout: 5s
http:
follow_redirects: true
preferred_ip_protocol: "ip4"
valid_status_codes: [] # Defaults to 2xx
method: GET
icmp:
prober: icmp
timeout: 5s
icmp:
preferred_ip_protocol: "ip4"

@ -1,43 +1,22 @@
{ config, pkgs, lib, ... }:
{ config, pkgs, ... }:
{
imports = [
../common/ssf.nix
../common/main.nix
../module/ceph.nix
../module/debuginfod.nix
../module/emulation.nix
./gitlab-runner.nix
./monitoring.nix
./nfs.nix
./slurm-daemon.nix
./nix-serve.nix
./public-inbox.nix
./gitea.nix
./msmtp.nix
./postgresql.nix
./nginx.nix
./p.nix
./ompss2-timer.nix
#./pxe.nix
];
boot.binfmt.emulatedSystems = [ "aarch64-linux" "powerpc64le-linux" "riscv64-linux" ];
# Select this using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d53567f";
fileSystems = {
"/" = lib.mkForce {
device = "/dev/disk/by-label/nvme";
fsType = "ext4";
neededForBoot = true;
options = [ "noatime" ];
};
"/boot" = lib.mkForce {
device = "/dev/disk/by-label/nixos-boot";
fsType = "ext4";
neededForBoot = true;
};
};
boot.loader.grub.device = "/dev/disk/by-id/ata-INTEL_SSDSC2BB240G7_PHDV6462004Y240AGN";
networking = {
hostName = "hut";
@ -49,20 +28,5 @@
address = "10.0.42.7";
prefixLength = 24;
} ];
firewall = {
extraCommands = ''
# Accept all proxy traffic from compute nodes but not the login
iptables -A nixos-fw -p tcp -s 10.0.40.30 --dport 23080 -j nixos-fw-log-refuse
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 23080 -j nixos-fw-accept
'';
# Flush all rules and chains on stop so it won't break on start
extraStopCommands = ''
iptables -F
iptables -X
'';
};
};
# Allow proxy to bind to the ethernet interface
services.openssh.settings.GatewayPorts = "clientspecified";
}

@ -1,66 +0,0 @@
{ config, lib, ... }:
{
age.secrets.giteaRunnerToken.file = ../../secrets/gitea-runner-token.age;
services.gitea = {
enable = true;
appName = "Gitea in the jungle";
settings = {
server = {
ROOT_URL = "https://jungle.bsc.es/git/";
LOCAL_ROOT_URL = "https://jungle.bsc.es/git/";
LANDING_PAGE = "explore";
};
metrics.ENABLED = true;
service = {
REGISTER_MANUAL_CONFIRM = true;
ENABLE_NOTIFY_MAIL = true;
};
log.LEVEL = "Warn";
mailer = {
ENABLED = true;
FROM = "jungle-robot@bsc.es";
PROTOCOL = "sendmail";
SENDMAIL_PATH = "/run/wrappers/bin/sendmail";
SENDMAIL_ARGS = "--";
};
};
};
# Allow gitea user to send mail
users.users.gitea.extraGroups = [ "mail-robot" ];
services.gitea-actions-runner.instances = {
runrun = {
enable = true;
name = "runrun";
url = "https://jungle.bsc.es/git/";
tokenFile = config.age.secrets.giteaRunnerToken.path;
labels = [ "native:host" ];
settings.runner.capacity = 8;
};
};
systemd.services.gitea-runner-runrun = {
path = [ "/run/current-system/sw" ];
serviceConfig = {
# DynamicUser doesn't work well with SSH
DynamicUser = lib.mkForce false;
User = "gitea-runner";
Group = "gitea-runner";
};
};
users.users.gitea-runner = {
isSystemUser = true;
home = "/var/lib/gitea-runner";
description = "Gitea Runner";
group = "gitea-runner";
extraGroups = [ "docker" ];
createHome = true;
};
users.groups.gitea-runner = {};
}

@ -1,111 +1,54 @@
{ pkgs, lib, config, ... }:
{
age.secrets.gitlab-pm-shell.file = ../../secrets/gitlab-runner-shell-token.age;
age.secrets.gitlab-pm-docker.file = ../../secrets/gitlab-runner-docker-token.age;
age.secrets.gitlab-bsc-docker.file = ../../secrets/gitlab-bsc-docker-token.age;
age.secrets.ovniToken.file = ../../secrets/ovni-token.age;
age.secrets.nosvToken.file = ../../secrets/nosv-token.age;
services.gitlab-runner = {
enable = true;
settings.concurrent = 5;
services = let
common-shell = {
services = {
ovni-shell = {
registrationConfigFile = config.age.secrets.ovniToken.path;
executor = "shell";
tagList = [ "nix" "xeon" ];
registrationFlags = [
# Using space doesn't work, and causes it to misread the next flag
"--locked='false'"
];
environmentVariables = {
SHELL = "${pkgs.bash}/bin/bash";
};
};
common-docker = {
executor = "docker";
ovni-docker = {
registrationConfigFile = config.age.secrets.ovniToken.path;
dockerImage = "debian:stable";
tagList = [ "docker" "xeon" ];
registrationFlags = [
"--locked='false'"
"--docker-network-mode host"
];
environmentVariables = {
https_proxy = "http://hut:23080";
http_proxy = "http://hut:23080";
https_proxy = "http://localhost:23080";
http_proxy = "http://localhost:23080";
};
};
in {
# For pm.bsc.es/gitlab
gitlab-pm-shell = common-shell // {
authenticationTokenConfigFile = config.age.secrets.gitlab-pm-shell.path;
};
gitlab-pm-docker = common-docker // {
authenticationTokenConfigFile = config.age.secrets.gitlab-pm-docker.path;
};
gitlab-bsc-docker = {
# gitlab.bsc.es still uses the old token mechanism
registrationConfigFile = config.age.secrets.gitlab-bsc-docker.path;
tagList = [ "docker" "hut" ];
environmentVariables = {
# We cannot access the hut local interface from docker, so we connect
# to hut directly via the ethernet one.
https_proxy = "http://hut:23080";
http_proxy = "http://hut:23080";
};
executor = "docker";
dockerImage = "alpine";
dockerVolumes = [
"/nix/store:/nix/store:ro"
"/nix/var/nix/db:/nix/var/nix/db:ro"
"/nix/var/nix/daemon-socket:/nix/var/nix/daemon-socket:ro"
];
dockerExtraHosts = [
# Required to pass the proxy via hut
"hut:10.0.40.7"
];
dockerDisableCache = true;
nosv-docker = {
registrationConfigFile = config.age.secrets.nosvToken.path;
dockerImage = "debian:stable";
tagList = [ "docker" "xeon" ];
registrationFlags = [
# Increase build log length to 64 MiB
"--output-limit 65536"
"--docker-network-mode host"
"--docker-cpus 56"
];
preBuildScript = pkgs.writeScript "setup-container" ''
mkdir -p -m 0755 /nix/var/log/nix/drvs
mkdir -p -m 0755 /nix/var/nix/gcroots
mkdir -p -m 0755 /nix/var/nix/profiles
mkdir -p -m 0755 /nix/var/nix/temproots
mkdir -p -m 0755 /nix/var/nix/userpool
mkdir -p -m 1777 /nix/var/nix/gcroots/per-user
mkdir -p -m 1777 /nix/var/nix/profiles/per-user
mkdir -p -m 0755 /nix/var/nix/profiles/per-user/root
mkdir -p -m 0700 "$HOME/.nix-defexpr"
mkdir -p -m 0700 "$HOME/.ssh"
cat > "$HOME/.ssh/config" << EOF
Host bscpm04.bsc.es gitlab-internal.bsc.es
User git
ProxyCommand nc -X connect -x hut:23080 %h %p
Host amdlogin1.bsc.es armlogin1.bsc.es hualogin1.bsc.es glogin1.bsc.es glogin2.bsc.es fpgalogin1.bsc.es
ProxyCommand nc -X connect -x hut:23080 %h %p
EOF
cat >> "$HOME/.ssh/known_hosts" << EOF
bscpm04.bsc.es ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPx4mC0etyyjYUT2Ztc/bs4ZXSbVMrogs1ZTP924PDgT
gitlab-internal.bsc.es ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF9arsAOSRB06hdy71oTvJHG2Mg8zfebADxpvc37lZo3
EOF
. ${pkgs.nix}/etc/profile.d/nix-daemon.sh
# Required to load SSL certificate paths
. ${pkgs.cacert}/nix-support/setup-hook
'';
environmentVariables = {
ENV = "/etc/profile";
USER = "root";
NIX_REMOTE = "daemon";
PATH = "${config.system.path}/bin:/bin:/sbin:/usr/bin:/usr/sbin";
https_proxy = "http://localhost:23080";
http_proxy = "http://localhost:23080";
};
};
};
};
# DOCKER* chains are useless, override at FORWARD and nixos-fw
networking.firewall.extraCommands = ''
# Don't forward any traffic from docker
iptables -I FORWARD 1 -p all -i docker0 -j nixos-fw-log-refuse
# Allow incoming traffic from docker to 23080
iptables -A nixos-fw -p tcp -i docker0 -d hut --dport 23080 -j ACCEPT
'';
#systemd.services.gitlab-runner.serviceConfig.Shell = "${pkgs.bash}/bin/bash";
systemd.services.gitlab-runner.serviceConfig.DynamicUser = lib.mkForce false;
systemd.services.gitlab-runner.serviceConfig.User = "gitlab-runner";

@ -1,31 +0,0 @@
{ pkgs, config, lib, ... }:
let
gpfs-probe-script = pkgs.runCommand "gpfs-probe.sh" { }
''
cp ${./gpfs-probe.sh} $out;
chmod +x $out
''
;
in
{
# Use a new user to handle the SSH keys
users.groups.ssh-robot = { };
users.users.ssh-robot = {
description = "SSH Robot";
isNormalUser = true;
home = "/var/lib/ssh-robot";
};
systemd.services.gpfs-probe = {
description = "Daemon to report GPFS latency via SSH";
path = [ pkgs.openssh pkgs.netcat ];
after = [ "network.target" ];
wantedBy = [ "default.target" ];
serviceConfig = {
Type = "simple";
ExecStart = "${pkgs.socat}/bin/socat TCP4-LISTEN:9966,fork EXEC:${gpfs-probe-script}";
User = "ssh-robot";
Group = "ssh-robot";
};
};
}

@ -1,18 +0,0 @@
#!/bin/sh
N=500
t=$(timeout 5 ssh bsc015557@glogin2.bsc.es "timeout 3 command time -f %e touch /gpfs/projects/bsc15/bsc015557/gpfs.{1..$N} 2>&1; rm -f /gpfs/projects/bsc15/bsc015557/gpfs.{1..$N}")
if [ -z "$t" ]; then
t="5.00"
fi
cat <<EOF
HTTP/1.1 200 OK
Content-Type: text/plain; version=0.0.4; charset=utf-8; escaping=values
# HELP gpfs_touch_latency Time to create $N files.
# TYPE gpfs_touch_latency gauge
gpfs_touch_latency $t
EOF

13
m/hut/ipmi.yml Normal file
@ -0,0 +1,13 @@
modules:
default:
collectors:
- bmc
- ipmi
- chassis
lan:
collectors:
- ipmi
- chassis
user: ""
pass: ""

@ -1,22 +1,6 @@
{ config, lib, ... }:
{
imports = [
../module/slurm-exporter.nix
../module/meteocat-exporter.nix
../module/upc-qaire-exporter.nix
./gpfs-probe.nix
../module/nix-daemon-exporter.nix
];
age.secrets.grafanaJungleRobotPassword = {
file = ../../secrets/jungle-robot-password.age;
owner = "grafana";
mode = "400";
};
age.secrets.ipmiYml.file = ../../secrets/ipmi.yml.age;
services.grafana = {
enable = true;
settings = {
@ -27,29 +11,14 @@
http_port = 2342;
http_addr = "127.0.0.1";
};
smtp = {
enabled = true;
from_address = "jungle-robot@bsc.es";
user = "jungle-robot";
# Read the password from a file, which is only readable by grafana user
# https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#file-provider
password = "$__file{${config.age.secrets.grafanaJungleRobotPassword.path}}";
host = "mail.bsc.es:465";
startTLS_policy = "NoStartTLS";
};
feature_toggles.publicDashboards = true;
"auth.anonymous".enabled = true;
log.level = "warn";
};
};
# Make grafana alerts also use the proxy
systemd.services.grafana.environment = config.networking.proxy.envVars;
services.prometheus = {
enable = true;
port = 9001;
retentionTime = "5y";
retentionTime = "1y";
listenAddress = "127.0.0.1";
};
@ -78,13 +47,13 @@
enable = true;
group = "root";
user = "root";
configFile = config.age.secrets.ipmiYml.path;
# extraFlags = [ "--log.level=debug" ];
configFile = ./ipmi.yml;
#extraFlags = [ "--log.level=debug" ];
listenAddress = "127.0.0.1";
};
node = {
enable = true;
enabledCollectors = [ "systemd" "logind" ];
enabledCollectors = [ "systemd" ];
port = 9002;
listenAddress = "127.0.0.1";
};
@ -92,11 +61,6 @@
enable = true;
listenAddress = "127.0.0.1";
};
blackbox = {
enable = true;
listenAddress = "127.0.0.1";
configFile = ./blackbox.yml;
};
};
scrapeConfigs = [
@ -109,12 +73,6 @@
"127.0.0.1:9323"
"127.0.0.1:9252"
"127.0.0.1:${toString config.services.prometheus.exporters.smartctl.port}"
"127.0.0.1:9341" # Slurm exporter
"127.0.0.1:9966" # GPFS custom exporter
"127.0.0.1:9999" # Nix-daemon custom exporter
"127.0.0.1:9929" # Meteocat custom exporter
"127.0.0.1:9928" # UPC Qaire custom exporter
"127.0.0.1:${toString config.services.prometheus.exporters.blackbox.port}"
];
}];
}
@ -128,74 +86,6 @@
];
}];
}
{
job_name = "blackbox-http";
metrics_path = "/probe";
params = { module = [ "http_2xx" ]; };
static_configs = [{
targets = [
"https://www.google.com/robots.txt"
"https://pm.bsc.es/"
"https://pm.bsc.es/gitlab/"
"https://jungle.bsc.es/"
"https://gitlab.bsc.es/"
];
}];
relabel_configs = [
{
# Takes the address and sets it in the "target=<xyz>" URL parameter
source_labels = [ "__address__" ];
target_label = "__param_target";
}
{
# Sets the "instance" label with the remote host we are querying
source_labels = [ "__param_target" ];
target_label = "instance";
}
{
# Shows the host target address instead of the blackbox address
target_label = "__address__";
replacement = "127.0.0.1:${toString config.services.prometheus.exporters.blackbox.port}";
}
];
}
{
job_name = "blackbox-icmp";
metrics_path = "/probe";
params = { module = [ "icmp" ]; };
static_configs = [{
targets = [
"1.1.1.1"
"8.8.8.8"
"ssfhead"
"anella-bsc.cesca.cat"
"upc-anella.cesca.cat"
"fox.ac.upc.edu"
"arenys5.ac.upc.edu"
];
}];
relabel_configs = [
{
# Takes the address and sets it in the "target=<xyz>" URL parameter
source_labels = [ "__address__" ];
target_label = "__param_target";
}
{
# Sets the "instance" label with the remote host we are querying
source_labels = [ "__param_target" ];
target_label = "instance";
}
{
# Shows the host target address instead of the blackbox address
target_label = "__address__";
replacement = "127.0.0.1:${toString config.services.prometheus.exporters.blackbox.port}";
}
];
}
{
job_name = "gitea";
static_configs = [{ targets = [ "127.0.0.1:3000" ]; }];
}
{
# Scrape the IPMI info of the hosts remotely via LAN
job_name = "ipmi-lan";
@ -217,7 +107,7 @@
# Sets the "instance" label with the remote host we are querying
source_labels = [ "__param_target" ];
separator = ";";
regex = "(.*)-ipmi"; # Remove "-ipmi" at the end
regex = "(.*)";
target_label = "instance";
replacement = "\${1}";
action = "replace";
@ -248,25 +138,6 @@
}
];
}
{
job_name = "ipmi-raccoon";
metrics_path = "/ipmi";
static_configs = [
{ targets = [ "127.0.0.1:9291" ]; }
];
params = {
target = [ "84.88.51.142" ];
module = [ "raccoon" ];
};
}
{
job_name = "raccoon";
static_configs = [
{
targets = [ "127.0.0.1:19002" ]; # Node exporter
}
];
}
];
};
}
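
Review note: the `blackbox-http` and `blackbox-icmp` jobs in this file carry the same three relabel rules, which point Prometheus at the exporter while passing the real target as a URL parameter. A possible way to factor that out is sketched below. `mkProbeJob` is a hypothetical helper, not something in the repository, and the exporter address is hard-coded to the blackbox exporter's default port 9115 instead of reading `config.services.prometheus.exporters.blackbox.port`.

```nix
# Sketch only: mkProbeJob is a hypothetical helper. The address assumes the
# exporter listens on 127.0.0.1 at its default port (9115).
let
  blackboxAddr = "127.0.0.1:9115";

  mkProbeJob = name: module: targets: {
    job_name = "blackbox-${name}";
    metrics_path = "/probe";
    params = { module = [ module ]; };
    static_configs = [{ inherit targets; }];
    relabel_configs = [
      # Pass the scraped address as the "target=<xyz>" URL parameter.
      { source_labels = [ "__address__" ]; target_label = "__param_target"; }
      # Label the series with the remote host being probed.
      { source_labels = [ "__param_target" ]; target_label = "instance"; }
      # Send the actual scrape request to the blackbox exporter.
      { target_label = "__address__"; replacement = blackboxAddr; }
    ];
  };
in [
  (mkProbeJob "http" "http_2xx" [ "https://jungle.bsc.es/" "https://pm.bsc.es/" ])
  (mkProbeJob "icmp" "icmp" [ "1.1.1.1" "8.8.8.8" ])
]
```

The resulting list has the same shape as the entries in `scrapeConfigs`, so it could be concatenated with the other jobs using `++`.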

@ -1,27 +0,0 @@
{ config, lib, ... }:
{
# Robot user that can see the password to send mail from jungle-robot
users.groups.mail-robot = {};
age.secrets.jungleRobotPassword = {
file = ../../secrets/jungle-robot-password.age;
group = "mail-robot";
mode = "440";
};
programs.msmtp = {
enable = true;
accounts = {
default = {
auth = true;
tls = true;
tls_starttls = false;
port = 465;
host = "mail.bsc.es";
user = "jungle-robot";
passwordeval = "cat ${config.age.secrets.jungleRobotPassword.path}";
from = "jungle-robot@bsc.es";
};
};
};
}

@ -1,76 +0,0 @@
{ theFlake, pkgs, ... }:
let
website = pkgs.stdenv.mkDerivation {
name = "jungle-web";
src = pkgs.fetchgit {
url = "https://jungle.bsc.es/git/rarias/jungle-website.git";
rev = "52abaf4d71652a9ef77a0b098db14ca33bffff4c";
hash = "sha256-/ul9GazbOrOkmlvSgDz/+2W+V+ir5725Y7mVLc3rb0M=";
};
buildInputs = [ pkgs.hugo ];
buildPhase = ''
rm -rf public/
hugo
'';
installPhase = ''
cp -r public $out
'';
# Don't mess with doc/
dontFixup = true;
};
in
{
networking.firewall.allowedTCPPorts = [ 80 ];
services.nginx = {
enable = true;
virtualHosts."jungle.bsc.es" = {
root = "${website}";
listen = [
{
addr = "0.0.0.0";
port = 80;
}
];
extraConfig = ''
set_real_ip_from 127.0.0.1;
set_real_ip_from 84.88.52.107;
real_ip_recursive on;
real_ip_header X-Forwarded-For;
location /git {
rewrite ^/git$ / break;
rewrite ^/git/(.*) /$1 break;
proxy_pass http://127.0.0.1:3000;
proxy_redirect http:// $scheme://;
}
location /cache {
rewrite ^/cache/(.*) /$1 break;
proxy_pass http://127.0.0.1:5000;
proxy_redirect http:// $scheme://;
}
location /lists {
proxy_pass http://127.0.0.1:8081;
proxy_redirect http:// $scheme://;
}
location /grafana {
proxy_pass http://127.0.0.1:2342;
proxy_redirect http:// $scheme://;
proxy_set_header Host $host;
# Websockets
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
location ~ ^/~(.+?)(/.*)?$ {
alias /ceph/home/$1/public_html$2;
index index.html index.htm;
autoindex on;
absolute_redirect off;
}
location /p/ {
alias /ceph/p/;
}
'';
};
};
}

@ -1,85 +0,0 @@
{ config, pkgs, ... }:
{
systemd.timers = {
"ompss2-closing" = {
wantedBy = [ "timers.target" ];
timerConfig = {
Unit = "ompss2-closing.service";
OnCalendar = [ "*-03-15 07:00:00" "*-09-15 07:00:00"];
};
};
"ompss2-freeze" = {
wantedBy = [ "timers.target" ];
timerConfig = {
Unit = "ompss2-freeze.service";
OnCalendar = [ "*-04-15 07:00:00" "*-10-15 07:00:00" ];
};
};
"ompss2-release" = {
wantedBy = [ "timers.target" ];
timerConfig = {
Unit = "ompss2-release.service";
OnCalendar = [ "*-05-15 07:00:00" "*-11-15 07:00:00" ];
};
};
};
systemd.services =
let
closing = pkgs.writeText "closing.txt"
''
Subject: OmpSs-2 release enters closing period
Hi,
You have one month to merge the remaining features for the next OmpSs-2
release. Please, identify what needs to be merged and discuss it in the next
OmpSs-2 meeting.
Thanks!,
Jungle robot
'';
freeze = pkgs.writeText "freeze.txt"
''
Subject: OmpSs-2 release enters freeze period
Hi,
The period to introduce new features or breaking changes is over, only bug
fixes are allowed now. During this time, please prepare the release notes
to be included in the next OmpSs-2 release.
Thanks!,
Jungle robot
'';
release = pkgs.writeText "release.txt"
''
Subject: OmpSs-2 release now
Hi,
The period to introduce bug fixes is now over. Please, proceed to do the
OmpSs-2 release.
Thanks!,
Jungle robot
'';
mkServ = name: mail: {
"ompss2-${name}" = {
script = ''
set -eu
set -o pipefail
cat ${mail} | ${config.security.wrapperDir}/sendmail star@bsc.es
'';
serviceConfig = {
Type = "oneshot";
DynamicUser = true;
Group = "mail-robot";
};
};
};
in
(mkServ "closing" closing) //
(mkServ "freeze" freeze) //
(mkServ "release" release);
}
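
Review note: the services in this file are generated by `mkServ` and merged with `//`, while the three timers are written out by hand. The same approach could be applied to the timers, as sketched below. `mkTimer` is a hypothetical helper mirroring `mkServ`; the dates are copied from the file.

```nix
# Sketch only: mkTimer is a hypothetical helper mirroring mkServ.
let
  mkTimer = name: dates: {
    "ompss2-${name}" = {
      wantedBy = [ "timers.target" ];
      timerConfig = {
        Unit = "ompss2-${name}.service";
        OnCalendar = dates;
      };
    };
  };
in
  # Merge the single-attribute sets into one, as done for the services.
  (mkTimer "closing" [ "*-03-15 07:00:00" "*-09-15 07:00:00" ]) //
  (mkTimer "freeze" [ "*-04-15 07:00:00" "*-10-15 07:00:00" ]) //
  (mkTimer "release" [ "*-05-15 07:00:00" "*-11-15 07:00:00" ])
```

The result is an attribute set with the keys `ompss2-closing`, `ompss2-freeze` and `ompss2-release`, which is the shape `systemd.timers` expects.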

@ -1,43 +0,0 @@
{ pkgs, lib, config, ... }:
let
p = pkgs.writeShellScriptBin "p" ''
set -e
cd /ceph
pastedir="p/$USER"
mkdir -p "$pastedir"
ext="txt"
if [ -n "$1" ]; then
ext="$1"
fi
out=$(mktemp "$pastedir/XXXXXXXX.$ext")
cat > "$out"
chmod go+r "$out"
echo "https://jungle.bsc.es/$out"
'';
in
{
environment.systemPackages = with pkgs; [ p ];
# Make sure we have a directory per user. We cannot use the nice
# systemd-tmpfiles-setup.service service because this is a remote FS, and it
# may not be mounted when it runs.
systemd.services.create-paste-dirs = let
# Take only normal users in hut
users = lib.filterAttrs (_: v: v.isNormalUser) config.users.users;
commands = lib.concatLists (lib.mapAttrsToList
(_: user: [
"install -d -o ${user.name} -g ${user.group} -m 0755 /ceph/p/${user.name}"
]) users);
script = pkgs.writeShellScript "create-paste-dirs.sh" (lib.concatLines commands);
in {
enable = true;
wants = [ "remote-fs.target" ];
after = [ "remote-fs.target" ];
wantedBy = [ "multi-user.target" ];
serviceConfig.ExecStart = script;
};
}

@ -1,19 +0,0 @@
{ lib, ... }:
{
services.postgresql = {
enable = true;
ensureDatabases = [ "perftestsdb" ];
ensureUsers = [
{ name = "anavarro"; ensureClauses.superuser = true; }
{ name = "rarias"; ensureClauses.superuser = true; }
{ name = "grafana"; }
];
authentication = ''
#type database DBuser auth-method
local perftestsdb rarias trust
local perftestsdb anavarro trust
local perftestsdb grafana trust
'';
};
}

@ -1,79 +0,0 @@
/*
* CC0-1.0 <https://creativecommons.org/publicdomain/zero/1.0/legalcode>
* Dark color scheme using 216 web-safe colors, inspired
* somewhat by the default color scheme in mutt.
* It reduces eyestrain for me, and energy usage for all:
* https://en.wikipedia.org/wiki/Light-on-dark_color_scheme
*/
* {
font-size: 14px;
font-family: monospace;
}
pre {
white-space: pre-wrap;
padding: 10px;
background: #f5f5f5;
}
hr {
margin: 30px 0;
}
body {
max-width: 120ex; /* 120 columns wide */
margin: 50px auto;
}
/*
* Underlined links add visual noise, which makes them hard to read.
* Use colors to make them stand out, instead.
*/
a:link {
color: #007;
text-decoration: none;
}
a:visited {
color:#504;
}
a:hover {
text-decoration: underline;
}
/* quoted text in emails gets a different color */
*.q { color:gray }
/*
* these may be used with cgit <https://git.zx2c4.com/cgit/>, too.
* (cgit uses <div>, public-inbox uses <span>)
*/
*.add { color:darkgreen } /* diff post-image lines */
*.del { color:darkred } /* diff pre-image lines */
*.head { color:black } /* diff header (metainformation) */
*.hunk { color:gray } /* diff hunk-header */
/*
* highlight 3.x colors (tested 3.18) for displaying blobs.
* This doesn't use most of the colors available, as I find too
* many colors overwhelming, so the default is commented out.
*/
.hl.num { color:#f30 } /* number */
.hl.esc { color:#f0f } /* escape character */
.hl.str { color:#f30 } /* string */
.hl.ppc { color:#f0f } /* preprocessor */
.hl.pps { color:#f30 } /* preprocessor string */
.hl.slc { color:#09f } /* single-line comment */
.hl.com { color:#09f } /* multi-line comment */
/* .hl.opt { color:#ccc } */ /* operator */
/* .hl.ipl { color:#ccc } */ /* interpolation */
/* keyword groups kw[a-z] */
.hl.kwa { color:#ff0 }
.hl.kwb { color:#0f0 }
.hl.kwc { color:#ff0 }
/* .hl.kwd { color:#ccc } */
/* line-number (unused by public-inbox) */
/* .hl.lin { color:#ccc } */

@ -1,47 +0,0 @@
{ lib, ... }:
{
services.public-inbox = {
enable = true;
http = {
enable = true;
port = 8081;
mounts = [ "/lists" ];
};
settings.publicinbox = {
css = [ "${./public-inbox.css}" ];
wwwlisting = "all";
};
inboxes = {
bscpkgs = {
url = "https://jungle.bsc.es/lists/bscpkgs";
address = [ "~rodarima/bscpkgs@lists.sr.ht" ];
watch = [ "imaps://jungle-robot%40gmx.com@imap.gmx.com/INBOX" ];
description = "Patches for bscpkgs";
listid = "~rodarima/bscpkgs.lists.sr.ht";
};
jungle = {
url = "https://jungle.bsc.es/lists/jungle";
address = [ "~rodarima/jungle@lists.sr.ht" ];
watch = [ "imaps://jungle-robot%40gmx.com@imap.gmx.com/INBOX" ];
description = "Patches for jungle";
listid = "~rodarima/jungle.lists.sr.ht";
};
};
};
# We need access to the network for the watch service, as we will fetch the
# emails directly from the IMAP server.
systemd.services.public-inbox-watch.serviceConfig = {
PrivateNetwork = lib.mkForce false;
RestrictAddressFamilies = lib.mkForce [ "AF_UNIX" "AF_INET" "AF_INET6" ];
KillSignal = "SIGKILL"; # Avoid slow shutdown
# Required for chmod(..., 02750) on directories by git, from
# systemd.exec(8):
# > Note that this restricts marking of any type of file system object with
# > these bits, including both regular files and directories (where the SGID
# > is a different meaning than for files, see documentation).
RestrictSUIDSGID = lib.mkForce false;
};
}

7
m/hut/slurm-daemon.nix Normal file
@ -0,0 +1,7 @@
{ ... }:
{
services.slurm = {
server.enable = true;
};
}

@ -1,15 +1,15 @@
- targets:
- owl1-ipmi
- owl2-ipmi
- xeon03-ipmi
- xeon04-ipmi
- koro-ipmi
- weasel-ipmi
- hut-ipmi
- eudy-ipmi
- 10.0.40.101
- 10.0.40.102
- 10.0.40.103
- 10.0.40.104
- 10.0.40.105
- 10.0.40.106
- 10.0.40.107
- 10.0.40.108
# Storage
- bay-ipmi
- oss01-ipmi
- lake2-ipmi
- 10.0.40.141
- 10.0.40.142
- 10.0.40.143
labels:
job: ipmi-lan
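
Review note: this target list switches between `<host>-ipmi` names and raw IP addresses, which ties in with the `regex = "(.*)-ipmi"` versus `regex = "(.*)"` change in the monitoring relabel rule. The sketch below illustrates what the suffix-stripping pattern captures, using `builtins.match` (which, like Prometheus, matches the whole string). The `stripIpmi` helper is hypothetical, and its fallback branch is a simplification for illustration; it is not a model of what Prometheus does when a relabel regex fails to match.

```nix
# Sketch only: stripIpmi is a hypothetical helper. Returning the input
# unchanged on a non-match is an illustrative choice, not Prometheus semantics.
let
  stripIpmi = target:
    let m = builtins.match "(.*)-ipmi" target;
    in if m == null then target else builtins.head m;
in
  map stripIpmi [ "owl1-ipmi" "hut-ipmi" "10.0.40.101" ]
```

The two host names lose their `-ipmi` suffix, while the IP address does not match the pattern at all, which suggests why the two forms of the target list need different regexes.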

@ -2,11 +2,13 @@
{
imports = [
../common/ssf.nix
../common/main.nix
#(modulesPath + "/installer/netboot/netboot-minimal.nix")
../eudy/cpufreq.nix
../eudy/users.nix
../eudy/slurm.nix
./users.nix
./kernel.nix
];

@ -1,29 +1,9 @@
{ pkgs, lib, ... }:
let
#fcs-devel = pkgs.linuxPackages_custom {
# version = "6.2.8";
# src = /mnt/data/kernel/fcs/kernel/src;
# configfile = /mnt/data/kernel/fcs/kernel/configs/defconfig;
#};
kernel = nixos-fcsv4;
#fcsv1 = fcs-kernel "bc11660676d3d68ce2459b9fb5d5e654e3f413be" false;
#fcsv2 = fcs-kernel "db0f2eca0cd57a58bf456d7d2c7d5d8fdb25dfb1" false;
#fcsv1-lockdep = fcs-kernel "bc11660676d3d68ce2459b9fb5d5e654e3f413be" true;
#fcsv2-lockdep = fcs-kernel "db0f2eca0cd57a58bf456d7d2c7d5d8fdb25dfb1" true;
#fcs-kernel = gitCommit: lockdep: pkgs.linuxPackages_custom {
# version = "6.2.8";
# src = builtins.fetchGit {
# url = "git@bscpm03.bsc.es:ompss-kernel/linux.git";
# rev = gitCommit;
# ref = "fcs";
# };
# configfile = if lockdep then ./configs/lockdep else ./configs/defconfig;
#};
kernel = nixos-fcs;
nixos-fcs-kernel = lib.makeOverridable ({gitCommit, lockStat ? false, preempt ? false, branch ? "fcs"}: pkgs.linuxPackagesFor (pkgs.buildLinux rec {
nixos-fcs-kernel = {gitCommit, lockStat ? false, preempt ? false, branch ? "fcs"}: pkgs.linuxPackagesFor (pkgs.buildLinux rec {
version = "6.2.8";
src = builtins.fetchGit {
url = "git@bscpm03.bsc.es:ompss-kernel/linux.git";
@ -40,13 +20,27 @@ let
};
kernelPatches = [];
extraMeta.branch = lib.versions.majorMinor version;
}));
});
nixos-fcs = nixos-fcs-kernel {gitCommit = "8a09822dfcc8f0626b209d6d2aec8b5da459dfee";};
nixos-fcs-lockstat = nixos-fcs.override {
nixos-fcsv1 = nixos-fcs-kernel {gitCommit = "bc11660676d3d68ce2459b9fb5d5e654e3f413be";};
nixos-fcsv2 = nixos-fcs-kernel {gitCommit = "db0f2eca0cd57a58bf456d7d2c7d5d8fdb25dfb1";};
nixos-fcsv3 = nixos-fcs-kernel {gitCommit = "6c17394890704c3345ac1a521bb547164b36b154";};
nixos-fcsv4 = nixos-fcs-kernel {gitCommit = "c94c3d946f33ac3e5782a02ee002cc1164c0cb4f";};
nixos-fcsv1-lockstat = nixos-fcs-kernel {
gitCommit = "bc11660676d3d68ce2459b9fb5d5e654e3f413be";
lockStat = true;
};
nixos-fcs-lockstat-preempt = nixos-fcs.override {
nixos-fcsv2-lockstat = nixos-fcs-kernel {
gitCommit = "db0f2eca0cd57a58bf456d7d2c7d5d8fdb25dfb1";
lockStat = true;
};
nixos-fcsv3-lockstat = nixos-fcs-kernel {
gitCommit = "6c17394890704c3345ac1a521bb547164b36b154";
lockStat = true;
};
nixos-fcsv3-lockstat-preempt = nixos-fcs-kernel {
gitCommit = "6c17394890704c3345ac1a521bb547164b36b154";
lockStat = true;
preempt = true;
};
@ -66,5 +60,5 @@ in {
# enable memory overcommit, needed to build a taglibc system using nix after
# increasing the openblas memory footprint
boot.kernel.sysctl."vm.overcommit_memory" = 1;
boot.kernel.sysctl."vm.overcommit_memory" = lib.mkForce 1;
}

17
m/koro/users.nix Normal file
@ -0,0 +1,17 @@
{ ... }:
{
users.users = {
vlopez = {
uid = 4334;
isNormalUser = true;
home = "/home/Computational/vlopez";
description = "Victor López";
group = "Computational";
hashedPassword = "$6$0ZBkgIYE/renVqtt$1uWlJsb0FEezRVNoETTzZMx4X2SvWiOsKvi0ppWCRqI66S6TqMBXBdP4fcQyvRRBt0e4Z7opZIvvITBsEtO0f0";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGMwlUZRf9jfG666Qa5Sb+KtEhXqkiMlBV2su3x/dXHq victor@arch"
];
};
};
}

@ -2,21 +2,20 @@
{
imports = [
../common/ssf.nix
../module/monitoring.nix
../module/hut-substituter.nix
../common/main.nix
../common/monitoring.nix
];
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d53563a";
boot.kernel.sysctl = {
"kernel.yama.ptrace_scope" = lib.mkForce "1";
};
environment.systemPackages = with pkgs; [
ceph
];
services.slurm = {
client.enable = lib.mkForce false;
};
services.ceph = {
enable = true;
global = {
@ -50,16 +49,6 @@
address = "10.0.42.42";
prefixLength = 24;
} ];
firewall = {
extraCommands = ''
# Accept all incoming TCP traffic from bay
iptables -A nixos-fw -p tcp -s bay -j nixos-fw-accept
# Accept monitoring requests from hut
iptables -A nixos-fw -p tcp -s hut --dport 9002 -j nixos-fw-accept
# Accept all Ceph traffic from the local network
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 -m multiport --dport 3300,6789,6800:7568 -j nixos-fw-accept
'';
};
};
# Missing service for volumes, see:

@ -1,70 +0,0 @@
{
# In physical order from top to bottom (see note below)
ssf = {
# Switches for Ethernet and OmniPath
switch-C6-S1A-05 = { pos=42; size=1; model="Dell S3048-ON"; };
switch-opa = { pos=41; size=1; };
# SSF login
apex = { pos=39; size=2; label="SSFHEAD"; board="R2208WTTYSR"; contact="rodrigo.arias@bsc.es"; };
# Storage
bay = { pos=38; size=1; label="MDS01"; board="S2600WT2R"; sn="BQWL64850303"; contact="rodrigo.arias@bsc.es"; };
lake1 = { pos=37; size=1; label="OSS01"; board="S2600WT2R"; sn="BQWL64850234"; contact="rodrigo.arias@bsc.es"; };
lake2 = { pos=36; size=1; label="OSS02"; board="S2600WT2R"; sn="BQWL64850266"; contact="rodrigo.arias@bsc.es"; };
# Compute xeon
owl1 = { pos=35; size=1; label="SSF-XEON01"; board="S2600WTTR"; sn="BQWL64954172"; contact="rodrigo.arias@bsc.es"; };
owl2 = { pos=34; size=1; label="SSF-XEON02"; board="S2600WTTR"; sn="BQWL64756560"; contact="rodrigo.arias@bsc.es"; };
xeon03 = { pos=33; size=1; label="SSF-XEON03"; board="S2600WTTR"; sn="BQWL64750826"; contact="rodrigo.arias@bsc.es"; };
# Slot 32 empty
koro = { pos=31; size=1; label="SSF-XEON05"; board="S2600WTTR"; sn="BQWL64954293"; contact="rodrigo.arias@bsc.es"; };
weasel = { pos=30; size=1; label="SSF-XEON06"; board="S2600WTTR"; sn="BQWL64750846"; contact="antoni.navarro@bsc.es"; };
hut = { pos=29; size=1; label="SSF-XEON07"; board="S2600WTTR"; sn="BQWL64751184"; contact="rodrigo.arias@bsc.es"; };
eudy = { pos=28; size=1; label="SSF-XEON08"; board="S2600WTTR"; sn="BQWL64756586"; contact="aleix.rocanonell@bsc.es"; };
# 16 KNL nodes, 4 per chassis
knl01_04 = { pos=26; size=2; label="KNL01..KNL04"; board="HNS7200APX"; };
knl05_08 = { pos=24; size=2; label="KNL05..KNL08"; board="HNS7200APX"; };
knl09_12 = { pos=22; size=2; label="KNL09..KNL12"; board="HNS7200APX"; };
knl13_16 = { pos=20; size=2; label="KNL13..KNL16"; board="HNS7200APX"; };
# Slot 19 empty
# EPI (hw team, guessed order)
epi01 = { pos=18; size=1; contact="joan.cabre@bsc.es"; };
epi02 = { pos=17; size=1; contact="joan.cabre@bsc.es"; };
epi03 = { pos=16; size=1; contact="joan.cabre@bsc.es"; };
anon = { pos=14; size=2; }; # Unlabeled machine. Operative
# These are old and decommissioned (off)
power8 = { pos=12; size=2; label="BSCPOWER8N3"; decommissioned=true; };
powern1 = { pos=8; size=4; label="BSCPOWERN1"; decommissioned=true; };
gustafson = { pos=7; size=1; label="gustafson"; decommissioned=true; };
odap01 = { pos=3; size=4; label="ODAP01"; decommissioned=true; };
amhdal = { pos=2; size=1; label="AMHDAL"; decommissioned=true; }; # sic
moore = { pos=1; size=1; label="moore (earth)"; decommissioned=true; };
};
bsc2218 = {
raccoon = { board="W2600CR"; sn="QSIP22500829"; contact="rodrigo.arias@bsc.es"; };
tent = { label="SSF-XEON04"; board="S2600WTTR"; sn="BQWL64751229"; contact="rodrigo.arias@bsc.es"; };
};
upc = {
fox = { board="H13DSG-O-CPU"; sn="UM24CS600392"; prod="AS-4125GS-TNRT"; prod_sn="E508839X5103339"; contact="rodrigo.arias@bsc.es"; };
};
# NOTE: Position is specified in "U" units (44.45 mm) and starts at 1 from the
# bottom. Example:
#
# | ... | - [pos+size] <--- Label in chassis
# +--------+
# | node | - [pos+1]
# | 2U | - [pos]
# +--------+
# | ... | - [pos-1]
#
# NOTE: The board and sn refer to the FRU information (Board Product and
# Board Serial) as reported by `ipmitool fru print 0`.
}
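
As a reading aid for the position note above, the rack units a machine occupies follow directly from `pos` and `size`. The sketch below uses a hypothetical helper, `slotsOf`, which is not part of the repository; it assumes only the `pos` and `size` attributes used in the inventory.

```nix
# Hypothetical helper (not in the repository): list the rack units occupied
# by an inventory entry. Units are counted from 1 at the bottom of the rack.
let
  slotsOf = { pos, size, ... }: builtins.genList (i: pos + i) size;
in
{
  # apex is a 2U machine whose lowest unit is 39.
  apex = slotsOf { pos = 39; size = 2; label = "SSFHEAD"; }; # [ 39 40 ]
  # bay is a 1U machine at unit 38.
  bay = slotsOf { pos = 38; size = 1; }; # [ 38 ]
}
```

An entry therefore ends at unit `pos + size - 1`, and the chassis label sits at the `pos + size` boundary, as the diagram in the note shows.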

View File

@ -1,357 +0,0 @@
{
config,
options,
lib,
pkgs,
...
}:
with lib;
let
cfg = config.age;
isDarwin = lib.attrsets.hasAttrByPath [ "environment" "darwinConfig" ] options;
ageBin = config.age.ageBin;
users = config.users.users;
sysusersEnabled =
if isDarwin then
false
else
options.systemd ? sysusers && (config.systemd.sysusers.enable || config.services.userborn.enable);
mountCommand =
if isDarwin then
''
if ! diskutil info "${cfg.secretsMountPoint}" &> /dev/null; then
num_sectors=1048576
dev=$(hdiutil attach -nomount ram://"$num_sectors" | sed 's/[[:space:]]*$//')
newfs_hfs -v agenix "$dev"
mount -t hfs -o nobrowse,nodev,nosuid,-m=0751 "$dev" "${cfg.secretsMountPoint}"
fi
''
else
''
grep -q "${cfg.secretsMountPoint} ramfs" /proc/mounts ||
mount -t ramfs none "${cfg.secretsMountPoint}" -o nodev,nosuid,mode=0751
'';
newGeneration = ''
_agenix_generation="$(basename "$(readlink ${cfg.secretsDir})" || echo 0)"
(( ++_agenix_generation ))
echo "[agenix] creating new generation in ${cfg.secretsMountPoint}/$_agenix_generation"
mkdir -p "${cfg.secretsMountPoint}"
chmod 0751 "${cfg.secretsMountPoint}"
${mountCommand}
mkdir -p "${cfg.secretsMountPoint}/$_agenix_generation"
chmod 0751 "${cfg.secretsMountPoint}/$_agenix_generation"
'';
chownGroup = if isDarwin then "admin" else "keys";
# chown the secrets mountpoint and the current generation to the keys group
# instead of leaving it root:root.
chownMountPoint = ''
chown :${chownGroup} "${cfg.secretsMountPoint}" "${cfg.secretsMountPoint}/$_agenix_generation"
'';
setTruePath = secretType: ''
${
if secretType.symlink then
''
_truePath="${cfg.secretsMountPoint}/$_agenix_generation/${secretType.name}"
''
else
''
_truePath="${secretType.path}"
''
}
'';
installSecret = secretType: ''
${setTruePath secretType}
echo "decrypting '${secretType.file}' to '$_truePath'..."
TMP_FILE="$_truePath.tmp"
IDENTITIES=()
for identity in ${toString cfg.identityPaths}; do
test -r "$identity" || continue
test -s "$identity" || continue
IDENTITIES+=(-i)
IDENTITIES+=("$identity")
done
test "''${#IDENTITIES[@]}" -eq 0 && echo "[agenix] WARNING: no readable identities found!"
mkdir -p "$(dirname "$_truePath")"
[ "${secretType.path}" != "${cfg.secretsDir}/${secretType.name}" ] && mkdir -p "$(dirname "${secretType.path}")"
(
umask u=r,g=,o=
test -f "${secretType.file}" || echo '[agenix] WARNING: encrypted file ${secretType.file} does not exist!'
test -d "$(dirname "$TMP_FILE")" || echo "[agenix] WARNING: $(dirname "$TMP_FILE") does not exist!"
LANG=${
config.i18n.defaultLocale or "C"
} ${ageBin} --decrypt "''${IDENTITIES[@]}" -o "$TMP_FILE" "${secretType.file}"
)
chmod ${secretType.mode} "$TMP_FILE"
mv -f "$TMP_FILE" "$_truePath"
${optionalString secretType.symlink ''
[ "${secretType.path}" != "${cfg.secretsDir}/${secretType.name}" ] && ln -sfT "${cfg.secretsDir}/${secretType.name}" "${secretType.path}"
''}
'';
testIdentities = map (path: ''
test -f ${path} || echo '[agenix] WARNING: config.age.identityPaths entry ${path} not present!'
'') cfg.identityPaths;
cleanupAndLink = ''
_agenix_generation="$(basename "$(readlink ${cfg.secretsDir})" || echo 0)"
(( ++_agenix_generation ))
echo "[agenix] symlinking new secrets to ${cfg.secretsDir} (generation $_agenix_generation)..."
ln -sfT "${cfg.secretsMountPoint}/$_agenix_generation" ${cfg.secretsDir}
(( _agenix_generation > 1 )) && {
echo "[agenix] removing old secrets (generation $(( _agenix_generation - 1 )))..."
rm -rf "${cfg.secretsMountPoint}/$(( _agenix_generation - 1 ))"
}
'';
installSecrets = builtins.concatStringsSep "\n" (
[ "echo '[agenix] decrypting secrets...'" ]
++ testIdentities
++ (map installSecret (builtins.attrValues cfg.secrets))
++ [ cleanupAndLink ]
);
chownSecret = secretType: ''
${setTruePath secretType}
chown ${secretType.owner}:${secretType.group} "$_truePath"
'';
chownSecrets = builtins.concatStringsSep "\n" (
[ "echo '[agenix] chowning...'" ]
++ [ chownMountPoint ]
++ (map chownSecret (builtins.attrValues cfg.secrets))
);
secretType = types.submodule (
{ config, ... }:
{
options = {
name = mkOption {
type = types.str;
default = config._module.args.name;
defaultText = literalExpression "config._module.args.name";
description = ''
Name of the file used in {option}`age.secretsDir`
'';
};
file = mkOption {
type = types.path;
description = ''
Age file the secret is loaded from.
'';
};
path = mkOption {
type = types.str;
default = "${cfg.secretsDir}/${config.name}";
defaultText = literalExpression ''
"''${cfg.secretsDir}/''${config.name}"
'';
description = ''
Path where the decrypted secret is installed.
'';
};
mode = mkOption {
type = types.str;
default = "0400";
description = ''
Permissions mode of the decrypted secret in a format understood by chmod.
'';
};
owner = mkOption {
type = types.str;
default = "0";
description = ''
User of the decrypted secret.
'';
};
group = mkOption {
type = types.str;
default = users.${config.owner}.group or "0";
defaultText = literalExpression ''
users.''${config.owner}.group or "0"
'';
description = ''
Group of the decrypted secret.
'';
};
symlink = mkEnableOption "symlinking secrets to their destination" // {
default = true;
};
};
}
);
in
{
imports = [
(mkRenamedOptionModule [ "age" "sshKeyPaths" ] [ "age" "identityPaths" ])
];
options.age = {
ageBin = mkOption {
type = types.str;
default = "${pkgs.age}/bin/age";
defaultText = literalExpression ''
"''${pkgs.age}/bin/age"
'';
description = ''
The age executable to use.
'';
};
secrets = mkOption {
type = types.attrsOf secretType;
default = { };
description = ''
Attrset of secrets.
'';
};
secretsDir = mkOption {
type = types.path;
default = "/run/agenix";
description = ''
Folder where secrets are symlinked to
'';
};
secretsMountPoint = mkOption {
type =
types.addCheck types.str (
s:
(builtins.match "[ \t\n]*" s) == null # non-empty
&& (builtins.match ".+/" s) == null
) # without trailing slash
// {
description = "${types.str.description} (with check: non-empty without trailing slash)";
};
default = "/run/agenix.d";
description = ''
Where secrets are created before they are symlinked to {option}`age.secretsDir`
'';
};
identityPaths = mkOption {
type = types.listOf types.path;
default =
if isDarwin then
[
"/etc/ssh/ssh_host_ed25519_key"
"/etc/ssh/ssh_host_rsa_key"
]
else if (config.services.openssh.enable or false) then
map (e: e.path) (
lib.filter (e: e.type == "rsa" || e.type == "ed25519") config.services.openssh.hostKeys
)
else
[ ];
defaultText = literalExpression ''
if isDarwin
then [
"/etc/ssh/ssh_host_ed25519_key"
"/etc/ssh/ssh_host_rsa_key"
]
else if (config.services.openssh.enable or false)
then map (e: e.path) (lib.filter (e: e.type == "rsa" || e.type == "ed25519") config.services.openssh.hostKeys)
else [];
'';
description = ''
Path to SSH keys to be used as identities in age decryption.
'';
};
};
config = mkIf (cfg.secrets != { }) (mkMerge [
{
assertions = [
{
assertion = cfg.identityPaths != [ ];
message = "age.identityPaths must be set, for example by enabling openssh.";
}
];
}
(optionalAttrs (!isDarwin) {
# When using sysusers we can no longer be started as an activation script,
# because those are started in initrd while sysusers is started later.
systemd.services.agenix-install-secrets = mkIf sysusersEnabled {
wantedBy = [ "sysinit.target" ];
after = [ "systemd-sysusers.service" ];
unitConfig.DefaultDependencies = "no";
path = [ pkgs.mount ];
serviceConfig = {
Type = "oneshot";
ExecStart = pkgs.writeShellScript "agenix-install" (concatLines [
newGeneration
installSecrets
chownSecrets
]);
RemainAfterExit = true;
};
};
# Create a new directory full of secrets for symlinking (this helps
# ensure removed secrets are actually removed, or at least become
# invalid symlinks).
system.activationScripts = mkIf (!sysusersEnabled) {
agenixNewGeneration = {
text = newGeneration;
deps = [
"specialfs"
];
};
agenixInstall = {
text = installSecrets;
deps = [
"agenixNewGeneration"
"specialfs"
];
};
# So user passwords can be encrypted.
users.deps = [ "agenixInstall" ];
# Change ownership and group after users and groups are made.
agenixChown = {
text = chownSecrets;
deps = [
"users"
"groups"
];
};
# So other activation scripts can depend on agenix being done.
agenix = {
text = "";
deps = [ "agenixChown" ];
};
};
})
(optionalAttrs isDarwin {
launchd.daemons.activate-agenix = {
script = ''
set -e
set -o pipefail
export PATH="${pkgs.gnugrep}/bin:${pkgs.coreutils}/bin:@out@/sw/bin:/usr/bin:/bin:/usr/sbin:/sbin"
${newGeneration}
${installSecrets}
${chownSecrets}
exit 0
'';
serviceConfig = {
RunAtLoad = true;
KeepAlive.SuccessfulExit = false;
};
};
})
]);
}
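
For orientation, a minimal sketch of how a secret could be declared against the options defined in this module. The secret name `example-token` and the `.age` file path are placeholders, not taken from the repository.

```nix
{ ... }:
{
  # Hypothetical secret, for illustration only.
  age.secrets.example-token = {
    # Encrypted input file (placeholder path).
    file = ./secrets/example-token.age;
    # Permissions and ownership applied to the decrypted file.
    mode = "0440";
    owner = "root";
    group = "keys";
    # `path` is left at its default: "${age.secretsDir}/example-token",
    # i.e. /run/agenix/example-token with the default secretsDir.
  };
}
```

With the module defaults, the file is decrypted into a numbered generation directory under `/run/agenix.d` and `/run/agenix` is re-pointed at that generation, so consumers should refer to `config.age.secrets.example-token.path` rather than a generation directory.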

View File

@ -1,49 +0,0 @@
{ config, lib, pkgs, ... }:
{
options = {
services.amd-uprof = {
enable = lib.mkOption {
type = lib.types.bool;
default = false;
description = "Whether to enable AMD uProf.";
};
};
};
# Only setup amd-uprof if enabled
config = lib.mkIf config.services.amd-uprof.enable {
# First make sure that we add the module to the list of available modules
# in the kernel matching the same kernel version of this configuration.
boot.extraModulePackages = with config.boot.kernelPackages; [ amd-uprof-driver ];
boot.kernelModules = [ "AMDPowerProfiler" ];
# Make the userspace tools available in $PATH.
environment.systemPackages = with pkgs; [ amd-uprof ];
# The AMDPowerProfiler module neither creates the /dev device nor emits
# any uevents, so we cannot use udev rules to automatically create the
# device. Instead, we run a systemd unit that does it after loading the
# modules.
systemd.services.amd-uprof-device = {
description = "Create /dev/AMDPowerProfiler device";
after = [ "systemd-modules-load.service" ];
wantedBy = [ "multi-user.target" ];
unitConfig.ConditionPathExists = [
"/proc/AMDPowerProfiler/device"
"!/dev/AMDPowerProfiler"
];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = true;
ExecStart = pkgs.writeShellScript "add-amd-uprof-dev.sh" ''
mknod /dev/AMDPowerProfiler -m 666 c $(< /proc/AMDPowerProfiler/device) 0
'';
ExecStop = pkgs.writeShellScript "remove-amd-uprof-dev.sh" ''
rm -f /dev/AMDPowerProfiler
'';
};
};
};
}

View File

@ -3,6 +3,7 @@
# Mounts the /ceph filesystem at boot
{
environment.systemPackages = with pkgs; [
ceph
ceph-client
fio # For benchmarks
];

View File

@ -1,3 +0,0 @@
{
services.nixseparatedebuginfod.enable = true;
}

View File

@ -1,3 +0,0 @@
{
boot.binfmt.emulatedSystems = [ "armv7l-linux" "aarch64-linux" "powerpc64le-linux" "riscv64-linux" ];
}

View File

@ -1,13 +0,0 @@
{ config, ... }:
{
nix.settings =
# Don't add hut as a cache to itself
assert config.networking.hostName != "hut";
{
extra-substituters = [ "http://hut/cache" ];
extra-trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
# Set a low timeout in case hut is down
connect-timeout = 3; # seconds
};
}

View File

@ -1,24 +0,0 @@
{ config, lib, ... }:
with lib;
{
options = {
users.jungleUsers = mkOption {
type = types.attrsOf (types.anything // { check = (x: x ? "hosts"); });
description = ''
Same as users.users but with the extra `hosts` attribute, which controls
access to the nodes by `networking.hostName`.
'';
};
};
config = let
allowedUser = host: userConf: builtins.elem host userConf.hosts;
filterUsers = host: users: filterAttrs (n: v: allowedUser host v) users;
removeHosts = users: mapAttrs (n: v: builtins.removeAttrs v [ "hosts" ]) users;
currentHost = config.networking.hostName;
in {
users.users = removeHosts (filterUsers currentHost config.users.jungleUsers);
};
}
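
A minimal sketch of how an account might be declared through this option. The user `alice`, her uid and her host list are invented for illustration; only `users.jungleUsers` and the `hosts` attribute come from the module above.

```nix
{ ... }:
{
  users.jungleUsers = {
    # Hypothetical account, for illustration only.
    alice = {
      uid = 2000;
      isNormalUser = true;
      home = "/home/Computational/alice";
      group = "Computational";
      # Only machines whose networking.hostName is listed here get the user.
      hosts = [ "hut" "koro" ];
    };
  };
}
```

On `hut` and `koro` the module would copy this entry into `users.users.alice` with the `hosts` attribute stripped; on any other host the entry is filtered out. An entry without a `hosts` attribute fails the option's type check.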

View File

@ -1,17 +0,0 @@
{ config, lib, pkgs, ... }:
with lib;
{
systemd.services."prometheus-meteocat-exporter" = {
wantedBy = [ "multi-user.target" ];
after = [ "network.target" ];
serviceConfig = {
Restart = mkDefault "always";
PrivateTmp = mkDefault true;
WorkingDirectory = mkDefault "/tmp";
DynamicUser = mkDefault true;
ExecStart = "${pkgs.meteocat-exporter}/bin/meteocat-exporter";
};
};
}

View File

@ -1,26 +0,0 @@
#!/bin/sh
# Locate nix daemon pid
nd=$(pgrep -o nix-daemon)
# Locate children of nix-daemon
pids1=$(tr ' ' '\n' < "/proc/$nd/task/$nd/children")
# For each child, locate 2nd level children
pids2=$(echo "$pids1" | xargs -I @ /bin/sh -c 'cat /proc/@/task/*/children' | tr ' ' '\n')
cat <<EOF
HTTP/1.1 200 OK
Content-Type: text/plain; version=0.0.4; charset=utf-8; escaping=values
# HELP nix_daemon_build Nix daemon derivation build state.
# TYPE nix_daemon_build gauge
EOF
for pid in $pids2; do
name=$(cat /proc/$pid/environ 2>/dev/null | tr '\0' '\n' | rg "^name=(.+)" - --replace '$1' | tr -dc ' [:alnum:]_\-\.')
user=$(ps -o uname= -p "$pid")
if [ -n "$name" -a -n "$user" ]; then
printf 'nix_daemon_build{user="%s",name="%s"} 1\n' "$user" "$name"
fi
done

View File

@ -1,23 +0,0 @@
{ pkgs, config, lib, ... }:
let
script = pkgs.runCommand "nix-daemon-exporter.sh" { }
''
cp ${./nix-daemon-builds.sh} $out;
chmod +x $out
''
;
in
{
systemd.services.nix-daemon-exporter = {
description = "Daemon to export nix-daemon metrics";
path = [ pkgs.procps pkgs.ripgrep ];
wantedBy = [ "default.target" ];
serviceConfig = {
Type = "simple";
ExecStart = "${pkgs.socat}/bin/socat TCP4-LISTEN:9999,fork EXEC:${script}";
# Root is needed to read the environment, potentially unsafe
User = "root";
Group = "root";
};
};
}
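
The exporter above answers plain HTTP on TCP port 9999. A possible scrape job for it is sketched below, assuming Prometheus runs on the same host and is configured through the standard NixOS `services.prometheus.scrapeConfigs` option; the job name is an arbitrary choice.

```nix
{ ... }:
{
  services.prometheus.scrapeConfigs = [
    {
      # Arbitrary job name, chosen for this example.
      job_name = "nix-daemon";
      static_configs = [
        # Port taken from the socat TCP4-LISTEN argument in the unit above.
        { targets = [ "127.0.0.1:9999" ]; }
      ];
    }
  ];
}
```

Each running build would then show up as a `nix_daemon_build{user="...",name="..."} 1` sample, matching the `printf` line in the script.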

View File

@ -1,20 +0,0 @@
{ lib, config, pkgs, ... }:
{
# Configure Nvidia driver to use with CUDA
hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production;
hardware.nvidia.open = lib.mkDefault (builtins.abort "hardware.nvidia.open not set");
hardware.graphics.enable = true;
nixpkgs.config.nvidia.acceptLicense = true;
services.xserver.videoDrivers = [ "nvidia" ];
# enable support for derivations which require nvidia-gpu to be available
# > requiredSystemFeatures = [ "cuda" ];
programs.nix-required-mounts.enable = true;
programs.nix-required-mounts.presets.nvidia-gpu.enable = true;
# They forgot to add the symlink
programs.nix-required-mounts.allowedPatterns.nvidia-gpu.paths = [
config.systemd.tmpfiles.settings.graphics-driver."/run/opengl-driver"."L+".argument
];
environment.systemPackages = [ pkgs.cudainfo ];
}

View File

@ -1,68 +0,0 @@
{ config, lib, pkgs, ... }:
let
cfg = config.services.p;
in
{
options = {
services.p = {
enable = lib.mkOption {
type = lib.types.bool;
default = false;
description = "Whether to enable the p service.";
};
path = lib.mkOption {
type = lib.types.str;
default = "/var/lib/p";
description = "Where to save the pasted files on disk.";
};
url = lib.mkOption {
type = lib.types.str;
default = "https://jungle.bsc.es/p";
description = "URL prefix for the printed file.";
};
};
};
config = lib.mkIf cfg.enable {
environment.systemPackages = let
p = pkgs.writeShellScriptBin "p" ''
set -e
pastedir="${cfg.path}/$USER"
cd "$pastedir"
ext="txt"
if [ -n "$1" ]; then
ext="$1"
fi
out=$(mktemp "XXXXXXXX.$ext")
cat > "$out"
chmod go+r "$out"
echo "${cfg.url}/$USER/$out"
'';
in [ p ];
systemd.services.p = let
# Take only normal users
users = lib.filterAttrs (_: v: v.isNormalUser) config.users.users;
# Create a directory for each user
commands = lib.concatLists (lib.mapAttrsToList (_: user: [
"install -d -o ${user.name} -g ${user.group} -m 0755 ${cfg.path}/${user.name}"
]) users);
in {
description = "P service setup";
requires = [ "network-online.target" ];
#wants = [ "remote-fs.target" ];
#after = [ "remote-fs.target" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
ExecStart = pkgs.writeShellScript "p-init.sh" (''
install -d -o root -g root -m 0755 ${cfg.path}
'' + (lib.concatLines commands));
};
};
};
}
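
A short sketch of enabling the paste service on a host. The values shown for `path` and `url` are the module defaults and could be omitted.

```nix
{ ... }:
{
  services.p = {
    enable = true;
    # Both values below repeat the defaults declared by the module.
    path = "/var/lib/p";
    url = "https://jungle.bsc.es/p";
  };
}
```

With this in place, a normal user could run something like `dmesg | p log`. The script stores standard input in a randomly named `.log` file under `/var/lib/p/$USER` and prints the matching URL; without an argument the extension is `txt`.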

View File

@ -1,33 +0,0 @@
{ config, lib, pkgs, ... }:
with lib;
let
cfg = config.power.policy;
in
{
options = {
power.policy = mkOption {
type = types.nullOr (types.enum [ "always-on" "previous" "always-off" ]);
default = null;
description = "Set power policy to use via IPMI.";
};
};
config = mkIf (cfg != null) {
systemd.services."power-policy" = {
description = "Set power policy to use via IPMI";
wantedBy = [ "multi-user.target" ];
unitConfig = {
StartLimitBurst = "10";
StartLimitIntervalSec = "10m";
};
serviceConfig = {
ExecStart = "${pkgs.ipmitool}/bin/ipmitool chassis policy ${cfg}";
Type = "oneshot";
Restart = "on-failure";
RestartSec = "5s";
};
};
};
}
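
A minimal usage sketch for this option, assuming a host that should power back on after an outage:

```nix
{ ... }:
{
  # One of "always-on", "previous" or "always-off"; null (the default)
  # leaves the policy untouched and creates no service.
  power.policy = "always-on";
}
```

With this value the generated `power-policy` unit would run `ipmitool chassis policy always-on` once at boot, retrying on failure every 5 seconds within the configured start limit.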

View File

@ -0,0 +1,69 @@
{ ... }:
{
# Don't make the nix store read-only, as this would prevent the overlay FS
# from being able to mount it.
boot.readOnlyNixStore = false;
# The nix-daemon.socket has an unnecessary dependency on the /nix/store
# mount point. But that mount point won't be provided until the network is
# ready. However, the network-address-eno1.service has a dependency on
# sockets.target, causing a cycle.
# One solution is to make the nix-daemon.socket depend only on the socket
# path (which is already covered by ConditionPathIsReadWrite =
# /nix/var/nix/daemon-socket), instead of on /nix/store.
#
# Using systemd.sockets.nix-daemon.unitConfig.RequiresMountsFor =
# "/nix/var/nix/daemon-socket" doesn't work, as the mount options get
# added by systemd when the override config is merged with the one that Nix
# provides:
#
# owl2% sudo systemctl show nix-daemon.socket | grep RequiresMountsFor
# RequiresMountsFor=/nix/store /nix/var/nix/daemon-socket/socket /nix/var/nix/daemon-socket
#
# To fix this, the Nix package is patched to only depend on /nix/var instead.
# See ../../pkgs/overlay.nix for details.
# Mount the hut nix store via NFS in read-only mode.
fileSystems."/mnt/hut-nix-store" = {
device = "hut:/nix/store";
fsType = "nfs";
options = [ "ro" ];
};
# A workdir is also needed, so setup a permanent dir using tmpfiles.
systemd.tmpfiles.rules = [
"d /mnt/nix-work 0700 root root -"
];
# Mount an overlay in /nix/store using the NFS store as the lower layer and
# the on-disk nix store as the upper layer. The destination is still the nix
# store in /nix/store (confusing). We need rw access, as the daemon needs to
# write the lock files to build derivations locally. Use a systemd mount unit
# directly so we can specify the LazyUnmount option and avoid having it
# mounted in stage 1, before systemd.
systemd.mounts = [
{
what = "overlay";
type = "overlay";
where = "/nix/store";
# We need the local-fs.target to be ready, so the network interfaces can
# be configured and the network.target is reached. So make this a netdev
# mount.
options = "_netdev,lowerdir=/mnt/hut-nix-store,upperdir=/nix/store,workdir=/mnt/nix-work";
description = "Overlay /nix/store mount";
mountConfig = {
LazyUnmount = true;
};
# Run the unit after remote-fs-pre.target but before the remote-fs.target
after = [ "remote-fs-pre.target"];
before = [ "umount.target" "remote-fs.target" ];
# Install by using wantedBy over remote-fs.target
wantedBy = [ "remote-fs.target" ];
unitConfig = {
# We need to wait for the NFS mount
RequiresMountsFor = "/nix/store /mnt/hut-nix-store";
};
}
];
}

View File

@ -1,40 +0,0 @@
{ lib, pkgs, ... }:
{
imports = [
./slurm-common.nix
];
systemd.services.slurmd.serviceConfig = {
# Kill all processes in the control group on stop/restart. This will kill
# all the jobs running, so ensure that we only upgrade when the nodes are
# not in use. See:
# https://github.com/NixOS/nixpkgs/commit/ae93ed0f0d4e7be0a286d1fca86446318c0c6ffb
# https://bugs.schedmd.com/show_bug.cgi?id=2095#c24
KillMode = lib.mkForce "control-group";
# If slurmd fails to contact the control server it will fail, causing the
# node to remain out of service until manually restarted. Always try to
# restart it.
Restart = "always";
RestartSec = "30s";
};
services.slurm.client.enable = true;
# Only allow SSH connections from users who have a SLURM allocation
# See: https://slurm.schedmd.com/pam_slurm_adopt.html
security.pam.services.sshd.rules.account.slurm = {
control = "required";
enable = true;
modulePath = "${pkgs.slurm}/lib/security/pam_slurm_adopt.so";
args = [ "log_level=debug5" ];
order = 999999; # Make it the last one
};
# Disable systemd session (pam_systemd.so) as it will conflict with the
# pam_slurm_adopt.so module. What happens is that the shell is first adopted
# into the slurmstepd task and then into the systemd session, which is not
# what we want, otherwise it will linger even if all jobs are gone.
security.pam.services.sshd.startSession = lib.mkForce false;
}

View File

@ -1,78 +0,0 @@
{ config, pkgs, ... }:
{
services.slurm = {
controlMachine = "apex";
clusterName = "jungle";
nodeName = [
"owl[1,2] Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 Feature=owl"
"fox Sockets=8 CoresPerSocket=24 ThreadsPerCore=1"
];
partitionName = [
"owl Nodes=owl[1-2] Default=YES DefaultTime=01:00:00 MaxTime=INFINITE State=UP"
"fox Nodes=fox Default=NO DefaultTime=01:00:00 MaxTime=INFINITE State=UP"
];
# See slurm.conf(5) for more details about these options.
extraConfig = ''
# Use PMIx for MPI by default. It works okay with MPICH and OpenMPI, but
# not with Intel MPI. For that use the compatibility shim libpmi.so
# setting I_MPI_PMI_LIBRARY=$pmix/lib/libpmi.so while maintaining the PMIx
# library in SLURM (--mpi=pmix). See more details here:
# https://pm.bsc.es/gitlab/rarias/jungle/-/issues/16
MpiDefault=pmix
# When a node reboots return that node to the slurm queue as soon as it
# becomes operative again.
ReturnToService=2
# Track all processes by using a cgroup
ProctrackType=proctrack/cgroup
# Enable task/affinity to allow the jobs to run in a specified subset of
# the resources. Use the task/cgroup plugin to enable process containment.
TaskPlugin=task/affinity,task/cgroup
# Reduce port range so we can allow only this range in the firewall
SrunPortRange=60000-61000
# Use cores as consumable resources. In SLURM terms, a core may have
# multiple hardware threads (or CPUs).
SelectType=select/cons_tres
# Ignore memory constraints and only use unused cores to share a node with
# other jobs.
SelectTypeParameters=CR_Core
# Required for pam_slurm_adopt, see https://slurm.schedmd.com/pam_slurm_adopt.html
# This sets up the "extern" step into which ssh-launched processes will be
# adopted. Alloc runs the prolog at job allocation (salloc) rather than
# when a task runs (srun) so we can ssh early.
PrologFlags=Alloc,Contain,X11
LaunchParameters=use_interactive_step
SlurmdDebug=debug5
#DebugFlags=Protocol,Cgroup
'';
extraCgroupConfig = ''
CgroupPlugin=cgroup/v2
#ConstrainCores=yes
'';
};
# Place the slurm config in /etc as this will be required by PAM
environment.etc.slurm.source = config.services.slurm.etcSlurm;
age.secrets.mungeKey = {
file = ../../secrets/munge-key.age;
owner = "munge";
group = "munge";
};
services.munge = {
enable = true;
password = config.age.secrets.mungeKey.path;
};
}

View File

@ -1,28 +0,0 @@
{ config, lib, pkgs, ... }:
# See also: https://github.com/NixOS/nixpkgs/pull/112010
# And: https://github.com/NixOS/nixpkgs/pull/115839
with lib;
{
systemd.services."prometheus-slurm-exporter" = {
wantedBy = [ "multi-user.target" ];
after = [ "network.target" ];
serviceConfig = {
Restart = mkDefault "always";
PrivateTmp = mkDefault true;
WorkingDirectory = mkDefault "/tmp";
DynamicUser = mkDefault true;
ExecStart = ''
${pkgs.prometheus-slurm-exporter}/bin/prometheus-slurm-exporter --listen-address "127.0.0.1:9341"
'';
Environment = [
"PATH=${pkgs.slurm}/bin"
# We need to specify the slurm config to be able to talk to the slurmd
# daemon.
"SLURM_CONF=${config.services.slurm.etcSlurm}/slurm.conf"
];
};
};
}

View File

@ -1,19 +0,0 @@
{ ... }:
{
# Mount the hut nix store via NFS
fileSystems."/mnt/hut-nix-store" = {
device = "hut:/nix/store";
fsType = "nfs";
options = [ "ro" ];
};
systemd.services.slurmd.serviceConfig = {
# When running a job, bind the hut store in /nix/store so the paths are
# available too.
# FIXME: This doesn't keep the programs in /run/current-system/sw/bin
# available in the store. Ideally they should be merged but the overlay FS
# doesn't work when the underlying directories change.
BindReadOnlyPaths = "/mnt/hut-nix-store:/nix/store";
};
}

View File

@ -1,23 +0,0 @@
{ ... }:
{
imports = [
./slurm-common.nix
];
services.slurm.server.enable = true;
networking.firewall = {
extraCommands = ''
# Accept slurm connections to controller from compute nodes
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 6817 -j nixos-fw-accept
# Accept slurm connections from compute nodes for srun
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 60000:61000 -j nixos-fw-accept
# Accept slurm connections to controller from fox (via wireguard)
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.1/32 --dport 6817 -j nixos-fw-accept
# Accept slurm connections from fox for srun (via wireguard)
iptables -A nixos-fw -p tcp -i wg0 -s 10.106.0.1/32 --dport 60000:61000 -j nixos-fw-accept
'';
};
}

View File

@ -1,17 +0,0 @@
{ config, lib, pkgs, ... }:
with lib;
{
systemd.services."prometheus-upc-qaire-exporter" = {
wantedBy = [ "multi-user.target" ];
after = [ "network.target" ];
serviceConfig = {
Restart = mkDefault "always";
PrivateTmp = mkDefault true;
WorkingDirectory = mkDefault "/tmp";
DynamicUser = mkDefault true;
ExecStart = "${pkgs.upc-qaire-exporter}/bin/upc-qaire-exporter";
};
};
}

View File

@ -1,35 +0,0 @@
{config, ...}:
{
age.secrets.vpn-dac-login.file = ../../secrets/vpn-dac-login.age;
age.secrets.vpn-dac-client-key.file = ../../secrets/vpn-dac-client-key.age;
services.openvpn.servers = {
# systemctl status openvpn-dac.service
dac = {
config = ''
client
dev tun
proto tcp
remote vpn.ac.upc.edu 1194
remote vpn.ac.upc.edu 80
resolv-retry infinite
nobind
persist-key
persist-tun
ca ${./vpn-dac/ca.crt}
cert ${./vpn-dac/client.crt}
# Only key needs to be secret
key ${config.age.secrets.vpn-dac-client-key.path}
remote-cert-tls server
comp-lzo
verb 3
auth-user-pass ${config.age.secrets.vpn-dac-login.path}
reneg-sec 0
# Only route fox-ipmi
pull-filter ignore "route "
route 147.83.35.27 255.255.255.255
'';
};
};
}

View File

@ -1,31 +0,0 @@
-----BEGIN CERTIFICATE-----
MIIFUjCCBDqgAwIBAgIJAJH118PApk5hMA0GCSqGSIb3DQEBCwUAMIHLMQswCQYD
VQQGEwJFUzESMBAGA1UECBMJQmFyY2Vsb25hMRIwEAYDVQQHEwlCYXJjZWxvbmEx
LTArBgNVBAoTJFVuaXZlcnNpdGF0IFBvbGl0ZWNuaWNhIGRlIENhdGFsdW55YTEk
MCIGA1UECxMbQXJxdWl0ZWN0dXJhIGRlIENvbXB1dGFkb3JzMRAwDgYDVQQDEwdM
Q0FDIENBMQ0wCwYDVQQpEwRMQ0FDMR4wHAYJKoZIhvcNAQkBFg9sY2FjQGFjLnVw
Yy5lZHUwHhcNMTYwMTEyMTI0NDIxWhcNNDYwMTEyMTI0NDIxWjCByzELMAkGA1UE
BhMCRVMxEjAQBgNVBAgTCUJhcmNlbG9uYTESMBAGA1UEBxMJQmFyY2Vsb25hMS0w
KwYDVQQKEyRVbml2ZXJzaXRhdCBQb2xpdGVjbmljYSBkZSBDYXRhbHVueWExJDAi
BgNVBAsTG0FycXVpdGVjdHVyYSBkZSBDb21wdXRhZG9yczEQMA4GA1UEAxMHTENB
QyBDQTENMAsGA1UEKRMETENBQzEeMBwGCSqGSIb3DQEJARYPbGNhY0BhYy51cGMu
ZWR1MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0CteSeof7Xwi51kC
F0nQ4E9iR5Lq7wtfRuVPn6JJcIxJJ6+F9gr4R/HIHTztW4XAzReE36DYfexupx3D
6UgQIkMLlVyGqRbulNF+RnCx20GosF7Dm4RGBVvOxBP1PGjYq/A+XhaaDAFd0cOF
LMNkzuYP7PF0bnBEaHnxmN8bPmuyDyas7fK9AAc3scyWT2jSBPbOVFvCJwPg8MH9
V/h+hKwL/7hRt1MVfVv2qyIuKwTki8mUt0RcVbP7oJoRY5K1+R52phIz/GL/b4Fx
L6MKXlQxLi8vzP4QZXgCMyV7oFNdU3VqCEXBA11YIRvsOZ4QS19otIk/ZWU5x+HH
LAIJ7wIDAQABo4IBNTCCATEwHQYDVR0OBBYEFNyezX1cH1N4QR14ebBpljqmtE7q
MIIBAAYDVR0jBIH4MIH1gBTcns19XB9TeEEdeHmwaZY6prRO6qGB0aSBzjCByzEL
MAkGA1UEBhMCRVMxEjAQBgNVBAgTCUJhcmNlbG9uYTESMBAGA1UEBxMJQmFyY2Vs
b25hMS0wKwYDVQQKEyRVbml2ZXJzaXRhdCBQb2xpdGVjbmljYSBkZSBDYXRhbHVu
eWExJDAiBgNVBAsTG0FycXVpdGVjdHVyYSBkZSBDb21wdXRhZG9yczEQMA4GA1UE
AxMHTENBQyBDQTENMAsGA1UEKRMETENBQzEeMBwGCSqGSIb3DQEJARYPbGNhY0Bh
Yy51cGMuZWR1ggkAkfXXw8CmTmEwDAYDVR0TBAUwAwEB/zANBgkqhkiG9w0BAQsF
AAOCAQEAUAmOvVXIQrR+aZVO0bOTeugKBHB75eTIZSIHIn2oDUvDbAP5GXIJ56A1
6mZXxemSMY8/9k+pRcwJhfat3IgvAN159XSqf9kRv0NHgc3FWUI1Qv/BsAn0vJO/
oK0dbmbbRWqt86qNrCN+cUfz5aovvxN73jFfnvfDQFBk/8enj9wXxYfokjjLPR1Q
+oTkH8dY68qf71oaUB9MndppPEPSz0K1S6h1XxvJoSu9MVSXOQHiq1cdZdxRazI3
4f7q9sTCL+khwDAuZxAYzlEYxFFa/NN8PWU6xPw6V+t/aDhOiXUPJQB/O/K7mw3Z
TQQx5NqM7B5jjak5fauR3/oRD8XXsA==
-----END CERTIFICATE-----

Some files were not shown because too many files have changed in this diff.