9871517be2
Add ICMP probes
These probes check if we can reach several targets via ICMP, which is
not proxied, so they can be used to see if ICMP forwarding is working in
the login node.
In particular, we test whether we can reach the Google (8.8.8.8) and
Cloudflare (1.1.1.1) DNS servers, the BSC gateway (which responds to ping
only from the intranet), and the login node (ssfhead).
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 17:13:03 +02:00
736eacaac5
Enable proxy for Grafana too
The alerts need to contact the Slack endpoint, so we add the proxy
environment variables to the Grafana systemd service.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:56 +02:00
0e66aad099
Make blackbox exporter use the proxy
By default it was trying to reach the targets using the default gateway,
but since the electrical cut of 2023-10-20, the login node has not
re-enabled forwarding, so it is better not to rely on it.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:24 +02:00
67a4905a0a
Don't log SLURM connection attempts from ssfhead
2023-10-06 15:22:04 +02:00
d52d22e0db
Add docker runner too
2023-10-06 15:17:07 +02:00
42920c2521
Monitor gitlab.bsc.es too
2023-10-06 15:17:07 +02:00
4acd35e036
Monitor PM webpage via blackbox
2023-10-06 15:17:07 +02:00
621d20db3a
Temporarily disable pm runners
2023-10-06 15:17:07 +02:00
0926f6ec1f
Add runner for gitlab.bsc.es
2023-10-06 15:17:07 +02:00
61646cb3bd
Allow anonymous access to grafana
2023-09-22 10:51:30 +02:00
c0066c4744
Remove user/group when using DynamicUsers
2023-09-22 10:13:06 +02:00
ffd0593f51
Set the SLURM_CONF variable
2023-09-21 22:22:00 +02:00
f49ae0773e
Enable slurm-exporter service
2023-09-21 21:40:02 +02:00
8de3d2b149
Mount the hut nix store for SLURM jobs
2023-09-20 19:38:43 +02:00
bc62e28ca3
Enable direnv integration
2023-09-20 09:32:58 +02:00
653d411b9e
Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs is still pointing to a different version than the one
specified in the jungle flake. Instead, use jungle#bscpkgs.X to get a
package from bscpkgs.
2023-09-15 12:00:33 +02:00
a1e8cfea47
Don't fetch registry flakes from the net
2023-09-15 12:00:28 +02:00
e88805947e
Open ports in firewall of compute nodes
2023-09-14 15:45:43 +02:00
d9d249411d
Monitor storage nodes via IPMI too
2023-09-13 15:57:13 +02:00
10ca572aec
Enable fstrim service
2023-09-12 16:39:45 +02:00
75b0f48715
Serve the nix store from hut
2023-09-12 12:19:43 +02:00
19a451db77
Add encrypted munge key with agenix
2023-09-08 19:05:45 +02:00
ec9be9bb62
Remove unused large port hole in firewall
2023-09-08 18:22:48 +02:00
7ddd1977f3
Make exporters listen in localhost only
2023-09-08 18:13:04 +02:00
7050c505b5
Allow only some ports for srun
2023-09-08 17:51:37 +02:00
033a1fe97b
Block ssfhead from reaching our slurm daemon
2023-09-08 17:36:28 +02:00
77cb3c494e
Poweroff idle slurm nodes after 1 hour
2023-09-08 16:49:53 +02:00
6db5772ac4
Add IB and IPMI node host names
2023-09-08 13:21:37 +02:00
dca274d020
Unlock ovni gitlab runners
2023-09-05 16:59:45 +02:00
02f40a8217
Add agenix to all nodes
2023-09-04 22:10:43 +02:00
77d43b6da9
Add agenix module to ceph
2023-09-04 22:07:07 +02:00
ab55aac5ff
Remove old secrets
2023-09-04 22:04:32 +02:00
9b5bfbb7a3
Mount /ceph in owl1 and owl2
2023-09-04 22:00:36 +02:00
a69a71d1b0
Warn about the owl2 omnipath device
2023-09-04 22:00:17 +02:00
98374bd303
Clean owl2 configuration
2023-09-04 21:59:56 +02:00
3b6be8a2fc
Move the ceph client config to an external module
2023-09-04 21:59:04 +02:00
2bb366b9ac
Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2023-09-04 21:36:31 +02:00
2d16709648
Add anavarro user
2023-09-04 16:00:01 +02:00
9344daa31c
Set zsh inc_append_history option
2023-09-03 16:57:53 +02:00
80c98041b5
Set zsh shell for rarias
2023-09-03 16:46:27 +02:00
3418e57907
Enable zsh and fix key bindings
2023-09-03 16:42:04 +02:00
6848b58e39
Keep a log over time with the config commits
2023-09-03 00:02:14 +02:00
f9c77b433a
Store nixos config in /etc/nixos/config.rev
2023-09-02 23:37:11 +02:00
9d487845f6
Enable binary emulation for other architectures
2023-08-31 17:27:08 +02:00
3c99c2a662
Enable watchdog
2023-08-30 16:32:17 +02:00
7d09108c9f
Enable all osd on boot in lake2
2023-08-30 16:32:17 +02:00
0f0a861896
Scrape lake2 too
2023-08-29 12:33:26 +02:00
beb0d5940e
Also enable monitoring in lake2
2023-08-29 12:29:41 +02:00
70321ce237
Scrape metrics from bay
2023-08-29 11:58:00 +02:00
5bd1d67333
Add monitoring in the bay node
2023-08-29 11:53:32 +02:00
fad9df61e1
Add fio tool
2023-08-29 11:27:50 +02:00
d2a80c8c18
Add ceph tools in hut too
2023-08-28 17:58:21 +02:00
599613d139
Switch ceph logs to journal
2023-08-28 17:58:08 +02:00
cb3a7b19f7
Move pkgs overlay to overlay.nix
2023-08-25 18:12:00 +02:00
f5d6bf627b
Enable ceph osd daemons in lake2
2023-08-25 14:54:51 +02:00
f1ce815edd
Add the lake2 hostname to the hosts
2023-08-25 14:44:35 +02:00
a2075cfd65
Use the sda for lake2
2023-08-25 13:40:10 +02:00
8f1f6f92a8
Remove netboot module
2023-08-25 13:39:01 +02:00
3416416864
Disable pixiecore in hut for now
2023-08-25 13:21:00 +02:00
815888fb07
Add PXE helper
2023-08-25 12:05:33 +02:00
029d9cb1db
Enable netboot again for PXE
2023-08-24 19:08:23 +02:00
95fa67ede1
Specify the disk by path
2023-08-24 15:27:37 +02:00
a19347161f
Prepare lake2 config after bootstrap
The disk ID is different under NixOS.
2023-08-24 13:54:53 +02:00
58c1cc1f7c
Add lake2 bootstrap config
2023-08-24 12:30:46 +02:00
077eece6b9
Add agenix to PATH in hut
2023-08-23 17:42:50 +02:00
b3ef53de51
Store ceph secret key in age
This allows a node to mount the ceph FS without any extra ceph
configuration in /etc/ceph.
2023-08-23 17:26:44 +02:00
e0852ee89b
Add rarias key for secrets
2023-08-23 17:15:26 +02:00
dfffc0bdce
Add ceph metrics to prometheus
2023-08-22 16:33:55 +02:00
8257c245b1
Mount the ceph filesystem in hut
2023-08-22 16:15:46 +02:00
cd5853cf53
Add ceph config in bay
2023-08-22 15:58:48 +02:00
b677b827d4
Add the bay host name
2023-08-22 15:56:09 +02:00
b1d5185cca
Remove netboot and fixes
2023-08-22 12:12:15 +02:00
a7e66e2246
Add bay node
2023-08-22 12:12:15 +02:00
f8fb5fa4ff
Monitor power from other nodes via LAN
2023-08-22 11:28:54 +02:00
acf9b71f04
Increase prometheus retention time to one year
2023-08-22 11:28:54 +02:00
bf692e6e4e
Don't set all_proxy
2023-08-22 11:28:54 +02:00
55d6c17776
Allow access to devices for node_exporter
2023-07-28 13:55:35 +02:00
14b173f67e
GRUB version no longer needed
2023-07-27 17:22:20 +02:00
f892d43b47
Kill slurmd remaining processes on upgrade
2023-07-27 14:49:20 +02:00
79adbe76a8
koro: Add vlopez user
2023-07-21 13:00:43 +02:00
66fb848ba8
Add koro node
2023-07-21 13:00:08 +02:00
40b1a8f0df
eudy: Add fcsv3 and intermediate versions for testing
2023-07-21 11:27:51 +02:00
a0b9d10b14
eudy: Enable memory overcommit
2023-07-21 11:27:51 +02:00
4c309dea2f
eudy: disable all cpu mitigations
2023-07-21 11:27:51 +02:00
7c1fe1455b
Enable NTP using the BSC time server
2023-06-30 14:02:15 +02:00
2d4b178895
Add the ssfhead node as gateway
2023-06-30 14:01:35 +02:00
4dd25f2f89
Use our host names first by default
2023-06-23 16:22:18 +02:00
6dcd9d8144
Add DNS tools to resolve hosts
2023-06-23 16:15:45 +02:00
31be81d2b1
Lower perf_event_paranoid to -1
2023-06-23 16:01:27 +02:00
826cfdf43f
Set perf paranoid to 0 by default
2023-06-21 16:24:19 +02:00
a1f258c5ce
Add perf to packages
2023-06-21 15:41:06 +02:00
1c1d3f3231
Allow srun to specify the cpu binding
The task/affinity plugin needs to be selected.
2023-06-21 13:16:23 +02:00
623d46c03f
Move authorized keys to users.nix
2023-06-20 14:08:34 +02:00
518a4d6af3
Add rpenacob user
2023-06-20 12:54:26 +02:00
60077948d6
Add osumb to the system packages
2023-06-16 19:22:41 +02:00
1724535495
Use explicit order in overlays
2023-06-16 18:26:51 +02:00
ab04855382
Add mpich overlay
2023-06-16 18:26:51 +02:00
684d5e41c5
Add comments in slurm config
2023-06-16 18:26:50 +02:00
316ea18e24
Add eudy host key to known hosts
2023-06-16 17:29:48 +02:00
c916157fcc
Rename xeon08 to eudy
From Eudyptula, a little penguin.
2023-06-16 17:16:05 +02:00