736eacaac5
Enable proxy for Grafana too
The alerts need to contact the slack endpoint, so we add the proxy
environment variables to the grafana systemd service.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:56 +02:00
0e66aad099
Make blackbox exporter use the proxy
By default it was trying to reach the targets using the default gateway,
but since the electrical cut of 2023-10-20, the login node has not
enabled forwarding again, so it is better not to rely on it.
Reviewed-By: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2023-10-25 16:55:24 +02:00
67a4905a0a
Don't log SLURM connection attempts from ssfhead
2023-10-06 15:22:04 +02:00
d52d22e0db
Add docker runner too
2023-10-06 15:17:07 +02:00
42920c2521
Monitor gitlab.bsc.es too
2023-10-06 15:17:07 +02:00
4acd35e036
Monitor PM webpage via blackbox
2023-10-06 15:17:07 +02:00
621d20db3a
Temporarily disable pm runners
2023-10-06 15:17:07 +02:00
0926f6ec1f
Add runner for gitlab.bsc.es
2023-10-06 15:17:07 +02:00
61646cb3bd
Allow anonymous access to grafana
2023-09-22 10:51:30 +02:00
c0066c4744
Remove user/group when using DynamicUser
2023-09-22 10:13:06 +02:00
ffd0593f51
Set the SLURM_CONF variable
2023-09-21 22:22:00 +02:00
f49ae0773e
Enable slurm-exporter service
2023-09-21 21:40:02 +02:00
8de3d2b149
Mount the hut nix store for SLURM jobs
2023-09-20 19:38:43 +02:00
bc62e28ca3
Enable direnv integration
2023-09-20 09:32:58 +02:00
653d411b9e
Remove bscpkgs from the registry and nixPath
This is done to prevent accidental evaluations where the nixpkgs input
of bscpkgs is still pointing to a different version than the one
specified in the jungle flake. Instead use jungle#bscpkgs.X to get a
package from bscpkgs.
2023-09-15 12:00:33 +02:00
a1e8cfea47
Don't fetch registry flakes from the net
2023-09-15 12:00:28 +02:00
e88805947e
Open ports in firewall of compute nodes
2023-09-14 15:45:43 +02:00
d9d249411d
Monitor storage nodes via IPMI too
2023-09-13 15:57:13 +02:00
10ca572aec
Enable fstrim service
2023-09-12 16:39:45 +02:00
75b0f48715
Serve the nix store from hut
2023-09-12 12:19:43 +02:00
19a451db77
Add encrypted munge key with agenix
2023-09-08 19:05:45 +02:00
ec9be9bb62
Remove unused large port hole in firewall
2023-09-08 18:22:48 +02:00
7ddd1977f3
Make exporters listen in localhost only
2023-09-08 18:13:04 +02:00
7050c505b5
Allow only some ports for srun
2023-09-08 17:51:37 +02:00
033a1fe97b
Block ssfhead from reaching our slurm daemon
2023-09-08 17:36:28 +02:00
77cb3c494e
Poweroff idle slurm nodes after 1 hour
2023-09-08 16:49:53 +02:00
6db5772ac4
Add IB and IPMI node host names
2023-09-08 13:21:37 +02:00
dca274d020
Unlock ovni gitlab runners
2023-09-05 16:59:45 +02:00
02f40a8217
Add agenix to all nodes
2023-09-04 22:10:43 +02:00
77d43b6da9
Add agenix module to ceph
2023-09-04 22:07:07 +02:00
ab55aac5ff
Remove old secrets
2023-09-04 22:04:32 +02:00
9b5bfbb7a3
Mount /ceph in owl1 and owl2
2023-09-04 22:00:36 +02:00
a69a71d1b0
Warn about the owl2 omnipath device
2023-09-04 22:00:17 +02:00
98374bd303
Clean owl2 configuration
2023-09-04 21:59:56 +02:00
3b6be8a2fc
Move the ceph client config to an external module
2023-09-04 21:59:04 +02:00
2bb366b9ac
Reorganize secrets and ssh keys
The agenix tool needs to read the secrets from a standalone file, but
we also need the same information for the SSH keys.
2023-09-04 21:36:31 +02:00
2d16709648
Add anavarro user
2023-09-04 16:00:01 +02:00
9344daa31c
Set zsh inc_append_history option
2023-09-03 16:57:53 +02:00
80c98041b5
Set zsh shell for rarias
2023-09-03 16:46:27 +02:00
3418e57907
Enable zsh and fix key bindings
2023-09-03 16:42:04 +02:00
6848b58e39
Keep a log over time with the config commits
2023-09-03 00:02:14 +02:00
f9c77b433a
Store nixos config in /etc/nixos/config.rev
2023-09-02 23:37:11 +02:00
9d487845f6
Enable binary emulation for other architectures
2023-08-31 17:27:08 +02:00
3c99c2a662
Enable watchdog
2023-08-30 16:32:17 +02:00
7d09108c9f
Enable all osd on boot in lake2
2023-08-30 16:32:17 +02:00
0f0a861896
Scrape lake2 too
2023-08-29 12:33:26 +02:00
beb0d5940e
Also enable monitoring in lake2
2023-08-29 12:29:41 +02:00
70321ce237
Scrape metrics from bay
2023-08-29 11:58:00 +02:00
5bd1d67333
Add monitoring in the bay node
2023-08-29 11:53:32 +02:00
fad9df61e1
Add fio tool
2023-08-29 11:27:50 +02:00
d2a80c8c18
Add ceph tools in hut too
2023-08-28 17:58:21 +02:00
599613d139
Switch ceph logs to journal
2023-08-28 17:58:08 +02:00
cb3a7b19f7
Move pkgs overlay to overlay.nix
2023-08-25 18:12:00 +02:00
f5d6bf627b
Enable ceph osd daemons in lake2
2023-08-25 14:54:51 +02:00
f1ce815edd
Add the lake2 hostname to the hosts
2023-08-25 14:44:35 +02:00
a2075cfd65
Use the sda for lake2
2023-08-25 13:40:10 +02:00
8f1f6f92a8
Remove netboot module
2023-08-25 13:39:01 +02:00
3416416864
Disable pixiecore in hut for now
2023-08-25 13:21:00 +02:00
815888fb07
Add PXE helper
2023-08-25 12:05:33 +02:00
029d9cb1db
Enable netboot again for PXE
2023-08-24 19:08:23 +02:00
95fa67ede1
Specify the disk by path
2023-08-24 15:27:37 +02:00
a19347161f
Prepare lake2 config after bootstrap
The disk ID is different under NixOS.
2023-08-24 13:54:53 +02:00
58c1cc1f7c
Add lake2 bootstrap config
2023-08-24 12:30:46 +02:00
077eece6b9
Add agenix to PATH in hut
2023-08-23 17:42:50 +02:00
b3ef53de51
Store ceph secret key in age
This allows a node to mount the ceph FS without any extra ceph
configuration in /etc/ceph.
2023-08-23 17:26:44 +02:00
e0852ee89b
Add rarias key for secrets
2023-08-23 17:15:26 +02:00
dfffc0bdce
Add ceph metrics to prometheus
2023-08-22 16:33:55 +02:00
8257c245b1
Mount the ceph filesystem in hut
2023-08-22 16:15:46 +02:00
cd5853cf53
Add ceph config in bay
2023-08-22 15:58:48 +02:00
b677b827d4
Add the bay host name
2023-08-22 15:56:09 +02:00
b1d5185cca
Remove netboot and fixes
2023-08-22 12:12:15 +02:00
a7e66e2246
Add bay node
2023-08-22 12:12:15 +02:00
f8fb5fa4ff
Monitor power from other nodes via LAN
2023-08-22 11:28:54 +02:00
acf9b71f04
Increase prometheus retention time to one year
2023-08-22 11:28:54 +02:00
bf692e6e4e
Don't set all_proxy
2023-08-22 11:28:54 +02:00
55d6c17776
Allow access to devices for node_exporter
2023-07-28 13:55:35 +02:00
14b173f67e
GRUB version no longer needed
2023-07-27 17:22:20 +02:00
f892d43b47
Kill slurmd remaining processes on upgrade
2023-07-27 14:49:20 +02:00
79adbe76a8
koro: Add vlopez user
2023-07-21 13:00:43 +02:00
66fb848ba8
Add koro node
2023-07-21 13:00:08 +02:00
40b1a8f0df
eudy: Add fcsv3 and intermediate versions for testing
2023-07-21 11:27:51 +02:00
a0b9d10b14
eudy: Enable memory overcommit
2023-07-21 11:27:51 +02:00
4c309dea2f
eudy: Disable all CPU mitigations
2023-07-21 11:27:51 +02:00
7c1fe1455b
Enable NTP using the BSC time server
2023-06-30 14:02:15 +02:00
2d4b178895
Add the ssfhead node as gateway
2023-06-30 14:01:35 +02:00
4dd25f2f89
Use our host names first by default
2023-06-23 16:22:18 +02:00
6dcd9d8144
Add DNS tools to resolve hosts
2023-06-23 16:15:45 +02:00
31be81d2b1
Lower perf_event_paranoid to -1
2023-06-23 16:01:27 +02:00
826cfdf43f
Set perf paranoid to 0 by default
2023-06-21 16:24:19 +02:00
a1f258c5ce
Add perf to packages
2023-06-21 15:41:06 +02:00
1c1d3f3231
Allow srun to specify the cpu binding
The task/affinity plugin needs to be selected.
2023-06-21 13:16:23 +02:00
623d46c03f
Move authorized keys to users.nix
2023-06-20 14:08:34 +02:00
518a4d6af3
Add rpenacob user
2023-06-20 12:54:26 +02:00
60077948d6
Add osumb to the system packages
2023-06-16 19:22:41 +02:00
1724535495
Use explicit order in overlays
2023-06-16 18:26:51 +02:00
ab04855382
Add mpich overlay
2023-06-16 18:26:51 +02:00
684d5e41c5
Add comments in slurm config
2023-06-16 18:26:50 +02:00
316ea18e24
Add eudy host key to known hosts
2023-06-16 17:29:48 +02:00
c916157fcc
Rename xeon08 to eudy
From Eudyptula, a little penguin.
2023-06-16 17:16:05 +02:00
94320d9256
Add ssh host keys
2023-06-16 12:01:12 +02:00