Compare commits

19 Commits

Author SHA1 Message Date
4bd1648074 Set the serial console to ttyS1 in raccoon
Apparently the ttyS0 console doesn't exist but ttyS1 does:

  raccoon% sudo stty -F /dev/ttyS0
  stty: /dev/ttyS0: Input/output error
  raccoon% sudo stty -F /dev/ttyS1
  speed 9600 baud; line = 0;
  -brkint -imaxbel

The dmesg line agrees:

  00:03: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A

The console configuration is then moved from base to xeon to allow
changing it for the raccoon machine.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:56 +02:00
15b114ffd6 Remove setLdLibraryPath and driSupport options
They have been removed from NixOS. The "hardware.opengl" group is now
renamed to "hardware.graphics".

See: 98cef4c273
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:53 +02:00
dd6d8c9735 Add documentation section about GRUB chain loading
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:47 +02:00
e15a3867d4 Add 10 min shutdown jitter to avoid spikes
The shutdown timer will fire at slightly different times for the
different nodes, so we slowly decrease the power consumption.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:44 +02:00
5cad208de6 Don't mount the nix store in owl nodes
Initially we planned to run jobs in those nodes by sharing the same nix
store from hut. However, these nodes are now used to build packages
which are not available in hut. Users also ssh to the nodes, which
doesn't mount the hut store, so it doesn't make much sense to keep
mounting it.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:42 +02:00
c8687f7e45 Emulate other architectures in owl nodes too
Allows cross-compilation of packages for RISC-V that are known to try to
run RISC-V programs in the host.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:39 +02:00
d988ef2eff Program shutdown for August 2nd for all machines
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:36 +02:00
b07929eab3 Enable debuginfod daemon in owl nodes
WARNING: This will introduce noise, as the daemon wakes up from time to
time to check for new packages.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:30 +02:00
b3e397eb4c Set gitea and grafana log level to warn
Prevents filling the journal logs with information messages.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:27 +02:00
5ad2c683ed Set default SLURM job time limit to one hour
Prevents endless jobs from being left forever, while allowing users to
request a larger time limit.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:24 +02:00
1f06f0fa0c Allow other jobs to run in unused cores
The current select mechanism also treated memory as a consumable
resource, and by default only 1 MiB is configured per node. As each job
already requests 1 MiB, this prevented other jobs from running.

As we are not really concerned with memory usage, we only use the unused
cores in the select criteria.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:22 +02:00
8ca1d84844 Use authentication tokens for PM GitLab runner
Starting with GitLab 16, there is a new mechanism to authenticate the
runners via authentication tokens, so use it instead.  Older tokens and
runners are also removed, as they are no longer used.

With the new way of managing tokens, both the tags and the locked state
are managed from the GitLab web page.

See: https://docs.gitlab.com/ee/ci/runners/new_creation_workflow.html
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:16 +02:00
998f599be3 flake.lock: Update
Flake lock file updates:

• Updated input 'agenix':
    'github:ryantm/agenix/1381a759b205dff7a6818733118d02253340fd5e' (2024-04-02)
  → 'github:ryantm/agenix/de96bd907d5fbc3b14fc33ad37d1b9a3cb15edc6' (2024-07-09)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/6143fc5eeb9c4f00163267708e26191d1e918932' (2024-04-21)
  → 'github:NixOS/nixpkgs/693bc46d169f5af9c992095736e82c3488bf7dbb' (2024-07-14)

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:13 +02:00
fcfc6ac149 Allow ptrace to any process of the same user
Allows users to attach GDB to their own processes, without requiring
running the program with GDB from the start. It is only available in
compute nodes, the storage nodes continue with the restricted settings.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:09 +02:00
6e87130166 Add abonerib user to hut, raccoon, owl1 and owl2
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:07 +02:00
06f9e6ac6b Grant rpenacob access to owl1 and owl2 nodes
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:05 +02:00
da07aedce2 Access private repositories via hut SSH proxy
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:36:03 +02:00
61427a8bf9 Set the default proxy to point to hut
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:35:56 +02:00
958ad1f025 Allow incoming traffic to hut proxy
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2024-09-12 08:35:23 +02:00
27 changed files with 125 additions and 73 deletions

@ -151,12 +151,26 @@ And update grub.
# nix build .#nixosConfigurations.xeon02.config.system.build.kexecTree -v
```
## Chain NixOS in same disk
## Chain NixOS in same disk with other systems
To install NixOS on a partition alongside another system that controls GRUB,
first set the GRUB device to "nodev", so that NixOS does not install GRUB on
the disk (only the /boot files will be generated):
```
boot.loader.grub.device = "nodev";
```
Then add the following entry to the old GRUB configuration:
```
menuentry 'NixOS' {
insmod chain
set root=(hd3,1)
search --no-floppy --label nixos --set root
configfile /boot/grub/grub.cfg
}
```
The partition with NixOS must have the label "nixos" for it to be found. New
system configuration entries will be stored in the GRUB configuration managed
by NixOS, so there is no need to change the old GRUB settings.
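
A minimal sketch of the NixOS side of this setup, assuming the root file system is ext4 and is mounted by its label (the `fileSystems` entry is an illustration, not taken from this repository):

```nix
{
  # Do not install GRUB on any disk; NixOS only generates the files under
  # /boot, including /boot/grub/grub.cfg, which the old GRUB loads.
  boot.loader.grub.device = "nodev";

  # Assumption: the NixOS partition is ext4 and carries the label "nixos",
  # the same label the `search --label nixos` line of the old GRUB entry
  # looks for.
  fileSystems."/" = {
    device = "/dev/disk/by-label/nixos";
    fsType = "ext4";
  };
}
```

Referring to the partition by label in both places keeps the old GRUB entry and the NixOS configuration in agreement even if the disk order changes.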

@ -10,11 +10,11 @@
"systems": "systems"
},
"locked": {
"lastModified": 1712079060,
"narHash": "sha256-/JdiT9t+zzjChc5qQiF+jhrVhRt8figYH29rZO7pFe4=",
"lastModified": 1720546205,
"narHash": "sha256-boCXsjYVxDviyzoEyAk624600f3ZBo/DKtUdvMTpbGY=",
"owner": "ryantm",
"repo": "agenix",
"rev": "1381a759b205dff7a6818733118d02253340fd5e",
"rev": "de96bd907d5fbc3b14fc33ad37d1b9a3cb15edc6",
"type": "github"
},
"original": {
@ -88,11 +88,11 @@
},
"nixpkgs": {
"locked": {
"lastModified": 1713714899,
"narHash": "sha256-+z/XjO3QJs5rLE5UOf015gdVauVRQd2vZtsFkaXBq2Y=",
"lastModified": 1720957393,
"narHash": "sha256-oedh2RwpjEa+TNxhg5Je9Ch6d3W1NKi7DbRO1ziHemA=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "6143fc5eeb9c4f00163267708e26191d1e918932",
"rev": "693bc46d169f5af9c992095736e82c3488bf7dbb",
"type": "github"
},
"original": {

@ -9,6 +9,10 @@
# Select the disk using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d53562d";
boot.kernel.sysctl = {
"kernel.yama.ptrace_scope" = lib.mkForce "1";
};
environment.systemPackages = with pkgs; [
ceph
];

@ -3,6 +3,7 @@
# Includes the basic configuration for an Intel server.
imports = [
./base/agenix.nix
./base/august-shutdown.nix
./base/boot.nix
./base/env.nix
./base/fs.nix

@ -0,0 +1,14 @@
{
# Shut down all machines on August 2nd at 11:00 AM, so we can protect the
# hardware from spurious electrical peaks during the yearly electrical cut
# for maintenance that starts on August 4th.
systemd.timers.august-shutdown = {
description = "Shutdown on August 2nd for maintenance";
wantedBy = [ "timers.target" ];
timerConfig = {
OnCalendar = "*-08-02 11:00:00";
RandomizedDelaySec = "10min";
Unit = "systemd-poweroff.service";
};
};
}
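
If a particular machine needs a wider or narrower shutdown window, the jitter could be overridden per host. A minimal sketch, assuming it is placed in that host's configuration and that a 30 minute window is wanted (both are illustrative choices, not part of this change):

```nix
{ lib, ... }:
{
  # Hypothetical per-host override: spread the shutdown of this machine over
  # up to 30 minutes after 11:00, instead of the 10 minutes set in the base
  # configuration. mkForce is needed because the base module already defines
  # the same key with a different value.
  systemd.timers.august-shutdown.timerConfig.RandomizedDelaySec = lib.mkForce "30min";
}
```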

@ -11,14 +11,12 @@
terminal_output --append serial
'';
# Enable serial console
boot.kernelParams = [
"console=tty1"
"console=ttyS0,115200"
];
boot.kernel.sysctl = {
"kernel.perf_event_paranoid" = lib.mkDefault "-1";
# Allow ptracing (i.e. attach with GDB) any process of the same user, see:
# https://www.kernel.org/doc/Documentation/security/Yama.txt
"kernel.yama.ptrace_scope" = "0";
};
boot.kernelPackages = pkgs.linuxPackages_latest;

@ -12,7 +12,7 @@ in
programs.ssh.extraConfig = ''
Host bscpm02.bsc.es bscpm03.bsc.es gitlab-internal.bsc.es alya.gitlab.bsc.es
User git
ProxyCommand nc -X connect -x localhost:23080 %h %p
ProxyCommand nc -X connect -x hut:23080 %h %p
'';
programs.ssh.knownHosts = hostsKeys // {

@ -55,7 +55,7 @@
home = "/home/Computational/rpenacob";
description = "Raúl Peñacoba";
group = "Computational";
hosts = [ "hut" ];
hosts = [ "owl1" "owl2" "hut" ];
hashedPassword = "$6$TZm3bDIFyPrMhj1E$uEDXoYYd1z2Wd5mMPfh3DZAjP7ztVjJ4ezIcn82C0ImqafPA.AnTmcVftHEzLB3tbe2O4SxDyPSDEQgJ4GOtj/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFYfXg37mauGeurqsLpedgA2XQ9d4Nm0ZGo/hI1f7wwH rpenacob@bsc"
@ -75,6 +75,19 @@
];
};
abonerib = {
uid = 4541;
isNormalUser = true;
home = "/home/Computational/abonerib";
description = "Aleix Boné";
group = "Computational";
hosts = [ "owl1" "owl2" "hut" "raccoon" ];
hashedPassword = "$6$V1EQWJr474whv7XJ$OfJ0wueM2l.dgiJiiah0Tip9ITcJ7S7qDvtSycsiQ43QBFyP4lU0e0HaXWps85nqB4TypttYR4hNLoz3bz662/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIIFiqXqt88VuUfyANkZyLJNiuroIITaGlOOTMhVDKjf abonerib@bsc"
];
};
vlopez = {
uid = 4334;
isNormalUser = true;

@ -3,7 +3,7 @@
imports = [
./base.nix
./xeon/fs.nix
./xeon/getty.nix
./xeon/console.nix
./xeon/net.nix
];
}

@ -5,4 +5,10 @@
wantedBy = [ "getty.target" ];
serviceConfig.Restart = "always";
};
# Enable serial console
boot.kernelParams = [
"console=tty1"
"console=ttyS0,115200"
];
}

@ -10,7 +10,7 @@
nameservers = ["8.8.8.8"];
proxy = {
default = "http://localhost:23080/";
default = "http://hut:23080/";
noProxy = "127.0.0.1,localhost,internal.domain,10.0.40.40";
# Don't set all_proxy as go complains and breaks the gitlab runner, see:
# https://github.com/golang/go/issues/16715

@ -6,6 +6,7 @@
../module/ceph.nix
../module/debuginfod.nix
../module/emulation.nix
../module/slurm-client.nix
./gitlab-runner.nix
./monitoring.nix
@ -19,8 +20,6 @@
#./pxe.nix
];
boot.binfmt.emulatedSystems = [ "armv7l-linux" "aarch64-linux" "powerpc64le-linux" "riscv64-linux" ];
# Select the disk using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/ata-INTEL_SSDSC2BB240G7_PHDV6462004Y240AGN";
@ -34,5 +33,15 @@
address = "10.0.42.7";
prefixLength = 24;
} ];
firewall = {
extraCommands = ''
# Accept all proxy traffic from compute nodes but not the login
iptables -A nixos-fw -p tcp -s 10.0.40.30 --dport 23080 -j nixos-fw-log-refuse
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 23080 -j nixos-fw-accept
'';
};
};
# Allow proxy to bind to the ethernet interface
services.openssh.settings.GatewayPorts = "clientspecified";
}

@ -17,6 +17,7 @@
REGISTER_MANUAL_CONFIRM = true;
ENABLE_NOTIFY_MAIL = true;
};
log.LEVEL = "Warn";
mailer = {
ENABLED = true;

@ -1,9 +1,8 @@
{ pkgs, lib, config, ... }:
{
age.secrets.ovniToken.file = ../../secrets/ovni-token.age;
age.secrets.gitlabToken.file = ../../secrets/gitlab-bsc-es-token.age;
age.secrets.nosvToken.file = ../../secrets/nosv-token.age;
age.secrets.gitlabRunnerShellToken.file = ../../secrets/gitlab-runner-shell-token.age;
age.secrets.gitlabRunnerDockerToken.file = ../../secrets/gitlab-runner-docker-token.age;
services.gitlab-runner = {
enable = true;
@ -11,20 +10,14 @@
services = let
common-shell = {
executor = "shell";
tagList = [ "nix" "xeon" ];
registrationFlags = [
# Using space doesn't work, and causes it to misread the next flag
"--locked='false'"
];
environmentVariables = {
SHELL = "${pkgs.bash}/bin/bash";
};
};
common-docker = {
executor = "docker";
dockerImage = "debian:stable";
tagList = [ "docker" "xeon" ];
registrationFlags = [
"--locked='false'"
"--docker-network-mode host"
];
environmentVariables = {
@ -33,19 +26,12 @@
};
};
in {
# For gitlab.bsc.es
gitlab-bsc-es-shell = common-shell // {
registrationConfigFile = config.age.secrets.gitlabToken.path;
};
gitlab-bsc-es-docker = common-docker // {
registrationConfigFile = config.age.secrets.gitlabToken.path;
};
# For pm.bsc.es/gitlab
gitlab-pm-shell = common-shell // {
registrationConfigFile = config.age.secrets.ovniToken.path;
authenticationTokenConfigFile = config.age.secrets.gitlabRunnerShellToken.path;
};
gitlab-pm-docker = common-docker // {
registrationConfigFile = config.age.secrets.ovniToken.path;
authenticationTokenConfigFile = config.age.secrets.gitlabRunnerDockerToken.path;
};
};
};
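
The `authenticationTokenConfigFile` option points to an environment file rather than a bare token. A minimal sketch of a single runner, assuming the file format described in the NixOS gitlab-runner module documentation (`CI_SERVER_URL` and `CI_SERVER_TOKEN`); the runner name `example-shell` and the placeholder values are hypothetical:

```nix
{ config, pkgs, ... }:
{
  # Decrypted at activation time by agenix. Assumed contents of the file
  # (placeholders, not real values):
  #   CI_SERVER_URL=<URL of the GitLab instance>
  #   CI_SERVER_TOKEN=<runner authentication token>
  age.secrets.gitlabRunnerShellToken.file = ../../secrets/gitlab-runner-shell-token.age;

  services.gitlab-runner = {
    enable = true;
    services.example-shell = {
      executor = "shell";
      # Tags and the locked state are not set here: with authentication
      # tokens they are managed from the GitLab web page.
      authenticationTokenConfigFile = config.age.secrets.gitlabRunnerShellToken.path;
      environmentVariables = {
        SHELL = "${pkgs.bash}/bin/bash";
      };
    };
  };
}
```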

@ -31,6 +31,7 @@
};
feature_toggles.publicDashboards = true;
"auth.anonymous".enabled = true;
log.level = "warn";
};
};

@ -8,6 +8,10 @@
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d53563a";
boot.kernel.sysctl = {
"kernel.yama.ptrace_scope" = lib.mkForce "1";
};
environment.systemPackages = with pkgs; [
ceph
];

m/module/emulation.nix

@ -0,0 +1,3 @@
{
boot.binfmt.emulatedSystems = [ "armv7l-linux" "aarch64-linux" "powerpc64le-linux" "riscv64-linux" ];
}

@ -47,8 +47,8 @@ in {
];
partitionName = [
"owl Nodes=owl[1-2] Default=YES MaxTime=INFINITE State=UP"
"all Nodes=owl[1-2],hut Default=NO MaxTime=INFINITE State=UP"
"owl Nodes=owl[1-2] Default=YES DefaultTime=01:00:00 MaxTime=INFINITE State=UP"
"all Nodes=owl[1-2],hut Default=NO DefaultTime=01:00:00 MaxTime=INFINITE State=UP"
];
# See slurm.conf(5) for more details about these options.
@ -83,6 +83,14 @@ in {
# Reduce port range so we can allow only this range in the firewall
SrunPortRange=60000-61000
# Use cores as consumable resources. In SLURM terms, a core may have
# multiple hardware threads (or CPUs).
SelectType=select/cons_tres
# Ignore memory constraints and only use unused cores to share a node with
# other jobs.
SelectTypeParameters=CR_Core
'';
};

@ -4,9 +4,10 @@
imports = [
../common/xeon.nix
../module/ceph.nix
../module/emulation.nix
../module/slurm-client.nix
../module/slurm-firewall.nix
../module/slurm-hut-nix-store.nix
../module/debuginfod.nix
];
# Select the disk using the ID to avoid mismatches

@ -4,9 +4,10 @@
imports = [
../common/xeon.nix
../module/ceph.nix
../module/emulation.nix
../module/slurm-client.nix
../module/slurm-firewall.nix
../module/slurm-hut-nix-store.nix
../module/debuginfod.nix
];
# Select the disk using the ID to avoid mismatches

@ -8,6 +8,12 @@
# Don't install Grub on the disk yet
boot.loader.grub.device = "nodev";
# Enable serial console
boot.kernelParams = [
"console=tty1"
"console=ttyS1,115200"
];
networking = {
hostName = "raccoon";
# Only BSC DNSs seem to be reachable from the office VLAN
@ -21,11 +27,7 @@
# Configure Nvidia driver to use with CUDA
hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production;
hardware.opengl = {
enable = true;
driSupport = true;
setLdLibraryPath = true;
};
hardware.graphics.enable = true;
nixpkgs.config.allowUnfree = true;
nixpkgs.config.nvidia.acceptLicense = true;
services.xserver.videoDrivers = [ "nvidia" ];

@ -1,11 +0,0 @@
age-encryption.org/v1
-> ssh-ed25519 HY2yRg caTbx0NBmsTSmZH4HtBaxhsauWqWUDTesJqT08UsoEQ
8ND31xuco+H8d5SKg8xsCFRPVDhU4d8UKwV1BnmKVjQ
-> ssh-ed25519 CAWG4Q 4ETYuhCwHHECkut4DWDknMMgpAvFqtzLWVC2Wi2L8FM
BGMvRnAfd8qZG5hzLefmk32FkGvwzE9pqBUyx4JY0co
-> ssh-ed25519 MSF3dg hj5QL4ZfylN8/W/MXQHvVqtI7mRvlQOYr8HsaQEmPB0
kvB7sljmmkswSGZDQnrwdTbTsN78EAwH3pz1pPe0Hu0
-> )Q-grease vHF} [8p1> @7z;C"/
tgSUKFyyrf2jLXZp+pakigwB2fRO/WFj2Qnt1aPjtVPEK92JbJ4
--- xzM0AhV4gTQE0Q7inJNo9vFj+crJQxWeI7u9pl7bqAI

@ -0,0 +1,9 @@
age-encryption.org/v1
-> ssh-ed25519 HY2yRg WvKK6U1wQtx2pbUDfuaUIXTQiCulDkz7hgUCSwMfMzQ
jLktUMqKuVxukqzz++pHOKvmucUQqeKYy5IwBma7KxY
-> ssh-ed25519 CAWG4Q XKGuNNoYFl9bdZzsqYYTY7GsEt5sypLW4R+1uk78NmU
8dIA2GzRAwTGM5CDHSM2BUBsbXzEAUssWUz2PY2PaTg
-> ssh-ed25519 MSF3dg T630RsKuZIF/bp+KITnIIWWHsg6M/VQGqbWQZxqT+AA
SraZcgZJVtmUzHF/XR9J7aK5t5EDNpkC/av/WJUT/G8
--- /12G8pj9sbs591OM/ryhoLnSWWmzYcoqprk9uN/3g18

Binary file not shown.

@ -1,11 +0,0 @@
age-encryption.org/v1
-> ssh-ed25519 HY2yRg hrdS7Dl/j+u3XVfM79ZJpZSlre9TcD7DTQ+EEAT6kEE
avUO96P1h7w2BYWgrQ7GpUgdaCV9AZL7eOTTcF9gfro
-> ssh-ed25519 CAWG4Q A5raRY1CAgFYZgoQ92GMyNejYNdHx/7Y6uTS+EjLPWA
FRFqT2Jz7qRcybaxkQTKHGl797LVXoHpYG4RZSrX/70
-> ssh-ed25519 MSF3dg D+R80Bg7W9AuiOMAqtGFZQl994dRBIegYRLmmTaeZ3o
BHvZsugRiuZ91b4jk91h30o3eF3hadSnVCwxXge95T8
-> BT/El`a-grease W{nq|Vm )bld 2Nl}4 N$#JGB4t
oLG+0S1aGfO/ohCfgGmhDhwwLi4H
--- 2I5C+FvBG/K1ZHh7C5QD39feTSLoFGwcTeZAmeILNsI

Binary file not shown.

@ -6,10 +6,9 @@ let
safe = keys.hostGroup.safe ++ adminsKeys;
in
{
"gitlab-bsc-es-token.age".publicKeys = hut;
"gitea-runner-token.age".publicKeys = hut;
"ovni-token.age".publicKeys = hut;
"nosv-token.age".publicKeys = hut;
"gitlab-runner-docker-token.age".publicKeys = hut;
"gitlab-runner-shell-token.age".publicKeys = hut;
"nix-serve.age".publicKeys = hut;
"jungle-robot-password.age".publicKeys = hut;