Update NixOS and other changes #72

rarias · 2024-07-22T12:19:02+02:00

rarias commented

2024-07-22 12:19:02 +02:00

No description provided.

rarias added 17 commits 2024-07-22 12:19:03 +02:00

Allow incoming traffic to hut proxy 43e61a8da3

Set the default proxy to point to hut f5ebf43019

Access private repositories via hut SSH proxy 53c200fbc5

Grant rpenacob access to owl1 and owl2 nodes 3ea7edf950

Add abonerib user to hut, raccoon, owl1 and owl2 9fe29b864a

Allow ptrace to any process of the same user e3985b28a0

Allows users to attach GDB to their own running processes, without
having to start the program under GDB.

flake.lock: Update dba11ea88a

Flake lock file updates:

• Updated input 'agenix':
    'github:ryantm/agenix/1381a759b205dff7a6818733118d02253340fd5e' (2024-04-02)
  → 'github:ryantm/agenix/de96bd907d5fbc3b14fc33ad37d1b9a3cb15edc6' (2024-07-09)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/6143fc5eeb9c4f00163267708e26191d1e918932' (2024-04-21)
  → 'github:NixOS/nixpkgs/693bc46d169f5af9c992095736e82c3488bf7dbb' (2024-07-14)

Use authentication tokens for PM GitLab runner 32c919d1fc

Starting with GitLab 16, there is a new mechanism to authenticate the
runners via authentication tokens, so use it instead.  Older tokens and
runners are also removed, as they are no longer used.

With the new way of managing tokens, both the tags and the locked state
are managed from the GitLab web page.

See: https://docs.gitlab.com/ee/ci/runners/new_creation_workflow.html

Allow other jobs to run in unused cores 29110d2d54

The current select mechanism also treated memory as a consumable
resource, which by default is set to only 1 MiB per node. As each job
already requests 1 MiB, it prevents other jobs from running.

As we are not really concerned with memory usage, we only use the unused
cores in the select criteria.

Set default SLURM job time limit to one hour 5e6cf2b563

Prevents endless jobs from being left running forever, while allowing
users to request a larger time limit.
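
A minimal sketch of how such a default could be expressed through the NixOS SLURM module, assuming the limit is set per partition with `DefaultTime`. The partition name `owl` and the node list are hypothetical and not taken from this PR:

```nix
{
  # Hypothetical partition definition: the name and node list are assumptions.
  # DefaultTime applies only when a job does not request a time limit itself;
  # users can still ask for more with `--time`, up to MaxTime.
  services.slurm.partitionName = [
    "owl Nodes=owl[1-2] Default=YES DefaultTime=01:00:00 MaxTime=INFINITE State=UP"
  ];
}
```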

Set gitea and grafana log level to warn accb656c5e

Prevents filling the journal logs with information messages.
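
As an illustration, the log levels could be set with the standard NixOS module options shown below. This is a sketch and may not match the exact lines changed in the commit:

```nix
{
  # Gitea expects a capitalized level name in its [log] section.
  services.gitea.settings.log.LEVEL = "Warn";

  # Grafana uses lowercase level names.
  services.grafana.settings.log.level = "warn";
}
```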

Enable debuginfod daemon in owl nodes 86f5bea6c7

WARNING: This will introduce noise, as the daemon wakes up from time to
time to check for new packages.
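
The commit message does not name the daemon. Assuming it is `nixseparatedebuginfod`, which serves debug symbols for packages in the Nix store, enabling it would look roughly like this:

```nix
{
  # Assumption: the debuginfod daemon is provided by nixseparatedebuginfod.
  # The NixOS module also exports DEBUGINFOD_URLS so GDB can find the server.
  services.nixseparatedebuginfod.enable = true;
}
```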

Program shutdown for August 2nd for all machines d3489f8e48

Emulate other architectures in owl nodes too f3167c0cc0

Allows cross-compilation of packages for RISC-V that are known to try to
run RISC-V programs on the host.
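
A sketch of how emulation is typically enabled in NixOS, assuming the usual binfmt option is used. The exact list of systems is an assumption; only RISC-V is mentioned above:

```nix
{
  # Registers QEMU user-mode emulation via binfmt_misc, so foreign
  # binaries executed during a build run transparently on the host.
  boot.binfmt.emulatedSystems = [ "riscv64-linux" ];
}
```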

Don't mount the nix store in owl nodes 7b58d8fbcc

Initially we planned to run jobs in those nodes by sharing the same nix
store from hut. However, these nodes are now used to build packages
which are not available in hut. Users also ssh to the nodes, which
doesn't mount the hut store, so it doesn't make much sense to keep
mounting it.

Add 10 min shutdown jitter to avoid spikes 4a52970821

The shutdown timer will fire at slightly different times for the
different nodes, so we slowly decrease the power consumption.

Add documentation section about GRUB chain loading f9970c0ac7

rarias added 2 commits 2024-07-22 14:07:48 +02:00

Remove setLdLibraryPath and driSupport options dfc44d2be6

They have been removed from NixOS. The "hardware.opengl" group is now
renamed to "hardware.graphics".
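
For illustration only, a configuration affected by this rename would change along these lines. Whether this repository enables graphics at all is an assumption:

```nix
{
  # Before the rename (options now removed from NixOS):
  #   hardware.opengl.enable = true;
  #   hardware.opengl.driSupport = true;
  #   hardware.opengl.setLdLibraryPath = true;

  # After the rename:
  hardware.graphics.enable = true;
}
```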

See: 98cef4c273

Set the serial console to ttyS1 in raccoon 0a8db8bda6

Apparently the ttyS0 console doesn't exist but ttyS1 does:

  raccoon% sudo stty -F /dev/ttyS0
  stty: /dev/ttyS0: Input/output error
  raccoon% sudo stty -F /dev/ttyS1
  speed 9600 baud; line = 0;
  -brkint -imaxbel

The dmesg line agrees:

  00:03: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A

The console configuration is then moved from base to xeon to allow
changing it for the raccoon machine.
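
A rough sketch of what a per-machine console setting could look like, assuming it is passed through the kernel command line. The baud rate is an assumption taken from the `base_baud` value in the dmesg line, and the actual option layout in the xeon and raccoon modules may differ:

```nix
{
  # Hypothetical raccoon override: keep the local VGA console and add ttyS1.
  # The 115200 baud rate is assumed, not confirmed by the commit.
  boot.kernelParams = [ "console=tty1" "console=ttyS1,115200" ];
}
```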

abonerib reviewed 2024-09-10 16:44:52 +02:00

abonerib left a comment

LGTM, I left a couple of comments, but it's mostly nitpicking. I didn't go through all the docker gitlab details, I assumed the configuration has been tested.

m/common/base/august-shutdown.nix Outdated

						
@ -0,0 +8,4 @@
    timerConfig = {
      OnCalendar = "*-08-02 11:00:00";
      RandomizedDelaySec = "10min";
      Unit = "systemd-poweroff.service";

abonerib commented

2024-09-10 16:36:23 +02:00

It would be nice to broadcast a `wall` message some time before the shutdown.

rarias commented

2024-09-12 07:57:19 +02:00

I usually send some emails on the mailing list prior to that day. I think it would be good to send via wall too, feel free to send a patch or PR :-)

rarias marked this conversation as resolved

m/common/base/boot.nix Outdated

						
@ -22,0 +16,4 @@
    # Allow ptracing (i.e. attach with GDB) any process of the same user, see:
    # https://www.kernel.org/doc/Documentation/security/Yama.txt
    "kernel.yama.ptrace_scope" = "0";

abonerib commented

2024-09-10 16:43:10 +02:00

Perhaps it would be wiser to only do this on the machines where it's needed, since it could be a security concern?

rarias commented

2024-09-12 08:02:56 +02:00

I think this would be needed in most machines, but it can be disabled in the storage nodes.

rarias marked this conversation as resolved

m/module/slurm-client.nix Outdated

						
@ -86,0 +90,4 @@
      # Ignore memory constraints and only use unused cores to share a node with
      # other jobs.
      SelectTypeParameters=CR_CORE