jungle

Author	SHA1	Message	Date
Rodrigo Arias Mallo	5c30975b8b	Mount NFS home in tent at /nfs/home Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-03-09 15:27:41 +01:00
Rodrigo Arias Mallo	d4c00679ee	Increase NFS subnet to allow tent Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-03-09 15:27:41 +01:00
Rodrigo Arias Mallo	32a576e870	Copy Gitea backup in /ceph too Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-03-09 15:27:37 +01:00
Rodrigo Arias Mallo	8197221146	Mount /ceph in tent Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-03-09 08:52:11 +01:00
Rodrigo Arias Mallo	374cd4ce48	Allow tent to reach ceph Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-03-09 08:52:08 +01:00
Rodrigo Arias Mallo	46b7efb5ac	Rename Gitea backup service and directory Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-03-09 08:51:48 +01:00
Aleix Boné	56ab099017	Override files in rotating gitea dump service Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2026-03-09 08:51:44 +01:00
Aleix Boné	2654b9fdd9	Enable rotating gitea backups Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2026-03-09 08:51:23 +01:00
Rodrigo Arias Mallo	84a5cb09ee	Use host mode for docker network In order to reduce the traffic of the secondary Ethernet device we need to be able to directly use the physical device instead of the virtual one. For now use the host mode and see later if we can revert it. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-03-05 15:29:23 +01:00
Aleix Boné	4899d20748	Fix weasel infiniband interface name Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2026-02-26 10:26:01 +01:00
Rodrigo Arias Mallo	76cd6d64b2	Add ssanzmar user to apex and fox Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-02-24 14:06:12 +01:00
Rodrigo Arias Mallo	8dab0d82ba	Update fox documentation in website Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-02-04 15:08:13 +01:00
Rodrigo Arias Mallo	958dcd4774	Add emonteir user to apex and fox Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-02-04 15:08:08 +01:00
Aleix Boné	7a6e4232de	Add nom and nixfmt-tree to system packages Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2026-02-03 15:17:30 +01:00
Aleix Boné	3b56e905e5	Add standalone home-manager to system packages Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2026-02-03 15:17:29 +01:00
Aleix Boné	2d41309466	Format and sort default package list Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2026-02-03 15:17:24 +01:00
Rodrigo Arias Mallo	deb0cd1488	Allow USB access to TC1 from Gitlab Runner Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-01-23 17:56:16 +01:00
Rodrigo Arias Mallo	cd1f502ecc	Allow user USB access to FTDI device in tent Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-01-23 17:56:11 +01:00
Rodrigo Arias Mallo	dda6a66782	Fix gitea user to allow sending email In order to send email, the gitea user needs to be in the mail-robot group. Fixes: rarias/jungle#220 Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-01-20 12:18:52 +01:00
Rodrigo Arias Mallo	22420e6ac8	Remove unneeded perf package from eudy It is already included in the base list of packages, which is now only "perf" and doesn't depend on the kernel version. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-01-20 12:18:49 +01:00
Rodrigo Arias Mallo	a71cd78b4c	Fix infiniband interface names Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-01-20 12:18:46 +01:00
Aleix Boné	933c78a80b	Fix moved package linuxPackages.perf is now perf Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2026-01-20 12:15:10 +01:00
Aleix Boné	150969be9b	Fix replaced nixseparatedebuginfod nixseparatedebuginfod has been replaced by nixseparatedebuginfod2 Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2026-01-20 12:15:06 +01:00
Aleix Boné	779449f1db	Fix renamed option watchdog.runtimeTime The option 'systemd.watchdog.runtimeTime' has been renamed to 'systemd.settings.Manager.RuntimeWatchdogSec'. Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2026-01-20 12:14:59 +01:00
Vincent A. Arcila	859eebda98	Change varcila shell to zsh Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2026-01-07 13:22:17 +01:00
Rodrigo Arias Mallo	c2a201b085	Increase fail2ban ban time on each attempt Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-01-07 13:14:34 +01:00
Rodrigo Arias Mallo	f921f0a4bd	Disable password login via SSH in apex Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-01-07 13:14:30 +01:00
Rodrigo Arias Mallo	aa16bfc0bc	Enable fail2ban in apex login node We are seeing a lot of failed attempts from the same IPs: apex% sudo journalctl -u sshd -b0 | grep 'Failed password' | wc -l 2441 Reviewed-by: Aleix Boné <abonerib@bsc.es>	2026-01-07 13:14:22 +01:00
Rodrigo Arias Mallo	fc69ef3217	Enable pam_slurm_adopt in all compute nodes Prevents access to owl1 and owl2 too if the user doesn't have any jobs running there. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-10-31 11:41:50 +01:00
Rodrigo Arias Mallo	1d025f7a38	Don't suspend owl compute nodes Currently the owl nodes are located on top of the rack and turning them off causes a high temperature increase at that region, which accumulates heat from the whole rack. To maximize airflow we will leave them on at all times. This also makes allocations immediate at the extra cost of around 200 W. In the future, if we include more nodes in SLURM we can configure those to turn off if needed. Fixes: rarias/jungle#156 Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-10-31 11:41:44 +01:00
Rodrigo Arias Mallo	5ff1b1343b	Add nixgen to all machines Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-10-29 16:28:05 +01:00
Rodrigo Arias Mallo	019826d09e	Add OmpSs-2 release timers and services Send a reminder email to the STAR group to mark the release cycle dates. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-10-28 12:38:37 +01:00
Rodrigo Arias Mallo	a294daf7e3	Use specific mail-robot group to send mail Allows any user to be able to send mail from the robot account as long as it is added to the mail-robot group. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-10-28 12:38:17 +01:00
Rodrigo Arias Mallo	e3d1785285	Run a shell in the allocated node with salloc By default, salloc will open a new shell in the current node instead of in the allocated node. This often causes users to leave the extra shell running once the allocation ends. Repeating this process several times causes chains of shells. By running the shell in the remote node, once the allocation ends the shell finishes as well. Fixes: rarias/jungle#174 See: https://slurm.schedmd.com/faq.html#prompt Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-10-28 11:44:14 +01:00
Rodrigo Arias Mallo	14f2393d30	Update website Add apex page and replace bscpkgs references for jungle after the merge. See: rarias/jungle-website#1 Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-10-22 15:48:13 +02:00
Rodrigo Arias Mallo	f115d611e7	Add aaguirre user Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-10-22 15:28:29 +02:00
Rodrigo Arias Mallo	4261d327c6	Include agenix module and package directly Avoids adding an extra flake input only to fetch a single module and package. Reviewed-by: Aleix Boné <abonerib@bsc.es> Tested-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2025-10-14 09:37:47 +02:00
Aleix Boné	98d17b19d3	Enable custom sys-devices system feature Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2025-10-09 11:40:44 +02:00
Rodrigo Arias Mallo	188ba6df0a	Remove bscpkgs input Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-10-07 16:07:26 +02:00
Aleix Boné	e42058f08b	Allow access to hut from fox Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2025-10-02 17:03:21 +02:00
Rodrigo Arias Mallo	f3bfe89f27	Fetch website from its own git repository Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-10-02 15:45:21 +02:00
Rodrigo Arias Mallo	b040bebd1d	Add acinca user Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-10-01 12:27:43 +02:00
Rodrigo Arias Mallo	f69629d2da	Restart slurmd on failure A failure to reach the control node can cause slurmd to fail and the unit remains in the failed state until it is manually restarted. Instead, try to restart the service every 30 seconds, forever: owl1% systemctl show slurmd | grep -E 'Restart=|RestartUSec=' Restart=on-failure RestartUSec=30s owl1% pgrep slurmd 5903 owl1% sudo kill -SEGV 5903 owl1% pgrep slurmd 6137 Fixes: rarias/jungle#177 Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-09-30 17:20:39 +02:00
Aleix Boné	0668f0db74	Lower connect timeout when using hut substituter Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2025-09-29 18:44:48 +02:00
Aleix Boné	5fcd57a061	Use hut substituter in all nodes Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2025-09-29 18:44:38 +02:00
Aleix Boné	ad1544759f	Remove machine access for user csiringo Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2025-09-29 18:23:24 +02:00
Rodrigo Arias Mallo	e1c950a530	Mount apex /home via NFS in raccoon Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-09-26 12:28:53 +02:00
Rodrigo Arias Mallo	f9632c37f8	Remove extra SSH jump configuration We now have direct visibility among nodes so we don't need any extra SSH configuration to reach them. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-09-26 12:28:51 +02:00
Rodrigo Arias Mallo	1f0cb4ae76	Add raccoon peer to wireguard It routes traffic from fox, apex and the compute nodes so that we can reach the git servers and tent. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-09-26 12:28:48 +02:00
Rodrigo Arias Mallo	e98fdb89ab	Restrict fox peer to a single IP Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-09-26 12:28:43 +02:00

368 Commits