113 Commits

Author SHA1 Message Date
e57b1b5fd8 Keep compute nodes off when power comes back
When the power comes back, we don't know if the AC unit will be
operating properly or if the room will be at a safe temperature. So,
instead of powering all the machines back, only configure the login to
power on, so we can check the state of the room and power the rest of
the machines.
2025-07-31 17:39:21 +02:00
d948f8b752 Move StartLimit* options to unit section
The StartLimitBurst and StartLimitIntervalSec options belong to the
[Unit] section, otherwise they are ignored in [Service]:

> Unknown key 'StartLimitIntervalSec' in section [Service], ignoring.

When using [Unit], the limits are properly set:

  apex% systemctl show power-policy.service | grep StartLimit
  StartLimitIntervalUSec=10min
  StartLimitBurst=10
  StartLimitAction=none

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 14:32:46 +02:00
8f7787e217 Set power policy to always turn on
In all machines, as soon as we recover the power, turn the machine back
on. We cannot rely on the previous state as we will shut them down
before the power is cut to prevent damage on the power supply
monitoring circuit.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:38 +02:00
30b9b23112 Add NixOS module to control power policy
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:36 +02:00
9a056737de Move August shutdown to 3rd at 22h
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:33 +02:00
ac700d34a5 Disable automatic August shutdown for Fox
The UPC has different dates for the yearly power cut, and Fox can
recover properly from a power loss, so we don't need to have it turned
off before the power cut. Simply disabling the timer is enough.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-24 11:22:10 +02:00
9b681ab7ce Add cudainfo program to test CUDA
The cudainfo program checks that we can initialize the CUDA RT library
and communicate with the driver. It can be used as standalone program or
built with cudainfo.gpuCheck so it is executed inside the build sandbox
to see if it also works fine. It uses the autoAddDriverRunpath hook to
inject in the runpath the location of the library directory for CUDA
libraries.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-23 11:52:09 +02:00
9ce394bffd Add missing symlink in cuda sandbox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-23 11:51:47 +02:00
8cd7b713ca Enable cuda systemFeature in raccoon and fox
This allows running derivations which depend on cuda runtime without
breaking the sandbox. We only need to add `requiredSystemFeatures = [ "cuda" ];`
to the derivation.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 17:07:13 +02:00
8eed90d2bd Move shared nvidia settings to a separate module
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-22 17:06:45 +02:00
aee54ef39f Replace xeon07 by hut in ssh config
The xeon07 machine has been renamed to hut.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-21 18:10:08 +02:00
69f7ab701b Enable automatic Nix GC in raccoon
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:58:26 +02:00
4c9bcebcdc Select proprietary NVIDIA driver in raccoon
The NVIDIA GTX 960 from 2016 has the Maxwell architecture, and NixOS
suggests using the proprietary driver for older than Turing:

> It is suggested to use the open source kernel modules on Turing or
> later GPUs (RTX series, GTX 16xx), and the closed source modules
> otherwise.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-21 17:58:21 +02:00
86e7c72b9b Enable open source NVidia driver in fox
It is recommended for newer versions.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:57:38 +02:00
a7dffc33b5 Remove option allowUnfree from fox and raccoon
It is already set to true for all machines.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:57:21 +02:00
6765dba3e4 Ban another scanner trying to connect via SSH
It is constantly spamming out logs:

apex# journalctl | grep 'Connection closed by 84.88.52.176' | wc -l
2255

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:51:49 +02:00
0acfb7a8e0 Update weasel IPMI hostname for monitoring
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-18 09:51:21 +02:00
dfbb21a5bd Remove merged MPICH patch
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-07-16 13:07:12 +02:00
2bb3b2fc4a Remove package ix as it is gone
Fails with: "error: ix has been removed from Nixpkgs, as the ix.io
pastebin has been offline since Dec. 2023".

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-07-16 13:07:06 +02:00
3270fe50a2 flake.lock: Update
Flake lock file updates:

• Updated input 'agenix':
    'github:ryantm/agenix/f6291c5935fdc4e0bef208cfc0dcab7e3f7a1c41?narHash=sha256-b%2Buqzj%2BWa6xgMS9aNbX4I%2BsXeb5biPDi39VgvSFqFvU%3D' (2024-08-10)
  → 'github:ryantm/agenix/531beac616433bac6f9e2a19feb8e99a22a66baf?narHash=sha256-9P1FziAwl5%2B3edkfFcr5HeGtQUtrSdk/MksX39GieoA%3D' (2025-06-17)
• Updated input 'agenix/darwin':
    'github:lnl7/nix-darwin/4b9b83d5a92e8c1fbfd8eb27eda375908c11ec4d?narHash=sha256-gzGLZSiOhf155FW7262kdHo2YDeugp3VuIFb4/GGng0%3D' (2023-11-24)
  → 'github:lnl7/nix-darwin/43975d782b418ebf4969e9ccba82466728c2851b?narHash=sha256-dyN%2BteG9G82G%2Bm%2BPX/aSAagkC%2BvUv0SgUw3XkPhQodQ%3D' (2025-04-12)
• Updated input 'agenix/home-manager':
    'github:nix-community/home-manager/3bfaacf46133c037bb356193bd2f1765d9dc82c1?narHash=sha256-7ulcXOk63TIT2lVDSExj7XzFx09LpdSAPtvgtM7yQPE%3D' (2023-12-20)
  → 'github:nix-community/home-manager/abfad3d2958c9e6300a883bd443512c55dfeb1be?narHash=sha256-YZCh2o9Ua1n9uCvrvi5pRxtuVNml8X2a03qIFfRKpFs%3D' (2025-04-24)
• Updated input 'bscpkgs':
    'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=6782fc6c5b5a29e84a7f2c2d1064f4bcb1288c0f' (2024-11-29)
  → 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=9d1944c658929b6f98b3f3803fead4d1b91c4405' (2025-06-11)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/9c6b49aeac36e2ed73a8c472f1546f6d9cf1addc?narHash=sha256-i/UJ5I7HoqmFMwZEH6vAvBxOrjjOJNU739lnZnhUln8%3D' (2025-01-14)
  → 'github:NixOS/nixpkgs/dfcd5b901dbab46c9c6e80b265648481aafb01f8?narHash=sha256-Kt1UIPi7kZqkSc5HVj6UY5YLHHEzPBkgpNUByuyxtlw%3D' (2025-07-13)

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-07-16 13:07:01 +02:00
499112cdad Upgrade nixpkgs to nixos 25.05
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-07-16 13:06:40 +02:00
a6698e6a6b Silently ban OpenVAS BSC scanner from apex
It is spamming our logs with refused connection lines:

apex% sudo journalctl -b0 | grep 'refused connection.*SRC=192.168.8.16' | wc -l
13945

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 17:40:41 +02:00
b394c5a8f4 Rotate anavarro password and SSH key
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 17:24:41 +02:00
3d5b845057 Add weasel machine configuration
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 17:24:38 +02:00
9e83565977 Remove extra flush commands on firewall stop
They are not needed as they are already flushed when the firewall
starts or stops.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:45 +02:00
ce2cda1c41 Prevent accidental use of nftables
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:42 +02:00
e6aef2cbd0 Add proxy configuration for internal hosts
Access internal hosts via apex proxy. From the compute nodes we first
open an SSH connection to apex, and then tunnel it through the HTTP
proxy with netcat.

This way we allow reaching internal GitLab repositories without
requiring the user to have credentials in the remote host, while we can
use multiple remotes to provide redundancy.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:36 +02:00
b7603053fa Remove unused blackbox configuration modules
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:30 +02:00
3ca55acfdf Use IPv4 in blackbox probes
Otherwise they simply fail as IPv6 doesn't work.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:26 +02:00
e505a952af Make NFS mount async to improve latency
Don't wait to flush writes, as we don't care about consistency on a
crash:

> This option allows the NFS server to violate the NFS protocol and
> reply to requests before any changes made by that request have been
> committed to stable storage (e.g. disc drive).
>
> Using this option usually improves performance, but at the cost that
> an unclean server restart (i.e. a crash) can cause data to be lost or
> corrupted.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:20 +02:00
3ad9452637 Disable root_squash from NFS
Allows root to read files in the NFS export, so we can directly run
`nixos-rebuild switch` from /home.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:16 +02:00
fdd21d0dd0 Remove SSH proxy to access BSC clusters
We now have direct connection to them.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:13 +02:00
c40871bbfe Add users to apex machine
They need to be able to login to apex to access any other machine from
the SSF rack.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:09 +02:00
e8f5ce735e Remove proxy from hut HTTP probes
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:04 +02:00
4a25056897 Remove proxy configuration from environment
All machines have now direct connection with the outside world.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:18:00 +02:00
89e0c0df28 Add storcli utility to apex
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:17:57 +02:00
1b731a756a Add new configuration for apex
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-15 11:17:43 +02:00
3d97fada6d Add pmartin1 user with access to fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-03 11:16:43 +02:00
d1a2bfc90e Add access to fox for rpenacob user
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 16:58:53 +02:00
44e76ce630 Revert "Only allow Vincent to access fox for now"
This reverts commit efac36b186.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 16:58:49 +02:00
adb7e0ef35 Add all terminfo files in environment
Fixes problems with the kitty terminal when opening vim or kakoune.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-07-02 16:02:45 +02:00
b0875816f2 Monitor Fox BMC with ICMP probes too
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:51:22 +02:00
592da155a9 Restrict DAC VPN to fox-ipmi machine only
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:51:19 +02:00
5376613ec4 Monitor fox via VPN
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:51:16 +02:00
74891f0784 Add OpenVPN service to connect to fox BMC
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:51:13 +02:00
d66f9f21dd Add ac.upc.edu as name search server
Allows referring to fox.ac.upc.edu directly as fox.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:51:09 +02:00
cb05482b4f Update access instructions
We no longer need to request a petition through BSC, as we will be in
charge of the login. Remove link to the old repository as well and
prefer only email.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:24:51 +02:00
e660268661 Disable kptr_restrict in fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:08:42 +02:00
d45b7ea717 Disable NUMA balancing in fox
See: https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html#numa-balancing

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:08:02 +02:00
c205fa4e34 Load amd_uncore module in fox
Needed for L3 events in perf.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:07:58 +02:00
5f055388a5 Enable SSH X11 forwarding
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-07-02 15:07:54 +02:00
0bc69789d9 Disable registration in Gitea
Get rid of all the spam accounts they are trying to register.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:18 +02:00
09bc9d9c25 Enable msmtp configuration in tent
Allows gitea to send notifications via email.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:15 +02:00
6b53ab4413 Add GitLab runner with debian docker for PM
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:13 +02:00
4618a149b3 Monitor nix-daemon in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:11 +02:00
448d85ef9d Move nix-daemon exporter to modules
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:09 +02:00
956b99f02a Add p service for pastes
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:07 +02:00
ec2eb8c3ed Enable public-inbox service in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:06 +02:00
09a5bdfbe4 Enable gitea in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:04 +02:00
c49dd15303 Add bsc.es to resolve domain names
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:02 +02:00
38fd0eefa3 Monitor AXLE machine too
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:36:00 +02:00
e386a320ff Use IPv4 for blackbox exporter
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:59 +02:00
5ea8d6a6dd Add public html files to tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:57 +02:00
7b108431dc Add docker GitLab runner for BSC GitLab
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:55 +02:00
e80b4d7c31 Add GitLab shell runner in tent for PM
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:54 +02:00
e4c22e91b2 Enable jungle robot emails for Grafana in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:52 +02:00
27d4f4f272 Add tent key for nix-serve
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:50 +02:00
978087e53a Remove jungle nix cache from tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:48 +02:00
ad9a5bc906 Enable nix cache
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:47 +02:00
7aeb78426e Serve Grafana from subpath
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:45 +02:00
a0d1b31bb6 Add nginx server in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:43 +02:00
a7775f9a8d Add monitoring in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-18 15:35:00 +02:00
7bb11611a8 Disable nix garbage collector in tent
Reviewed-by: Aleix Boné <abonerib@bsc.es>
Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-06-11 16:05:05 +02:00
cf9bcc27e0 Rekey secrets with tent keys
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:20 +02:00
81073540b0 Add tent host key and admin keys
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:16 +02:00
a43f856b53 Create directories in /vault/home for tent users
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:12 +02:00
be231b6d2d Add software RAID in tent using 3 disks
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:10 +02:00
2f2381ad0f Add access to tent to all hut users too
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:06 +02:00
19e90a1ef7 Add hut SSH configuration from outside SSF LAN
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:04 +02:00
090100f180 Don't use proxy in base preset
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:04:00 +02:00
3d48d224c9 Add tent machine from xeon04
We moved the tent machine to the server room in the BSC building and is
now directly connected to the raccoon via NAT.

Fixes: #106
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:03:54 +02:00
0317f42613 Create specific SSF rack configuration
Allow xeon machines to optionally inherit SSF configuration such as the
NFS mount point and the network configuration.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 16:03:49 +02:00
efac36b186 Only allow Vincent to access fox for now
Needed to run benchmarks without interference.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 12:08:57 +02:00
d2385ac639 Use performance governor in fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 12:08:55 +02:00
d28ed0ab69 Add hut as nix cache in fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 12:08:51 +02:00
1ef6f9a2bb Use extra- for substituters and trusted-public-keys
From the nix manual:

> A configuration setting usually overrides any previous value. However,
> for settings that take a list of items, you can prefix the name of the
> setting by extra- to append to the previous value.

Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>
2025-06-11 11:27:37 +02:00
86b7032bbb Use DHCP for Ethernet in fox
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 10:24:53 +02:00
8c5f4defd7 Use UPC time servers as others are blocked
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-11 10:24:47 +02:00
b802a59868 Create tracing group and add arocanon in raccoon
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 11:09:41 +02:00
7247f7e665 Extend perf support in raccoon
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 11:09:30 +02:00
1d555871a5 Enable nixdebuginfod in raccoon
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:50:01 +02:00
a2535c996d Make raccoon use performance governor
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:35 +02:00
37e60afb54 Enable binfmt emulation in raccoon
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:33 +02:00
3fe138a418 Disable nix garbage collector in raccoon
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:31 +02:00
4e7a9f7ce4 Add dbautist user to raccoon machine
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:28 +02:00
a6a1af673a Add node exporter monitoring in raccoon
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:26 +02:00
2a3a7b2fb2 Allow X11 forwarding via SSH
Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:23 +02:00
b4ab1c836a Enable linger for user rarias
Allows services to run without a login session.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:45:19 +02:00
fb8b4defa7 Only proxy SSH git remotes via hut in xeon
Other machines like raccoon have direct access.

Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>
2025-06-03 10:44:31 +02:00
1bcfbf8cd6 Add machine map file
Documents the location, board and serial numbers so we can track the
machines if they move around. Some information is unkown.

Using the Nix language to encode the machines location and properties
allows us to later use that information in the configuration of the
machines themselves.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 14:55:58 +02:00
9f43a0e13b Remove fox monitoring via IPMI
We will need to setup an VPN to be able to access fox in its new
location, so for now we simply remove the IPMI monitoring.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:53 +02:00
3a3c3050ef Monitor fox, gateway and UPC anella via ICMP
Fox should reply once the machine is connected to the UPC network.
Monitoring also the gateway and UPC anella allows us to estimate if the
whole network is down or just fox.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:51 +02:00
4419f68948 Update configuration for UPC network
The fox machine will be placed in the UPC network, so we update the
configuration with the new IP and gateway. We won't be able to reach hut
directly so we also remove the host entry and proxy.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:48 +02:00
e51fc9ffa5 Disable home via NFS in fox
It won't be accesible anymore as we won't be in the same LAN.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:46 +02:00
2ae9e9b635 Rekey all secrets
Fox is no longer able to use munge or ceph, so we remove the key and
rekey them.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:44 +02:00
be77f6a5f5 Rotate fox SSH host key
Prevent decrypting old secrets by reading the git history.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:42 +02:00
6316a12a67 Distrust fox SSH key
We no longer will share secrets with fox until we can regain our trust.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:38 +02:00
db663913d8 Remove Ceph module from fox
It will no longer be accesible from the UPC.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:36 +02:00
b4846b0f6c Remove fox from SLURM
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:20 +02:00
64a52801ed Remove pam_slurm_adopt from fox
We no longer will be able to use SLURM from jungle.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-06-02 11:26:02 +02:00
7a2f37aaa2 Add UPC temperature sensor monitoring
These sensors are part of their air quality measurements, which just
happen to be very close to our server room.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-05-29 13:01:37 +02:00
aae6585f66 Add meteocat exporter
Allows us to track ambient temperature changes and estimate the
temperature delta between the server room and exterior temperature.
We should be able to predict when we would need to stop the machines due
to excesive temperature as summer approaches.

Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-05-29 13:01:29 +02:00
1c15e77c83 Add custom nix-daemon exporter
Allows us to see which derivations are being built in realtime. It is a
bit of a hack, but it seems to work. We simply look at the environment
of the child processes of nix-daemon (usually bash) and then look for
the $name variable which should hold the current derivation being
built. Needs root to be able to read the environ file of the different
nix-daemon processes as they are owned by the nixbld* users.

See: https://discourse.nixos.org/t/query-ongoing-builds/23486
Reviewed-by: Aleix Boné <abonerib@bsc.es>
2025-05-29 12:57:07 +02:00
83 changed files with 2328 additions and 450 deletions

34
flake.lock generated
View File

@@ -10,11 +10,11 @@
"systems": "systems"
},
"locked": {
"lastModified": 1723293904,
"narHash": "sha256-b+uqzj+Wa6xgMS9aNbX4I+sXeb5biPDi39VgvSFqFvU=",
"lastModified": 1750173260,
"narHash": "sha256-9P1FziAwl5+3edkfFcr5HeGtQUtrSdk/MksX39GieoA=",
"owner": "ryantm",
"repo": "agenix",
"rev": "f6291c5935fdc4e0bef208cfc0dcab7e3f7a1c41",
"rev": "531beac616433bac6f9e2a19feb8e99a22a66baf",
"type": "github"
},
"original": {
@@ -30,11 +30,11 @@
]
},
"locked": {
"lastModified": 1732868163,
"narHash": "sha256-qck4h298AgcNI6BnGhEwl26MTLXjumuJVr+9kak7uPo=",
"lastModified": 1749650500,
"narHash": "sha256-2MHfVPV6RA7qPSCtXh4+KK0F0UjN+J4z8//+n6NK7Xs=",
"ref": "refs/heads/master",
"rev": "6782fc6c5b5a29e84a7f2c2d1064f4bcb1288c0f",
"revCount": 952,
"rev": "9d1944c658929b6f98b3f3803fead4d1b91c4405",
"revCount": 961,
"type": "git",
"url": "https://git.sr.ht/~rodarima/bscpkgs"
},
@@ -51,11 +51,11 @@
]
},
"locked": {
"lastModified": 1700795494,
"narHash": "sha256-gzGLZSiOhf155FW7262kdHo2YDeugp3VuIFb4/GGng0=",
"lastModified": 1744478979,
"narHash": "sha256-dyN+teG9G82G+m+PX/aSAagkC+vUv0SgUw3XkPhQodQ=",
"owner": "lnl7",
"repo": "nix-darwin",
"rev": "4b9b83d5a92e8c1fbfd8eb27eda375908c11ec4d",
"rev": "43975d782b418ebf4969e9ccba82466728c2851b",
"type": "github"
},
"original": {
@@ -73,11 +73,11 @@
]
},
"locked": {
"lastModified": 1703113217,
"narHash": "sha256-7ulcXOk63TIT2lVDSExj7XzFx09LpdSAPtvgtM7yQPE=",
"lastModified": 1745494811,
"narHash": "sha256-YZCh2o9Ua1n9uCvrvi5pRxtuVNml8X2a03qIFfRKpFs=",
"owner": "nix-community",
"repo": "home-manager",
"rev": "3bfaacf46133c037bb356193bd2f1765d9dc82c1",
"rev": "abfad3d2958c9e6300a883bd443512c55dfeb1be",
"type": "github"
},
"original": {
@@ -88,16 +88,16 @@
},
"nixpkgs": {
"locked": {
"lastModified": 1736867362,
"narHash": "sha256-i/UJ5I7HoqmFMwZEH6vAvBxOrjjOJNU739lnZnhUln8=",
"lastModified": 1752436162,
"narHash": "sha256-Kt1UIPi7kZqkSc5HVj6UY5YLHHEzPBkgpNUByuyxtlw=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "9c6b49aeac36e2ed73a8c472f1546f6d9cf1addc",
"rev": "dfcd5b901dbab46c9c6e80b265648481aafb01f8",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixos-24.11",
"ref": "nixos-25.05",
"repo": "nixpkgs",
"type": "github"
}

View File

@@ -1,6 +1,6 @@
{
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.05";
agenix.url = "github:ryantm/agenix";
agenix.inputs.nixpkgs.follows = "nixpkgs";
bscpkgs.url = "git+https://git.sr.ht/~rodarima/bscpkgs";
@@ -18,6 +18,7 @@ in
{
nixosConfigurations = {
hut = mkConf "hut";
tent = mkConf "tent";
owl1 = mkConf "owl1";
owl2 = mkConf "owl2";
eudy = mkConf "eudy";
@@ -26,6 +27,8 @@ in
lake2 = mkConf "lake2";
raccoon = mkConf "raccoon";
fox = mkConf "fox";
apex = mkConf "apex";
weasel = mkConf "weasel";
};
packages.x86_64-linux = self.nixosConfigurations.hut.pkgs // {

View File

@@ -2,29 +2,35 @@
# here all the public keys
rec {
hosts = {
hut = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICO7jIp6JRnRWTMDsTB/aiaICJCl4x8qmKMPSs4lCqP1 hut";
owl1 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMqMEXO0ApVsBA6yjmb0xP2kWyoPDIWxBB0Q3+QbHVhv owl1";
owl2 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHurEYpQzNHqWYF6B9Pd7W8UPgF3BxEg0BvSbsA7BAdK owl2";
eudy = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIL+WYPRRvZupqLAG0USKmd/juEPmisyyJaP8hAgYwXsG eudy";
koro = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIImiTFDbxyUYPumvm8C4mEnHfuvtBY1H8undtd6oDd67 koro";
bay = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICvGBzpRQKuQYHdlUQeAk6jmdbkrhmdLwTBqf3el7IgU bay";
lake2 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINo66//S1yatpQHE/BuYD/Gfq64TY7ZN5XOGXmNchiO0 lake2";
fox = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDa9lId4rB/EKGkkCCVOy0cuId2SYLs+8W8kx0kmpO1y fox";
hut = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICO7jIp6JRnRWTMDsTB/aiaICJCl4x8qmKMPSs4lCqP1 hut";
owl1 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMqMEXO0ApVsBA6yjmb0xP2kWyoPDIWxBB0Q3+QbHVhv owl1";
owl2 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHurEYpQzNHqWYF6B9Pd7W8UPgF3BxEg0BvSbsA7BAdK owl2";
eudy = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIL+WYPRRvZupqLAG0USKmd/juEPmisyyJaP8hAgYwXsG eudy";
koro = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIImiTFDbxyUYPumvm8C4mEnHfuvtBY1H8undtd6oDd67 koro";
bay = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICvGBzpRQKuQYHdlUQeAk6jmdbkrhmdLwTBqf3el7IgU bay";
lake2 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINo66//S1yatpQHE/BuYD/Gfq64TY7ZN5XOGXmNchiO0 lake2";
fox = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDwItIk5uOJcQEVPoy/CVGRzfmE1ojrdDcI06FrU4NFT fox";
tent = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFAtTpHtdYoelbknD/IcfBlThwLKJv/dSmylOgpg3FRM tent";
apex = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBvUFjSfoxXnKwXhEFXx5ckRKJ0oewJ82mRitSMNMKjh apex";
weasel = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFLJrQ8BF6KcweQV8pLkSbFT+tbDxSG9qxrdQE65zJZp weasel";
};
hostGroup = with hosts; rec {
compute = [ owl1 owl2 fox ];
playground = [ eudy koro ];
untrusted = [ fox ];
compute = [ owl1 owl2 ];
playground = [ eudy koro weasel ];
storage = [ bay lake2 ];
monitor = [ hut ];
login = [ apex ];
system = storage ++ monitor;
system = storage ++ monitor ++ login;
safe = system ++ compute;
all = safe ++ playground;
};
admins = {
rarias = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIE1oZTPtlEXdGt0Ak+upeCIiBdaDQtcmuWoTUCVuSVIR rarias@hut";
root = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIII/1TNArcwA6D47mgW4TArwlxQRpwmIGiZDysah40Gb root@hut";
"rarias@hut" = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIE1oZTPtlEXdGt0Ak+upeCIiBdaDQtcmuWoTUCVuSVIR rarias@hut";
"rarias@tent" = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIwlWSBTZi74WTz5xn6gBvTmCoVltmtIAeM3RMmkh4QZ rarias@tent";
root = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIII/1TNArcwA6D47mgW4TArwlxQRpwmIGiZDysah40Gb root@hut";
};
}

86
m/apex/configuration.nix Normal file
View File

@@ -0,0 +1,86 @@
{ lib, config, pkgs, ... }:
{
imports = [
../common/xeon.nix
../common/ssf/hosts.nix
../module/ceph.nix
../module/power-policy.nix
./nfs.nix
];
power.policy = "always-on";
# Don't install grub MBR for now
boot.loader.grub.device = "nodev";
boot.initrd.kernelModules = [
"megaraid_sas" # For HW RAID
];
environment.systemPackages = with pkgs; [
storcli # To manage HW RAID
];
fileSystems."/home" = {
device = "/dev/disk/by-label/home";
fsType = "ext4";
};
# No swap, there is plenty of RAM
swapDevices = lib.mkForce [];
networking = {
hostName = "apex";
defaultGateway = "84.88.53.233";
nameservers = [ "8.8.8.8" ];
# Public facing interface
interfaces.eno1.ipv4.addresses = [ {
address = "84.88.53.236";
prefixLength = 29;
} ];
# Internal LAN to our Ethernet switch
interfaces.eno2.ipv4.addresses = [ {
address = "10.0.40.30";
prefixLength = 24;
} ];
# Infiniband over Omnipath switch (disconnected for now)
# interfaces.ibp5s0 = {};
nat = {
enable = true;
internalInterfaces = [ "eno2" ];
externalInterface = "eno1";
};
};
# Use SSH tunnel to reach internal hosts
programs.ssh.extraConfig = ''
Host bscpm04.bsc.es gitlab-internal.bsc.es knights3.bsc.es
ProxyCommand nc -X connect -x localhost:23080 %h %p
Host raccoon
HostName knights3.bsc.es
ProxyCommand nc -X connect -x localhost:23080 %h %p
Host tent
ProxyJump raccoon
'';
networking.firewall = {
extraCommands = ''
# Blackhole BSC vulnerability scanner (OpenVAS) as it is spamming our
# logs. Insert as first position so we also protect SSH.
iptables -I nixos-fw 1 -p tcp -s 192.168.8.16 -j nixos-fw-refuse
# Same with opsmonweb01.bsc.es which seems to be trying to access via SSH
iptables -I nixos-fw 2 -p tcp -s 84.88.52.176 -j nixos-fw-refuse
'';
};
# Use tent for cache
nix.settings = {
extra-substituters = [ "https://jungle.bsc.es/cache" ];
extra-trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
};
}

32
m/apex/nfs.nix Normal file
View File

@@ -0,0 +1,32 @@
{ ... }:
{
services.nfs.server = {
enable = true;
lockdPort = 4001;
mountdPort = 4002;
statdPort = 4000;
exports = ''
/home 10.0.40.0/24(rw,async,no_subtree_check,no_root_squash)
'';
};
networking.firewall = {
# Check with `rpcinfo -p`
extraCommands = ''
# Accept NFS traffic from compute nodes but not from the outside
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 111 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 2049 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 4000 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 4001 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 4002 -j nixos-fw-accept
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 20048 -j nixos-fw-accept
# Same but UDP
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 111 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 2049 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 4000 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 4001 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 4002 -j nixos-fw-accept
iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 20048 -j nixos-fw-accept
'';
};
}

View File

@@ -2,7 +2,7 @@
{
imports = [
../common/xeon.nix
../common/ssf.nix
../module/monitoring.nix
];

View File

@@ -3,6 +3,7 @@
# Includes the basic configuration for an Intel server.
imports = [
./base/agenix.nix
./base/always-power-on.nix
./base/august-shutdown.nix
./base/boot.nix
./base/env.nix

View File

@@ -1,12 +1,12 @@
{
# Shutdown all machines on August 2nd at 11:00 AM, so we can protect the
# Shutdown all machines on August 3rd at 22:00, so we can protect the
# hardware from spurious electrical peaks on the yearly electrical cut for
# manteinance that starts on August 4th.
systemd.timers.august-shutdown = {
description = "Shutdown on August 2nd for maintenance";
description = "Shutdown on August 3rd for maintenance";
wantedBy = [ "timers.target" ];
timerConfig = {
OnCalendar = "*-08-02 11:00:00";
OnCalendar = "*-08-03 22:00:00";
RandomizedDelaySec = "10min";
Unit = "systemd-poweroff.service";
};

View File

@@ -3,8 +3,8 @@
{
environment.systemPackages = with pkgs; [
vim wget git htop tmux pciutils tcpdump ripgrep nix-index nixos-option
nix-diff ipmitool freeipmi ethtool lm_sensors ix cmake gnumake file tree
ncdu config.boot.kernelPackages.perf ldns
nix-diff ipmitool freeipmi ethtool lm_sensors cmake gnumake file tree
ncdu config.boot.kernelPackages.perf ldns pv
# From bsckgs overlay
osumb
];
@@ -21,6 +21,8 @@
}
];
environment.enableAllTerminfo = true;
environment.variables = {
EDITOR = "vim";
VISUAL = "vim";

View File

@@ -1,4 +1,4 @@
{ pkgs, ... }:
{ pkgs, lib, ... }:
{
networking = {
@@ -10,8 +10,11 @@
allowedTCPPorts = [ 22 ];
};
# Make sure we use iptables
nftables.enable = lib.mkForce false;
hosts = {
"84.88.53.236" = [ "ssfhead.bsc.es" "ssfhead" ];
"84.88.53.236" = [ "apex" "ssfhead.bsc.es" "ssfhead" ];
"84.88.51.152" = [ "raccoon" ];
"84.88.51.142" = [ "raccoon-ipmi" ];
};

View File

@@ -6,6 +6,8 @@
(import ../../../pkgs/overlay.nix)
];
nixpkgs.config.allowUnfree = true;
nix = {
nixPath = [
"nixpkgs=${nixpkgs}"

View File

@@ -0,0 +1,9 @@
{
imports = [
../../module/power-policy.nix
];
# By default, keep the machines off as we don't know if the AC will be working
# once the electricity comes back.
power.policy = "always-off";
}

View File

@@ -8,17 +8,6 @@ in
# Enable the OpenSSH daemon.
services.openssh.enable = true;
# Connect to intranet git hosts via proxy
programs.ssh.extraConfig = ''
Host bscpm02.bsc.es bscpm03.bsc.es bscpm04.bsc.es gitlab-internal.bsc.es alya.gitlab.bsc.es
User git
ProxyCommand nc -X connect -x hut:23080 %h %p
# Connect to BSC machines via hut proxy too
Host amdlogin1.bsc.es armlogin1.bsc.es hualogin1.bsc.es glogin1.bsc.es glogin2.bsc.es fpgalogin1.bsc.es
ProxyCommand nc -X connect -x hut:23080 %h %p
'';
programs.ssh.knownHosts = hostsKeys // {
"gitlab-internal.bsc.es".publicKey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF9arsAOSRB06hdy71oTvJHG2Mg8zfebADxpvc37lZo3";
"bscpm03.bsc.es".publicKey = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIM2NuSUPsEhqz1j5b4Gqd+MWFnRqyqY57+xMvBUqHYUS";

View File

@@ -20,6 +20,7 @@
rarias = {
uid = 1880;
isNormalUser = true;
linger = true;
home = "/home/Computational/rarias";
description = "Rodrigo Arias";
group = "Computational";
@@ -39,7 +40,7 @@
home = "/home/Computational/arocanon";
description = "Aleix Roca";
group = "Computational";
extraGroups = [ "wheel" ];
extraGroups = [ "wheel" "tracing" ];
hashedPassword = "$6$hliZiW4tULC/tH7p$pqZarwJkNZ7vS0G5llWQKx08UFG9DxDYgad7jplMD8WkZh5k58i4dfPoWtnEShfjTO6JHiIin05ny5lmSXzGM/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF3zeB5KSimMBAjvzsp1GCkepVaquVZGPYwRIzyzaCba aleix@bsc"
@@ -55,7 +56,7 @@
home = "/home/Computational/rpenacob";
description = "Raúl Peñacoba";
group = "Computational";
hosts = [ "owl1" "owl2" "hut" ];
hosts = [ "apex" "owl1" "owl2" "hut" "tent" "fox" ];
hashedPassword = "$6$TZm3bDIFyPrMhj1E$uEDXoYYd1z2Wd5mMPfh3DZAjP7ztVjJ4ezIcn82C0ImqafPA.AnTmcVftHEzLB3tbe2O4SxDyPSDEQgJ4GOtj/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFYfXg37mauGeurqsLpedgA2XQ9d4Nm0ZGo/hI1f7wwH rpenacob@bsc"
@@ -68,10 +69,10 @@
home = "/home/Computational/anavarro";
description = "Antoni Navarro";
group = "Computational";
hosts = [ "hut" "raccoon" "fox" ];
hashedPassword = "$6$QdNDsuLehoZTYZlb$CDhCouYDPrhoiB7/seu7RF.Gqg4zMQz0n5sA4U1KDgHaZOxy2as9pbIGeF8tOHJKRoZajk5GiaZv0rZMn7Oq31";
hosts = [ "apex" "hut" "tent" "raccoon" "fox" "weasel" ];
hashedPassword = "$6$EgturvVYXlKgP43g$gTN78LLHIhaF8hsrCXD.O6mKnZSASWSJmCyndTX8QBWT6wTlUhcWVAKz65lFJPXjlJA4u7G1ydYQ0GG6Wk07b1";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILWjRSlKgzBPZQhIeEtk6Lvws2XNcYwHcwPv4osSgst5 anavarro@ssfhead"
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMsbM21uepnJwPrRe6jYFz8zrZ6AYMtSEvvt4c9spmFP toni@delltoni"
];
};
@@ -81,7 +82,7 @@
home = "/home/Computational/abonerib";
description = "Aleix Boné";
group = "Computational";
hosts = [ "owl1" "owl2" "hut" "raccoon" "fox" ];
hosts = [ "apex" "owl1" "owl2" "hut" "tent" "raccoon" "fox" "weasel" ];
hashedPassword = "$6$V1EQWJr474whv7XJ$OfJ0wueM2l.dgiJiiah0Tip9ITcJ7S7qDvtSycsiQ43QBFyP4lU0e0HaXWps85nqB4TypttYR4hNLoz3bz662/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIIFiqXqt88VuUfyANkZyLJNiuroIITaGlOOTMhVDKjf abonerib@bsc"
@@ -94,7 +95,7 @@
home = "/home/Computational/vlopez";
description = "Victor López";
group = "Computational";
hosts = [ "koro" ];
hosts = [ "apex" "koro" ];
hashedPassword = "$6$0ZBkgIYE/renVqtt$1uWlJsb0FEezRVNoETTzZMx4X2SvWiOsKvi0ppWCRqI66S6TqMBXBdP4fcQyvRRBt0e4Z7opZIvvITBsEtO0f0";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGMwlUZRf9jfG666Qa5Sb+KtEhXqkiMlBV2su3x/dXHq victor@arch"
@@ -107,7 +108,7 @@
home = "/home/Computational/dbautist";
description = "Dylan Bautista Cases";
group = "Computational";
hosts = [ "hut" ];
hosts = [ "apex" "hut" "tent" "raccoon" ];
hashedPassword = "$6$a2lpzMRVkG9nSgIm$12G6.ka0sFX1YimqJkBAjbvhRKZ.Hl090B27pdbnQOW0wzyxVWySWhyDDCILjQELky.HKYl9gqOeVXW49nW7q/";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAb+EQBoS98zrCwnGKkHKwMLdYABMTqv7q9E0+T0QmkS dbautist@bsc-848818791"
@@ -120,7 +121,7 @@
home = "/home/Computational/dalvare1";
description = "David Álvarez";
group = "Computational";
hosts = [ "hut" "fox" ];
hosts = [ "apex" "hut" "tent" "fox" ];
hashedPassword = "$6$mpyIsV3mdq.rK8$FvfZdRH5OcEkUt5PnIUijWyUYZvB1SgeqxpJ2p91TTe.3eQIDTcLEQ5rxeg.e5IEXAZHHQ/aMsR5kPEujEghx0";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGEfy6F4rF80r4Cpo2H5xaWqhuUZzUsVsILSKGJzt5jF dalvare1@ssfhead"
@@ -133,16 +134,31 @@
home = "/home/Computational/varcila";
description = "Vincent Arcila";
group = "Computational";
hosts = [ "hut" "fox" ];
hosts = [ "apex" "hut" "tent" "fox" ];
hashedPassword = "$6$oB0Tcn99DcM4Ch$Vn1A0ulLTn/8B2oFPi9wWl/NOsJzaFAWjqekwcuC9sMC7cgxEVb.Nk5XSzQ2xzYcNe5MLtmzkVYnRS1CqP39Y0";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKGt0ESYxekBiHJQowmKpfdouw0hVm3N7tUMtAaeLejK vincent@varch"
];
};
pmartin1 = {
# Arbitrary UID but large so it doesn't collide with other users on ssfhead.
uid = 9652;
isNormalUser = true;
home = "/home/Computational/pmartin1";
description = "Pedro J. Martinez-Ferrer";
group = "Computational";
hosts = [ "fox" ];
hashedPassword = "$6$nIgDMGnt4YIZl3G.$.JQ2jXLtDPRKsbsJfJAXdSvjDIzRrg7tNNjPkLPq3KJQhMjfDXRUvzagUHUU2TrE2hHM8/6uq8ex0UdxQ0ysl.";
openssh.authorizedKeys.keys = [
"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIV5LEAII5rfe1hYqDYIIrhb1gOw7RcS1p2mhOTqG+zc pedro@pedro-ThinkPad-P14s-Gen-2a"
];
};
};
groups = {
Computational = { gid = 564; };
tracing = { };
};
};
}

10
m/common/ssf.nix Normal file
View File

@@ -0,0 +1,10 @@
{
# Provides the base system for a xeon node in the SSF rack.
imports = [
./xeon.nix
./ssf/fs.nix
./ssf/hosts.nix
./ssf/net.nix
./ssf/ssh.nix
];
}

23
m/common/ssf/hosts.nix Normal file
View File

@@ -0,0 +1,23 @@
{ pkgs, ... }:
{
networking.hosts = {
# Login
"10.0.40.30" = [ "apex" ];
# Storage
"10.0.40.40" = [ "bay" ]; "10.0.42.40" = [ "bay-ib" ]; "10.0.40.141" = [ "bay-ipmi" ];
"10.0.40.41" = [ "oss01" ]; "10.0.42.41" = [ "oss01-ib0" ]; "10.0.40.142" = [ "oss01-ipmi" ];
"10.0.40.42" = [ "lake2" ]; "10.0.42.42" = [ "lake2-ib" ]; "10.0.40.143" = [ "lake2-ipmi" ];
# Xeon compute
"10.0.40.1" = [ "owl1" ]; "10.0.42.1" = [ "owl1-ib" ]; "10.0.40.101" = [ "owl1-ipmi" ];
"10.0.40.2" = [ "owl2" ]; "10.0.42.2" = [ "owl2-ib" ]; "10.0.40.102" = [ "owl2-ipmi" ];
"10.0.40.3" = [ "xeon03" ]; "10.0.42.3" = [ "xeon03-ib" ]; "10.0.40.103" = [ "xeon03-ipmi" ];
#"10.0.40.4" = [ "tent" ]; "10.0.42.4" = [ "tent-ib" ]; "10.0.40.104" = [ "tent-ipmi" ];
"10.0.40.5" = [ "koro" ]; "10.0.42.5" = [ "koro-ib" ]; "10.0.40.105" = [ "koro-ipmi" ];
"10.0.40.6" = [ "weasel" ]; "10.0.42.6" = [ "weasel-ib" ]; "10.0.40.106" = [ "weasel-ipmi" ];
"10.0.40.7" = [ "hut" ]; "10.0.42.7" = [ "hut-ib" ]; "10.0.40.107" = [ "hut-ipmi" ];
"10.0.40.8" = [ "eudy" ]; "10.0.42.8" = [ "eudy-ib" ]; "10.0.40.108" = [ "eudy-ipmi" ];
};
}

23
m/common/ssf/net.nix Normal file
View File

@@ -0,0 +1,23 @@
{ pkgs, ... }:
{
# Infiniband (IPoIB)
environment.systemPackages = [ pkgs.rdma-core ];
boot.kernelModules = [ "ib_umad" "ib_ipoib" ];
networking = {
defaultGateway = "10.0.40.30";
nameservers = ["8.8.8.8"];
firewall = {
extraCommands = ''
# Prevent ssfhead from contacting our slurmd daemon
iptables -A nixos-fw -p tcp -s ssfhead --dport 6817:6819 -j nixos-fw-refuse
# But accept traffic to slurm ports from any other node in the subnet
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 6817:6819 -j nixos-fw-accept
# We also need to open the srun port range
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 60000:61000 -j nixos-fw-accept
'';
};
};
}

16
m/common/ssf/ssh.nix Normal file
View File

@@ -0,0 +1,16 @@
{
# Use SSH tunnel to apex to reach internal hosts
programs.ssh.extraConfig = ''
Host tent
ProxyJump raccoon
# Access raccoon via the HTTP proxy
Host raccoon knights3.bsc.es
HostName knights3.bsc.es
ProxyCommand=ssh apex 'nc -X connect -x localhost:23080 %h %p'
# Make sure we can reach gitlab even if we don't have SSH access to raccoon
Host bscpm04.bsc.es gitlab-internal.bsc.es
ProxyCommand=ssh apex 'nc -X connect -x localhost:23080 %h %p'
'';
}

View File

@@ -1,9 +1,7 @@
{
# Provides the base system for a xeon node.
# Provides the base system for a xeon node, not necessarily in the SSF rack.
imports = [
./base.nix
./xeon/fs.nix
./xeon/console.nix
./xeon/net.nix
];
}

View File

@@ -1,94 +0,0 @@
{ pkgs, ... }:
{
# Infiniband (IPoIB)
environment.systemPackages = [ pkgs.rdma-core ];
boot.kernelModules = [ "ib_umad" "ib_ipoib" ];
networking = {
defaultGateway = "10.0.40.30";
nameservers = ["8.8.8.8"];
proxy = {
default = "http://hut:23080/";
noProxy = "127.0.0.1,localhost,internal.domain,10.0.40.40,hut";
# Don't set all_proxy as go complains and breaks the gitlab runner, see:
# https://github.com/golang/go/issues/16715
allProxy = null;
};
firewall = {
extraCommands = ''
# Prevent ssfhead from contacting our slurmd daemon
iptables -A nixos-fw -p tcp -s ssfhead --dport 6817:6819 -j nixos-fw-refuse
# But accept traffic to slurm ports from any other node in the subnet
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 6817:6819 -j nixos-fw-accept
# We also need to open the srun port range
iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 60000:61000 -j nixos-fw-accept
'';
};
extraHosts = ''
10.0.40.30 ssfhead
# Node Entry for node: mds01 (ID=72)
10.0.40.40 bay mds01 mds01-eth0
10.0.42.40 bay-ib mds01-ib0
10.0.40.141 bay-ipmi mds01-ipmi0 mds01-ipmi
# Node Entry for node: oss01 (ID=73)
10.0.40.41 oss01 oss01-eth0
10.0.42.41 oss01-ib0
10.0.40.142 oss01-ipmi0 oss01-ipmi
# Node Entry for node: oss02 (ID=74)
10.0.40.42 lake2 oss02 oss02-eth0
10.0.42.42 lake2-ib oss02-ib0
10.0.40.143 lake2-ipmi oss02-ipmi0 oss02-ipmi
# Node Entry for node: xeon01 (ID=15)
10.0.40.1 owl1 xeon01 xeon01-eth0
10.0.42.1 owl1-ib xeon01-ib0
10.0.40.101 owl1-ipmi xeon01-ipmi0 xeon01-ipmi
# Node Entry for node: xeon02 (ID=16)
10.0.40.2 owl2 xeon02 xeon02-eth0
10.0.42.2 owl2-ib xeon02-ib0
10.0.40.102 owl2-ipmi xeon02-ipmi0 xeon02-ipmi
# Node Entry for node: xeon03 (ID=17)
10.0.40.3 xeon03 xeon03-eth0
10.0.42.3 xeon03-ib0
10.0.40.103 xeon03-ipmi0 xeon03-ipmi
# Node Entry for node: xeon04 (ID=18)
10.0.40.4 xeon04 xeon04-eth0
10.0.42.4 xeon04-ib0
10.0.40.104 xeon04-ipmi0 xeon04-ipmi
# Node Entry for node: xeon05 (ID=19)
10.0.40.5 koro xeon05 xeon05-eth0
10.0.42.5 koro-ib xeon05-ib0
10.0.40.105 koro-ipmi xeon05-ipmi0
# Node Entry for node: xeon06 (ID=20)
10.0.40.6 xeon06 xeon06-eth0
10.0.42.6 xeon06-ib0
10.0.40.106 xeon06-ipmi0 xeon06-ipmi
# Node Entry for node: xeon07 (ID=21)
10.0.40.7 hut xeon07 xeon07-eth0
10.0.42.7 hut-ib xeon07-ib0
10.0.40.107 hut-ipmi xeon07-ipmi0 xeon07-ipmi
# Node Entry for node: xeon08 (ID=22)
10.0.40.8 eudy xeon08 xeon08-eth0
10.0.42.8 eudy-ib xeon08-ib0
10.0.40.108 eudy-ipmi xeon08-ipmi0 xeon08-ipmi
# fox
10.0.40.26 fox
10.0.40.126 fox-ipmi
'';
};
}

View File

@@ -2,7 +2,7 @@
{
imports = [
../common/xeon.nix
../common/ssf.nix
#(modulesPath + "/installer/netboot/netboot-minimal.nix")
./kernel/kernel.nix

View File

@@ -2,13 +2,19 @@
{
imports = [
../common/xeon.nix
../module/ceph.nix
../common/base.nix
../common/xeon/console.nix
../module/emulation.nix
../module/slurm-client.nix
../module/slurm-firewall.nix
../module/nvidia.nix
../module/power-policy.nix
];
power.policy = "always-on";
# Don't turn off on August as UPC has different dates.
# Fox works fine on power cuts.
systemd.timers.august-shutdown.enable = false;
# Select the this using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x500a07514b0c1103";
@@ -16,25 +22,47 @@
swapDevices = lib.mkForce [];
boot.initrd.availableKernelModules = [ "xhci_pci" "ahci" "nvme" "usbhid" "usb_storage" "sd_mod" ];
boot.kernelModules = [ "kvm-amd" ];
boot.kernelModules = [ "kvm-amd" "amd_uncore" ];
hardware.cpu.amd.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;
hardware.cpu.intel.updateMicrocode = lib.mkForce false;
# Use performance for benchmarks
powerManagement.cpuFreqGovernor = "performance";
# Disable NUMA balancing
boot.kernel.sysctl."kernel.numa_balancing" = 0;
# Expose kernel addresses
boot.kernel.sysctl."kernel.kptr_restrict" = 0;
services.openssh.settings.X11Forwarding = true;
networking = {
timeServers = [ "ntp1.upc.edu" "ntp2.upc.edu" ];
hostName = "fox";
interfaces.enp1s0f0np0.ipv4.addresses = [ {
address = "10.0.40.26";
prefixLength = 24;
} ];
# UPC network (may change over time, use DHCP)
# Public IP configuration:
# - Hostname: fox.ac.upc.edu
# - IP: 147.83.30.141
# - Gateway: 147.83.30.130
# - NetMask: 255.255.255.192
# Private IP configuration for BMC:
# - Hostname: fox-ipmi.ac.upc.edu
# - IP: 147.83.35.27
# - Gateway: 147.83.35.2
# - NetMask: 255.255.255.0
interfaces.enp1s0f0np0.useDHCP = true;
};
# Configure Nvidia driver to use with CUDA
hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production;
hardware.graphics.enable = true;
nixpkgs.config.allowUnfree = true;
nixpkgs.config.nvidia.acceptLicense = true;
services.xserver.videoDrivers = [ "nvidia" ];
# Use hut for cache
nix.settings = {
extra-substituters = [ "https://jungle.bsc.es/cache" ];
extra-trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
};
# Recommended for new graphics cards
hardware.nvidia.open = true;
# Mount NVME disks
fileSystems."/nvme0" = { device = "/dev/disk/by-label/nvme0"; fsType = "ext4"; };
@@ -56,20 +84,4 @@
wantedBy = [ "multi-user.target" ];
serviceConfig.ExecStart = script;
};
# Only allow SSH connections from users who have a SLURM allocation
# See: https://slurm.schedmd.com/pam_slurm_adopt.html
security.pam.services.sshd.rules.account.slurm = {
control = "required";
enable = true;
modulePath = "${pkgs.slurm}/lib/security/pam_slurm_adopt.so";
args = [ "log_level=debug5" ];
order = 999999; # Make it last one
};
# Disable systemd session (pam_systemd.so) as it will conflict with the
# pam_slurm_adopt.so module. What happens is that the shell is first adopted
# into the slurmstepd task and then into the systemd session, which is not
# what we want, otherwise it will linger even if all jobs are gone.
security.pam.services.sshd.startSession = lib.mkForce false;
}

View File

@@ -3,160 +3,12 @@ modules:
prober: http
timeout: 5s
http:
proxy_url: "http://127.0.0.1:23080"
skip_resolve_phase_with_proxy: true
follow_redirects: true
valid_status_codes: [] # Defaults to 2xx
method: GET
http_with_proxy:
prober: http
http:
proxy_url: "http://127.0.0.1:3128"
skip_resolve_phase_with_proxy: true
http_with_proxy_and_headers:
prober: http
http:
proxy_url: "http://127.0.0.1:3128"
proxy_connect_header:
Proxy-Authorization:
- Bearer token
http_post_2xx:
prober: http
timeout: 5s
http:
method: POST
headers:
Content-Type: application/json
body: '{}'
http_post_body_file:
prober: http
timeout: 5s
http:
method: POST
body_file: "/files/body.txt"
http_basic_auth_example:
prober: http
timeout: 5s
http:
method: POST
headers:
Host: "login.example.com"
basic_auth:
username: "username"
password: "mysecret"
http_2xx_oauth_client_credentials:
prober: http
timeout: 5s
http:
valid_http_versions: ["HTTP/1.1", "HTTP/2"]
follow_redirects: true
preferred_ip_protocol: "ip4"
valid_status_codes:
- 200
- 201
oauth2:
client_id: "client_id"
client_secret: "client_secret"
token_url: "https://api.example.com/token"
endpoint_params:
grant_type: "client_credentials"
http_custom_ca_example:
prober: http
http:
valid_status_codes: [] # Defaults to 2xx
method: GET
tls_config:
ca_file: "/certs/my_cert.crt"
http_gzip:
prober: http
http:
method: GET
compression: gzip
http_gzip_with_accept_encoding:
prober: http
http:
method: GET
compression: gzip
headers:
Accept-Encoding: gzip
tls_connect:
prober: tcp
timeout: 5s
tcp:
tls: true
tcp_connect_example:
prober: tcp
timeout: 5s
imap_starttls:
prober: tcp
timeout: 5s
tcp:
query_response:
- expect: "OK.*STARTTLS"
- send: ". STARTTLS"
- expect: "OK"
- starttls: true
- send: ". capability"
- expect: "CAPABILITY IMAP4rev1"
smtp_starttls:
prober: tcp
timeout: 5s
tcp:
query_response:
- expect: "^220 ([^ ]+) ESMTP (.+)$"
- send: "EHLO prober\r"
- expect: "^250-STARTTLS"
- send: "STARTTLS\r"
- expect: "^220"
- starttls: true
- send: "EHLO prober\r"
- expect: "^250-AUTH"
- send: "QUIT\r"
irc_banner_example:
prober: tcp
timeout: 5s
tcp:
query_response:
- send: "NICK prober"
- send: "USER prober prober prober :prober"
- expect: "PING :([^ ]+)"
send: "PONG ${1}"
- expect: "^:[^ ]+ 001"
icmp:
prober: icmp
timeout: 5s
icmp:
preferred_ip_protocol: "ip4"
dns_udp_example:
prober: dns
timeout: 5s
dns:
query_name: "www.prometheus.io"
query_type: "A"
valid_rcodes:
- NOERROR
validate_answer_rrs:
fail_if_matches_regexp:
- ".*127.0.0.1"
fail_if_all_match_regexp:
- ".*127.0.0.1"
fail_if_not_matches_regexp:
- "www.prometheus.io.\t300\tIN\tA\t127.0.0.1"
fail_if_none_matches_regexp:
- "127.0.0.1"
validate_authority_rrs:
fail_if_matches_regexp:
- ".*127.0.0.1"
validate_additional_rrs:
fail_if_matches_regexp:
- ".*127.0.0.1"
dns_soa:
prober: dns
dns:
query_name: "prometheus.io"
query_type: "SOA"
dns_tcp_example:
prober: dns
dns:
transport_protocol: "tcp" # defaults to "udp"
preferred_ip_protocol: "ip4" # defaults to "ip6"
query_name: "www.prometheus.io"

View File

@@ -2,7 +2,7 @@
{
imports = [
../common/xeon.nix
../common/ssf.nix
../module/ceph.nix
../module/debuginfod.nix

View File

@@ -3,7 +3,10 @@
{
imports = [
../module/slurm-exporter.nix
../module/meteocat-exporter.nix
../module/upc-qaire-exporter.nix
./gpfs-probe.nix
../module/nix-daemon-exporter.nix
];
age.secrets.grafanaJungleRobotPassword = {
@@ -108,6 +111,9 @@
"127.0.0.1:${toString config.services.prometheus.exporters.smartctl.port}"
"127.0.0.1:9341" # Slurm exporter
"127.0.0.1:9966" # GPFS custom exporter
"127.0.0.1:9999" # Nix-daemon custom exporter
"127.0.0.1:9929" # Meteocat custom exporter
"127.0.0.1:9928" # UPC Qaire custom exporter
"127.0.0.1:${toString config.services.prometheus.exporters.blackbox.port}"
];
}];
@@ -163,6 +169,9 @@
"8.8.8.8"
"ssfhead"
"anella-bsc.cesca.cat"
"upc-anella.cesca.cat"
"fox.ac.upc.edu"
"arenys5.ac.upc.edu"
];
}];
relabel_configs = [
@@ -258,17 +267,6 @@
}
];
}
{
job_name = "ipmi-fox";
metrics_path = "/ipmi";
static_configs = [
{ targets = [ "127.0.0.1:9290" ]; }
];
params = {
target = [ "fox-ipmi" ];
module = [ "fox" ];
};
}
];
};
}

View File

@@ -4,7 +4,7 @@
- xeon03-ipmi
- xeon04-ipmi
- koro-ipmi
- xeon06-ipmi
- weasel-ipmi
- hut-ipmi
- eudy-ipmi
# Storage

View File

@@ -2,7 +2,7 @@
{
imports = [
../common/xeon.nix
../common/ssf.nix
#(modulesPath + "/installer/netboot/netboot-minimal.nix")
../eudy/cpufreq.nix

View File

@@ -2,7 +2,7 @@
{
imports = [
../common/xeon.nix
../common/ssf.nix
../module/monitoring.nix
];

70
m/map.nix Normal file
View File

@@ -0,0 +1,70 @@
{
# In physical order from top to bottom (see note below)
ssf = {
# Switches for Ethernet and OmniPath
switch-C6-S1A-05 = { pos=42; size=1; model="Dell S3048-ON"; };
switch-opa = { pos=41; size=1; };
# SSF login
apex = { pos=39; size=2; label="SSFHEAD"; board="R2208WTTYSR"; contact="rodrigo.arias@bsc.es"; };
# Storage
bay = { pos=38; size=1; label="MDS01"; board="S2600WT2R"; sn="BQWL64850303"; contact="rodrigo.arias@bsc.es"; };
lake1 = { pos=37; size=1; label="OSS01"; board="S2600WT2R"; sn="BQWL64850234"; contact="rodrigo.arias@bsc.es"; };
lake2 = { pos=36; size=1; label="OSS02"; board="S2600WT2R"; sn="BQWL64850266"; contact="rodrigo.arias@bsc.es"; };
# Compute xeon
owl1 = { pos=35; size=1; label="SSF-XEON01"; board="S2600WTTR"; sn="BQWL64954172"; contact="rodrigo.arias@bsc.es"; };
owl2 = { pos=34; size=1; label="SSF-XEON02"; board="S2600WTTR"; sn="BQWL64756560"; contact="rodrigo.arias@bsc.es"; };
xeon03 = { pos=33; size=1; label="SSF-XEON03"; board="S2600WTTR"; sn="BQWL64750826"; contact="rodrigo.arias@bsc.es"; };
# Slot 34 empty
koro = { pos=31; size=1; label="SSF-XEON05"; board="S2600WTTR"; sn="BQWL64954293"; contact="rodrigo.arias@bsc.es"; };
weasel = { pos=30; size=1; label="SSF-XEON06"; board="S2600WTTR"; sn="BQWL64750846"; contact="antoni.navarro@bsc.es"; };
hut = { pos=29; size=1; label="SSF-XEON07"; board="S2600WTTR"; sn="BQWL64751184"; contact="rodrigo.arias@bsc.es"; };
eudy = { pos=28; size=1; label="SSF-XEON08"; board="S2600WTTR"; sn="BQWL64756586"; contact="aleix.rocanonell@bsc.es"; };
# 16 KNL nodes, 4 per chassis
knl01_04 = { pos=26; size=2; label="KNL01..KNL04"; board="HNS7200APX"; };
knl05_08 = { pos=24; size=2; label="KNL05..KNL18"; board="HNS7200APX"; };
knl09_12 = { pos=22; size=2; label="KNL09..KNL12"; board="HNS7200APX"; };
knl13_16 = { pos=20; size=2; label="KNL13..KNL16"; board="HNS7200APX"; };
# Slot 19 empty
# EPI (hw team, guessed order)
epi01 = { pos=18; size=1; contact="joan.cabre@bsc.es"; };
epi02 = { pos=17; size=1; contact="joan.cabre@bsc.es"; };
epi03 = { pos=16; size=1; contact="joan.cabre@bsc.es"; };
anon = { pos=14; size=2; }; # Unlabeled machine. Operative
# These are old and decommissioned (off)
power8 = { pos=12; size=2; label="BSCPOWER8N3"; decommissioned=true; };
powern1 = { pos=8; size=4; label="BSCPOWERN1"; decommissioned=true; };
gustafson = { pos=7; size=1; label="gustafson"; decommissioned=true; };
odap01 = { pos=3; size=4; label="ODAP01"; decommissioned=true; };
amhdal = { pos=2; size=1; label="AMHDAL"; decommissioned=true; }; # sic
moore = { pos=1; size=1; label="moore (earth)"; decommissioned=true; };
};
bsc2218 = {
raccoon = { board="W2600CR"; sn="QSIP22500829"; contact="rodrigo.arias@bsc.es"; };
tent = { label="SSF-XEON04"; board="S2600WTTR"; sn="BQWL64751229"; contact="rodrigo.arias@bsc.es"; };
};
upc = {
fox = { board="H13DSG-O-CPU"; sn="UM24CS600392"; prod="AS-4125GS-TNRT"; prod_sn="E508839X5103339"; contact="rodrigo.arias@bsc.es"; };
};
# NOTE: Position is specified in "U" units (44.45 mm) and starts at 1 from the
# bottom. Example:
#
# | ... | - [pos+size] <--- Label in chassis
# +--------+
# | node | - [pos+1]
# | 2U | - [pos]
# +------- +
# | ... | - [pos-1]
#
# NOTE: The board and sn refers to the FRU information (Board Product and
# Board Serial) via `ipmitool fru print 0`.
}

View File

@@ -4,7 +4,7 @@
# Don't add hut as a cache to itself
assert config.networking.hostName != "hut";
{
substituters = [ "http://hut/cache" ];
trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
extra-substituters = [ "http://hut/cache" ];
extra-trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
};
}

View File

@@ -0,0 +1,17 @@
{ config, lib, pkgs, ... }:
with lib;
{
systemd.services."prometheus-meteocat-exporter" = {
wantedBy = [ "multi-user.target" ];
after = [ "network.target" ];
serviceConfig = {
Restart = mkDefault "always";
PrivateTmp = mkDefault true;
WorkingDirectory = mkDefault "/tmp";
DynamicUser = mkDefault true;
ExecStart = "${pkgs.meteocat-exporter}/bin/meteocat-exporter";
};
};
}

26
m/module/nix-daemon-builds.sh Executable file
View File

@@ -0,0 +1,26 @@
#!/bin/sh
# Locate nix daemon pid
nd=$(pgrep -o nix-daemon)
# Locate children of nix-daemon
pids1=$(tr ' ' '\n' < "/proc/$nd/task/$nd/children")
# For each children, locate 2nd level children
pids2=$(echo "$pids1" | xargs -I @ /bin/sh -c 'cat /proc/@/task/*/children' | tr ' ' '\n')
cat <<EOF
HTTP/1.1 200 OK
Content-Type: text/plain; version=0.0.4; charset=utf-8; escaping=values
# HELP nix_daemon_build Nix daemon derivation build state.
# TYPE nix_daemon_build gauge
EOF
for pid in $pids2; do
name=$(cat /proc/$pid/environ 2>/dev/null | tr '\0' '\n' | rg "^name=(.+)" - --replace '$1' | tr -dc ' [:alnum:]_\-\.')
user=$(ps -o uname= -p "$pid")
if [ -n "$name" -a -n "$user" ]; then
printf 'nix_daemon_build{user="%s",name="%s"} 1\n' "$user" "$name"
fi
done

View File

@@ -0,0 +1,23 @@
{ pkgs, config, lib, ... }:
let
script = pkgs.runCommand "nix-daemon-exporter.sh" { }
''
cp ${./nix-daemon-builds.sh} $out;
chmod +x $out
''
;
in
{
systemd.services.nix-daemon-exporter = {
description = "Daemon to export nix-daemon metrics";
path = [ pkgs.procps pkgs.ripgrep ];
wantedBy = [ "default.target" ];
serviceConfig = {
Type = "simple";
ExecStart = "${pkgs.socat}/bin/socat TCP4-LISTEN:9999,fork EXEC:${script}";
# Needed root to read the environment, potentially unsafe
User = "root";
Group = "root";
};
};
}

20
m/module/nvidia.nix Normal file
View File

@@ -0,0 +1,20 @@
{ lib, config, pkgs, ... }:
{
# Configure Nvidia driver to use with CUDA
hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production;
hardware.nvidia.open = lib.mkDefault (builtins.abort "hardware.nvidia.open not set");
hardware.graphics.enable = true;
nixpkgs.config.nvidia.acceptLicense = true;
services.xserver.videoDrivers = [ "nvidia" ];
# enable support for derivations which require nvidia-gpu to be available
# > requiredSystemFeatures = [ "cuda" ];
programs.nix-required-mounts.enable = true;
programs.nix-required-mounts.presets.nvidia-gpu.enable = true;
# They forgot to add the symlink
programs.nix-required-mounts.allowedPatterns.nvidia-gpu.paths = [
config.systemd.tmpfiles.settings.graphics-driver."/run/opengl-driver"."L+".argument
];
environment.systemPackages = [ pkgs.cudainfo ];
}

68
m/module/p.nix Normal file
View File

@@ -0,0 +1,68 @@
{ config, lib, pkgs, ... }:
let
cfg = config.services.p;
in
{
options = {
services.p = {
enable = lib.mkOption {
type = lib.types.bool;
default = false;
description = "Whether to enable the p service.";
};
path = lib.mkOption {
type = lib.types.str;
default = "/var/lib/p";
description = "Where to save the pasted files on disk.";
};
url = lib.mkOption {
type = lib.types.str;
default = "https://jungle.bsc.es/p";
description = "URL prefix for the printed file.";
};
};
};
config = lib.mkIf cfg.enable {
environment.systemPackages = let
p = pkgs.writeShellScriptBin "p" ''
set -e
pastedir="${cfg.path}/$USER"
cd "$pastedir"
ext="txt"
if [ -n "$1" ]; then
ext="$1"
fi
out=$(mktemp "XXXXXXXX.$ext")
cat > "$out"
chmod go+r "$out"
echo "${cfg.url}/$USER/$out"
'';
in [ p ];
systemd.services.p = let
# Take only normal users
users = lib.filterAttrs (_: v: v.isNormalUser) config.users.users;
# Create a directory for each user
commands = lib.concatLists (lib.mapAttrsToList (_: user: [
"install -d -o ${user.name} -g ${user.group} -m 0755 ${cfg.path}/${user.name}"
]) users);
in {
description = "P service setup";
requires = [ "network-online.target" ];
#wants = [ "remote-fs.target" ];
#after = [ "remote-fs.target" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
ExecStart = pkgs.writeShellScript "p-init.sh" (''
install -d -o root -g root -m 0755 ${cfg.path}
'' + (lib.concatLines commands));
};
};
};
}

33
m/module/power-policy.nix Normal file
View File

@@ -0,0 +1,33 @@
{ config, lib, pkgs, ... }:
with lib;
let
cfg = config.power.policy;
in
{
options = {
power.policy = mkOption {
type = types.nullOr (types.enum [ "always-on" "previous" "always-off" ]);
default = null;
description = "Set power policy to use via IPMI.";
};
};
config = mkIf (cfg != null) {
systemd.services."power-policy" = {
description = "Set power policy to use via IPMI";
wantedBy = [ "multi-user.target" ];
unitConfig = {
StartLimitBurst = "10";
StartLimitIntervalSec = "10m";
};
serviceConfig = {
ExecStart = "${pkgs.ipmitool}/bin/ipmitool chassis policy ${cfg}";
Type = "oneshot";
Restart = "on-failure";
RestartSec = "5s";
};
};
};
}

View File

@@ -43,13 +43,11 @@ in {
clusterName = "jungle";
nodeName = [
"owl[1,2] Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 Feature=owl"
"fox Sockets=2 CoresPerSocket=96 ThreadsPerCore=1 Feature=fox"
"hut Sockets=2 CoresPerSocket=14 ThreadsPerCore=2"
];
partitionName = [
"owl Nodes=owl[1-2] Default=YES DefaultTime=01:00:00 MaxTime=INFINITE State=UP"
"fox Nodes=fox Default=NO DefaultTime=01:00:00 MaxTime=INFINITE State=UP"
];
# See slurm.conf(5) for more details about these options.
@@ -77,7 +75,7 @@ in {
SuspendTimeout=60
ResumeProgram=${resumeProgram}
ResumeTimeout=300
SuspendExcNodes=hut,fox
SuspendExcNodes=hut
# Turn the nodes off after 1 hour of inactivity
SuspendTime=3600

View File

@@ -0,0 +1,8 @@
{
programs.ssh.extraConfig = ''
Host apex ssfhead
HostName ssflogin.bsc.es
Host hut
ProxyJump apex
'';
}

View File

@@ -0,0 +1,17 @@
{ config, lib, pkgs, ... }:
with lib;
{
systemd.services."prometheus-upc-qaire-exporter" = {
wantedBy = [ "multi-user.target" ];
after = [ "network.target" ];
serviceConfig = {
Restart = mkDefault "always";
PrivateTmp = mkDefault true;
WorkingDirectory = mkDefault "/tmp";
DynamicUser = mkDefault true;
ExecStart = "${pkgs.upc-qaire-exporter}/bin/upc-qaire-exporter";
};
};
}

35
m/module/vpn-dac.nix Normal file
View File

@@ -0,0 +1,35 @@
{config, ...}:
{
age.secrets.vpn-dac-login.file = ../../secrets/vpn-dac-login.age;
age.secrets.vpn-dac-client-key.file = ../../secrets/vpn-dac-client-key.age;
services.openvpn.servers = {
# systemctl status openvpn-dac.service
dac = {
config = ''
client
dev tun
proto tcp
remote vpn.ac.upc.edu 1194
remote vpn.ac.upc.edu 80
resolv-retry infinite
nobind
persist-key
persist-tun
ca ${./vpn-dac/ca.crt}
cert ${./vpn-dac/client.crt}
# Only key needs to be secret
key ${config.age.secrets.vpn-dac-client-key.path}
remote-cert-tls server
comp-lzo
verb 3
auth-user-pass ${config.age.secrets.vpn-dac-login.path}
reneg-sec 0
# Only route fox-ipmi
pull-filter ignore "route "
route 147.83.35.27 255.255.255.255
'';
};
};
}

31
m/module/vpn-dac/ca.crt Normal file
View File

@@ -0,0 +1,31 @@
-----BEGIN CERTIFICATE-----
MIIFUjCCBDqgAwIBAgIJAJH118PApk5hMA0GCSqGSIb3DQEBCwUAMIHLMQswCQYD
VQQGEwJFUzESMBAGA1UECBMJQmFyY2Vsb25hMRIwEAYDVQQHEwlCYXJjZWxvbmEx
LTArBgNVBAoTJFVuaXZlcnNpdGF0IFBvbGl0ZWNuaWNhIGRlIENhdGFsdW55YTEk
MCIGA1UECxMbQXJxdWl0ZWN0dXJhIGRlIENvbXB1dGFkb3JzMRAwDgYDVQQDEwdM
Q0FDIENBMQ0wCwYDVQQpEwRMQ0FDMR4wHAYJKoZIhvcNAQkBFg9sY2FjQGFjLnVw
Yy5lZHUwHhcNMTYwMTEyMTI0NDIxWhcNNDYwMTEyMTI0NDIxWjCByzELMAkGA1UE
BhMCRVMxEjAQBgNVBAgTCUJhcmNlbG9uYTESMBAGA1UEBxMJQmFyY2Vsb25hMS0w
KwYDVQQKEyRVbml2ZXJzaXRhdCBQb2xpdGVjbmljYSBkZSBDYXRhbHVueWExJDAi
BgNVBAsTG0FycXVpdGVjdHVyYSBkZSBDb21wdXRhZG9yczEQMA4GA1UEAxMHTENB
QyBDQTENMAsGA1UEKRMETENBQzEeMBwGCSqGSIb3DQEJARYPbGNhY0BhYy51cGMu
ZWR1MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0CteSeof7Xwi51kC
F0nQ4E9iR5Lq7wtfRuVPn6JJcIxJJ6+F9gr4R/HIHTztW4XAzReE36DYfexupx3D
6UgQIkMLlVyGqRbulNF+RnCx20GosF7Dm4RGBVvOxBP1PGjYq/A+XhaaDAFd0cOF
LMNkzuYP7PF0bnBEaHnxmN8bPmuyDyas7fK9AAc3scyWT2jSBPbOVFvCJwPg8MH9
V/h+hKwL/7hRt1MVfVv2qyIuKwTki8mUt0RcVbP7oJoRY5K1+R52phIz/GL/b4Fx
L6MKXlQxLi8vzP4QZXgCMyV7oFNdU3VqCEXBA11YIRvsOZ4QS19otIk/ZWU5x+HH
LAIJ7wIDAQABo4IBNTCCATEwHQYDVR0OBBYEFNyezX1cH1N4QR14ebBpljqmtE7q
MIIBAAYDVR0jBIH4MIH1gBTcns19XB9TeEEdeHmwaZY6prRO6qGB0aSBzjCByzEL
MAkGA1UEBhMCRVMxEjAQBgNVBAgTCUJhcmNlbG9uYTESMBAGA1UEBxMJQmFyY2Vs
b25hMS0wKwYDVQQKEyRVbml2ZXJzaXRhdCBQb2xpdGVjbmljYSBkZSBDYXRhbHVu
eWExJDAiBgNVBAsTG0FycXVpdGVjdHVyYSBkZSBDb21wdXRhZG9yczEQMA4GA1UE
AxMHTENBQyBDQTENMAsGA1UEKRMETENBQzEeMBwGCSqGSIb3DQEJARYPbGNhY0Bh
Yy51cGMuZWR1ggkAkfXXw8CmTmEwDAYDVR0TBAUwAwEB/zANBgkqhkiG9w0BAQsF
AAOCAQEAUAmOvVXIQrR+aZVO0bOTeugKBHB75eTIZSIHIn2oDUvDbAP5GXIJ56A1
6mZXxemSMY8/9k+pRcwJhfat3IgvAN159XSqf9kRv0NHgc3FWUI1Qv/BsAn0vJO/
oK0dbmbbRWqt86qNrCN+cUfz5aovvxN73jFfnvfDQFBk/8enj9wXxYfokjjLPR1Q
+oTkH8dY68qf71oaUB9MndppPEPSz0K1S6h1XxvJoSu9MVSXOQHiq1cdZdxRazI3
4f7q9sTCL+khwDAuZxAYzlEYxFFa/NN8PWU6xPw6V+t/aDhOiXUPJQB/O/K7mw3Z
TQQx5NqM7B5jjak5fauR3/oRD8XXsA==
-----END CERTIFICATE-----

100
m/module/vpn-dac/client.crt Normal file
View File

@@ -0,0 +1,100 @@
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 2 (0x2)
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=ES, ST=Barcelona, L=Barcelona, O=Universitat Politecnica de Catalunya, OU=Arquitectura de Computadors, CN=LCAC CA/name=LCAC/emailAddress=lcac@ac.upc.edu
Validity
Not Before: Jan 12 12:45:41 2016 GMT
Not After : Jan 12 12:45:41 2046 GMT
Subject: C=ES, ST=Barcelona, L=Barcelona, O=Universitat Politecnica de Catalunya, OU=Arquitectura de Computadors, CN=client/name=LCAC/emailAddress=lcac@ac.upc.edu
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
00:97:99:fa:7a:0e:4d:e2:1d:a5:b1:a8:14:18:64:
c7:66:bf:de:99:1d:92:3b:86:82:4d:95:39:f7:a6:
56:49:97:14:4f:e3:37:00:6c:f4:d0:1d:56:79:e7:
19:b5:dd:36:15:8e:1d:57:7b:59:29:d2:11:bf:58:
48:e0:f7:41:3d:16:64:8d:a2:0b:4a:ac:fa:c6:83:
dc:10:2a:2c:d9:97:48:ee:11:2a:bc:4b:60:dd:b9:
2e:8f:45:ca:87:0b:38:65:1c:f8:a2:1d:f9:50:aa:
6e:60:f9:48:df:57:12:23:e1:e7:0c:81:5c:9f:c5:
b2:e6:99:99:95:30:6d:57:36:06:8c:fd:fb:f9:4f:
60:d2:3c:ba:ae:28:56:2f:da:58:5c:e8:c5:7b:ec:
76:d9:28:6e:fb:8c:07:f9:d7:23:c3:72:76:3c:fa:
dc:20:67:8f:cc:16:e0:91:07:d5:68:f9:20:4d:7d:
5c:2d:02:04:16:76:52:f3:53:be:a3:dc:0d:d5:fb:
6b:55:29:f3:52:35:c8:7d:99:d1:4a:94:be:b1:8e:
fd:85:18:25:eb:41:e9:56:da:af:62:84:20:0a:00:
17:94:92:94:91:6a:f8:54:37:17:ee:1e:bb:fb:93:
71:91:d9:e4:e9:b8:3b:18:7d:6d:7d:4c:ce:58:55:
f9:41
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Basic Constraints:
CA:FALSE
Netscape Comment:
Easy-RSA Generated Certificate
X509v3 Subject Key Identifier:
1B:88:06:D5:33:1D:5C:48:46:B5:DE:78:89:36:96:91:3A:74:43:18
X509v3 Authority Key Identifier:
keyid:DC:9E:CD:7D:5C:1F:53:78:41:1D:78:79:B0:69:96:3A:A6:B4:4E:EA
DirName:/C=ES/ST=Barcelona/L=Barcelona/O=Universitat Politecnica de Catalunya/OU=Arquitectura de Computadors/CN=LCAC CA/name=LCAC/emailAddress=lcac@ac.upc.edu
serial:91:F5:D7:C3:C0:A6:4E:61
X509v3 Extended Key Usage:
TLS Web Client Authentication
X509v3 Key Usage:
Digital Signature
X509v3 Subject Alternative Name:
DNS:client
Signature Algorithm: sha256WithRSAEncryption
42:e8:50:b2:e7:88:75:86:0b:bb:29:e3:aa:c6:0e:4c:e8:ea:
3d:0c:02:31:7f:3b:80:0c:3f:80:af:45:d6:62:27:a0:0e:e7:
26:09:12:97:95:f8:d9:9b:89:b5:ef:56:64:f1:de:82:74:e0:
31:0a:cc:90:0a:bd:50:b8:54:95:0a:ae:3b:40:df:76:b6:d1:
01:2e:f3:96:9f:52:d4:e9:14:6d:b7:14:9d:45:99:33:36:2a:
01:0b:15:1a:ed:55:dc:64:83:65:1a:06:42:d9:c7:dc:97:d4:
02:81:c2:58:2b:ea:e4:b7:ae:84:3a:e4:3f:f1:2e:fa:ec:f3:
40:5d:b8:6a:d5:5e:e1:e8:2f:e2:2f:48:a4:38:a1:4f:22:e3:
4f:66:94:aa:02:78:9a:2b:7a:5d:aa:aa:51:a5:e3:d0:91:e9:
1d:f9:08:ed:8b:51:c9:a6:af:46:85:b5:1c:ed:12:a1:28:33:
75:36:00:d8:5c:14:65:96:c0:28:7d:47:50:a4:89:5f:b0:72:
1a:4b:13:17:26:0f:f0:b8:65:3c:e9:96:36:f9:bf:90:59:33:
87:1f:01:03:25:f8:f0:3a:9b:33:02:d0:0a:43:b5:0a:cf:62:
a1:45:38:37:07:9d:9c:94:0b:31:c6:3c:34:b7:fc:5a:0c:e4:
bf:23:f6:7d
-----BEGIN CERTIFICATE-----
MIIFqjCCBJKgAwIBAgIBAjANBgkqhkiG9w0BAQsFADCByzELMAkGA1UEBhMCRVMx
EjAQBgNVBAgTCUJhcmNlbG9uYTESMBAGA1UEBxMJQmFyY2Vsb25hMS0wKwYDVQQK
EyRVbml2ZXJzaXRhdCBQb2xpdGVjbmljYSBkZSBDYXRhbHVueWExJDAiBgNVBAsT
G0FycXVpdGVjdHVyYSBkZSBDb21wdXRhZG9yczEQMA4GA1UEAxMHTENBQyBDQTEN
MAsGA1UEKRMETENBQzEeMBwGCSqGSIb3DQEJARYPbGNhY0BhYy51cGMuZWR1MB4X
DTE2MDExMjEyNDU0MVoXDTQ2MDExMjEyNDU0MVowgcoxCzAJBgNVBAYTAkVTMRIw
EAYDVQQIEwlCYXJjZWxvbmExEjAQBgNVBAcTCUJhcmNlbG9uYTEtMCsGA1UEChMk
VW5pdmVyc2l0YXQgUG9saXRlY25pY2EgZGUgQ2F0YWx1bnlhMSQwIgYDVQQLExtB
cnF1aXRlY3R1cmEgZGUgQ29tcHV0YWRvcnMxDzANBgNVBAMTBmNsaWVudDENMAsG
A1UEKRMETENBQzEeMBwGCSqGSIb3DQEJARYPbGNhY0BhYy51cGMuZWR1MIIBIjAN
BgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAl5n6eg5N4h2lsagUGGTHZr/emR2S
O4aCTZU596ZWSZcUT+M3AGz00B1WeecZtd02FY4dV3tZKdIRv1hI4PdBPRZkjaIL
Sqz6xoPcECos2ZdI7hEqvEtg3bkuj0XKhws4ZRz4oh35UKpuYPlI31cSI+HnDIFc
n8Wy5pmZlTBtVzYGjP37+U9g0jy6rihWL9pYXOjFe+x22Shu+4wH+dcjw3J2PPrc
IGePzBbgkQfVaPkgTX1cLQIEFnZS81O+o9wN1ftrVSnzUjXIfZnRSpS+sY79hRgl
60HpVtqvYoQgCgAXlJKUkWr4VDcX7h67+5Nxkdnk6bg7GH1tfUzOWFX5QQIDAQAB
o4IBljCCAZIwCQYDVR0TBAIwADAtBglghkgBhvhCAQ0EIBYeRWFzeS1SU0EgR2Vu
ZXJhdGVkIENlcnRpZmljYXRlMB0GA1UdDgQWBBQbiAbVMx1cSEa13niJNpaROnRD
GDCCAQAGA1UdIwSB+DCB9YAU3J7NfVwfU3hBHXh5sGmWOqa0TuqhgdGkgc4wgcsx
CzAJBgNVBAYTAkVTMRIwEAYDVQQIEwlCYXJjZWxvbmExEjAQBgNVBAcTCUJhcmNl
bG9uYTEtMCsGA1UEChMkVW5pdmVyc2l0YXQgUG9saXRlY25pY2EgZGUgQ2F0YWx1
bnlhMSQwIgYDVQQLExtBcnF1aXRlY3R1cmEgZGUgQ29tcHV0YWRvcnMxEDAOBgNV
BAMTB0xDQUMgQ0ExDTALBgNVBCkTBExDQUMxHjAcBgkqhkiG9w0BCQEWD2xjYWNA
YWMudXBjLmVkdYIJAJH118PApk5hMBMGA1UdJQQMMAoGCCsGAQUFBwMCMAsGA1Ud
DwQEAwIHgDARBgNVHREECjAIggZjbGllbnQwDQYJKoZIhvcNAQELBQADggEBAELo
ULLniHWGC7sp46rGDkzo6j0MAjF/O4AMP4CvRdZiJ6AO5yYJEpeV+NmbibXvVmTx
3oJ04DEKzJAKvVC4VJUKrjtA33a20QEu85afUtTpFG23FJ1FmTM2KgELFRrtVdxk
g2UaBkLZx9yX1AKBwlgr6uS3roQ65D/xLvrs80BduGrVXuHoL+IvSKQ4oU8i409m
lKoCeJorel2qqlGl49CR6R35CO2LUcmmr0aFtRztEqEoM3U2ANhcFGWWwCh9R1Ck
iV+wchpLExcmD/C4ZTzpljb5v5BZM4cfAQMl+PA6mzMC0ApDtQrPYqFFODcHnZyU
CzHGPDS3/FoM5L8j9n0=
-----END CERTIFICATE-----

View File

@@ -2,7 +2,7 @@
{
imports = [
../common/xeon.nix
../common/ssf.nix
../module/ceph.nix
../module/emulation.nix
../module/slurm-client.nix

View File

@@ -2,7 +2,7 @@
{
imports = [
../common/xeon.nix
../common/ssf.nix
../module/ceph.nix
../module/emulation.nix
../module/slurm-client.nix

View File

@@ -3,8 +3,16 @@
{
imports = [
../common/base.nix
../module/emulation.nix
../module/debuginfod.nix
../module/ssh-hut-extern.nix
../module/nvidia.nix
../module/power-policy.nix
../eudy/kernel/perf.nix
];
power.policy = "always-on";
# Don't install Grub on the disk yet
boot.loader.grub.device = "nodev";
@@ -23,19 +31,38 @@
address = "84.88.51.152";
prefixLength = 25;
} ];
interfaces.enp5s0f1.ipv4.addresses = [ {
address = "10.0.44.1";
prefixLength = 24;
} ];
nat = {
enable = true;
internalInterfaces = [ "enp5s0f1" ];
externalInterface = "eno0";
};
hosts = {
"10.0.44.4" = [ "tent" ];
};
};
nix.settings = {
substituters = [ "https://jungle.bsc.es/cache" ];
trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
extra-substituters = [ "https://jungle.bsc.es/cache" ];
extra-trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
};
# Configure Nvidia driver to use with CUDA
hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production;
hardware.graphics.enable = true;
nixpkgs.config.allowUnfree = true;
nixpkgs.config.nvidia.acceptLicense = true;
services.xserver.videoDrivers = [ "nvidia" ];
# Enable performance governor
powerManagement.cpuFreqGovernor = "performance";
hardware.nvidia.open = false; # Maxwell is older than Turing architecture
services.openssh.settings.X11Forwarding = true;
services.prometheus.exporters.node = {
enable = true;
enabledCollectors = [ "systemd" ];
port = 9002;
listenAddress = "127.0.0.1";
};
users.motd = ''

14
m/tent/blackbox.yml Normal file
View File

@@ -0,0 +1,14 @@
modules:
http_2xx:
prober: http
timeout: 5s
http:
preferred_ip_protocol: "ip4"
follow_redirects: true
valid_status_codes: [] # Defaults to 2xx
method: GET
icmp:
prober: icmp
timeout: 5s
icmp:
preferred_ip_protocol: "ip4"

80
m/tent/configuration.nix Normal file
View File

@@ -0,0 +1,80 @@
{ config, pkgs, lib, ... }:
{
imports = [
../common/xeon.nix
../module/emulation.nix
../module/debuginfod.nix
../module/ssh-hut-extern.nix
./monitoring.nix
./nginx.nix
./nix-serve.nix
./gitlab-runner.nix
./gitea.nix
../hut/public-inbox.nix
../hut/msmtp.nix
../module/p.nix
../module/vpn-dac.nix
];
# Select the this using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d537675";
networking = {
hostName = "tent";
interfaces.eno1.ipv4.addresses = [
{
address = "10.0.44.4";
prefixLength = 24;
}
];
# Only BSC DNSs seem to be reachable from the office VLAN
nameservers = [ "84.88.52.35" "84.88.52.36" ];
search = [ "bsc.es" "ac.upc.edu" ];
defaultGateway = "10.0.44.1";
};
services.p.enable = true;
services.prometheus.exporters.node = {
enable = true;
enabledCollectors = [ "systemd" ];
port = 9002;
listenAddress = "127.0.0.1";
};
boot.swraid = {
enable = true;
mdadmConf = ''
DEVICE partitions
ARRAY /dev/md0 metadata=1.2 UUID=496db1e2:056a92aa:a544543f:40db379d
MAILADDR root
'';
};
fileSystems."/vault" = {
device = "/dev/disk/by-label/vault";
fsType = "ext4";
};
# Make a /vault/$USER directory for each user.
systemd.services.create-vault-dirs = let
# Take only normal users in tent
users = lib.filterAttrs (_: v: v.isNormalUser) config.users.users;
commands = lib.concatLists (lib.mapAttrsToList
(_: user: [
"install -d -o ${user.name} -g ${user.group} -m 0711 /vault/home/${user.name}"
]) users);
script = pkgs.writeShellScript "create-vault-dirs.sh" (lib.concatLines commands);
in {
enable = true;
wants = [ "local-fs.target" ];
after = [ "local-fs.target" ];
wantedBy = [ "multi-user.target" ];
serviceConfig.ExecStart = script;
};
# disable automatic garbage collector
nix.gc.automatic = lib.mkForce false;
}

30
m/tent/gitea.nix Normal file
View File

@@ -0,0 +1,30 @@
{ config, lib, ... }:
{
services.gitea = {
enable = true;
appName = "Gitea in the jungle";
settings = {
server = {
ROOT_URL = "https://jungle.bsc.es/git/";
LOCAL_ROOT_URL = "https://jungle.bsc.es/git/";
LANDING_PAGE = "explore";
};
metrics.ENABLED = true;
service = {
DISABLE_REGISTRATION = true;
REGISTER_MANUAL_CONFIRM = true;
ENABLE_NOTIFY_MAIL = true;
};
log.LEVEL = "Warn";
mailer = {
ENABLED = true;
FROM = "jungle-robot@bsc.es";
PROTOCOL = "sendmail";
SENDMAIL_PATH = "/run/wrappers/bin/sendmail";
SENDMAIL_ARGS = "--";
};
};
};
}

93
m/tent/gitlab-runner.nix Normal file
View File

@@ -0,0 +1,93 @@
{ pkgs, lib, config, ... }:
{
age.secrets.tent-gitlab-runner-pm-shell.file = ../../secrets/tent-gitlab-runner-pm-shell-token.age;
age.secrets.tent-gitlab-runner-pm-docker.file = ../../secrets/tent-gitlab-runner-pm-docker-token.age;
age.secrets.tent-gitlab-runner-bsc-docker.file = ../../secrets/tent-gitlab-runner-bsc-docker-token.age;
services.gitlab-runner = let sec = config.age.secrets; in {
enable = true;
settings.concurrent = 5;
services = {
# For gitlab.pm.bsc.es
gitlab-pm-shell = {
executor = "shell";
environmentVariables = {
SHELL = "${pkgs.bash}/bin/bash";
};
authenticationTokenConfigFile = sec.tent-gitlab-runner-pm-shell.path;
preGetSourcesScript = pkgs.writeScript "setup" ''
echo "This is the preGetSources script running, brace for impact"
env
'';
};
gitlab-pm-docker = {
authenticationTokenConfigFile = sec.tent-gitlab-runner-pm-docker.path;
executor = "docker";
dockerImage = "debian:stable";
};
# For gitlab.bsc.es
gitlab-bsc-docker = {
# gitlab.bsc.es still uses the old token mechanism
registrationConfigFile = sec.tent-gitlab-runner-bsc-docker.path;
tagList = [ "docker" "tent" "nix" ];
executor = "docker";
dockerImage = "alpine";
dockerVolumes = [
"/nix/store:/nix/store:ro"
"/nix/var/nix/db:/nix/var/nix/db:ro"
"/nix/var/nix/daemon-socket:/nix/var/nix/daemon-socket:ro"
];
dockerDisableCache = true;
registrationFlags = [
# Increase build log length to 64 MiB
"--output-limit 65536"
];
preBuildScript = pkgs.writeScript "setup-container" ''
mkdir -p -m 0755 /nix/var/log/nix/drvs
mkdir -p -m 0755 /nix/var/nix/gcroots
mkdir -p -m 0755 /nix/var/nix/profiles
mkdir -p -m 0755 /nix/var/nix/temproots
mkdir -p -m 0755 /nix/var/nix/userpool
mkdir -p -m 1777 /nix/var/nix/gcroots/per-user
mkdir -p -m 1777 /nix/var/nix/profiles/per-user
mkdir -p -m 0755 /nix/var/nix/profiles/per-user/root
mkdir -p -m 0700 "$HOME/.nix-defexpr"
mkdir -p -m 0700 "$HOME/.ssh"
cat >> "$HOME/.ssh/known_hosts" << EOF
bscpm04.bsc.es ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPx4mC0etyyjYUT2Ztc/bs4ZXSbVMrogs1ZTP924PDgT
gitlab-internal.bsc.es ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIF9arsAOSRB06hdy71oTvJHG2Mg8zfebADxpvc37lZo3
EOF
. ${pkgs.nix}/etc/profile.d/nix-daemon.sh
# Required to load SSL certificate paths
. ${pkgs.cacert}/nix-support/setup-hook
'';
environmentVariables = {
ENV = "/etc/profile";
USER = "root";
NIX_REMOTE = "daemon";
PATH = "${config.system.path}/bin:/bin:/sbin:/usr/bin:/usr/sbin";
};
};
};
};
systemd.services.gitlab-runner.serviceConfig = {
DynamicUser = lib.mkForce false;
User = "gitlab-runner";
Group = "gitlab-runner";
ExecStart = lib.mkForce
''${pkgs.gitlab-runner}/bin/gitlab-runner run --config ''${HOME}/.gitlab-runner/config.toml --listen-address "127.0.0.1:9252" --working-directory ''${HOME}'';
};
users.users.gitlab-runner = {
uid = config.ids.uids.gitlab-runner;
home = "/var/lib/gitlab-runner";
description = "Gitlab Runner";
group = "gitlab-runner";
extraGroups = [ "docker" ];
createHome = true;
};
users.groups.gitlab-runner.gid = config.ids.gids.gitlab-runner;
}

217
m/tent/monitoring.nix Normal file
View File

@@ -0,0 +1,217 @@
{ config, lib, pkgs, ... }:
{
imports = [
../module/meteocat-exporter.nix
../module/upc-qaire-exporter.nix
../module/nix-daemon-exporter.nix
];
age.secrets.grafanaJungleRobotPassword = {
file = ../../secrets/jungle-robot-password.age;
owner = "grafana";
mode = "400";
};
services.grafana = {
enable = true;
settings = {
server = {
domain = "jungle.bsc.es";
root_url = "%(protocol)s://%(domain)s/grafana";
serve_from_sub_path = true;
http_port = 2342;
http_addr = "127.0.0.1";
};
smtp = {
enabled = true;
from_address = "jungle-robot@bsc.es";
user = "jungle-robot";
# Read the password from a file, which is only readable by grafana user
# https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#file-provider
password = "$__file{${config.age.secrets.grafanaJungleRobotPassword.path}}";
host = "mail.bsc.es:465";
startTLS_policy = "NoStartTLS";
};
feature_toggles.publicDashboards = true;
"auth.anonymous".enabled = true;
log.level = "warn";
};
};
services.prometheus = {
enable = true;
port = 9001;
retentionTime = "5y";
listenAddress = "127.0.0.1";
};
# We need access to the devices to monitor the disk space
systemd.services.prometheus-node-exporter.serviceConfig.PrivateDevices = lib.mkForce false;
systemd.services.prometheus-node-exporter.serviceConfig.ProtectHome = lib.mkForce "read-only";
# Credentials for IPMI exporter
age.secrets.ipmiYml = {
file = ../../secrets/ipmi.yml.age;
owner = "ipmi-exporter";
};
# Create an IPMI group and assign the ipmi0 device
users.groups.ipmi = {};
services.udev.extraRules = ''
SUBSYSTEM=="ipmi", KERNEL=="ipmi0", GROUP="ipmi", MODE="0660"
'';
# Add a new ipmi-exporter user that can read the ipmi0 device
users.users.ipmi-exporter = {
isSystemUser = true;
group = "ipmi";
};
# Disable dynamic user so we have the ipmi-exporter user available for the credentials
systemd.services.prometheus-ipmi-exporter.serviceConfig = {
DynamicUser = lib.mkForce false;
PrivateDevices = lib.mkForce false;
User = lib.mkForce "ipmi-exporter";
Group = lib.mkForce "ipmi";
RestrictNamespaces = lib.mkForce false;
# Fake uid to 0 so it shuts up
ExecStart = let
cfg = config.services.prometheus.exporters.ipmi;
in lib.mkForce (lib.concatStringsSep " " ([
"${pkgs.util-linux}/bin/unshare --map-user 0"
"${pkgs.prometheus-ipmi-exporter}/bin/ipmi_exporter"
"--web.listen-address ${cfg.listenAddress}:${toString cfg.port}"
"--config.file ${lib.escapeShellArg cfg.configFile}"
] ++ cfg.extraFlags));
};
services.prometheus = {
exporters = {
ipmi = {
enable = true;
configFile = config.age.secrets.ipmiYml.path;
#extraFlags = [ "--log.level=debug" ];
listenAddress = "127.0.0.1";
};
node = {
enable = true;
enabledCollectors = [ "logind" ];
port = 9002;
listenAddress = "127.0.0.1";
};
blackbox = {
enable = true;
listenAddress = "127.0.0.1";
configFile = ./blackbox.yml;
};
};
scrapeConfigs = [
{
job_name = "local";
static_configs = [{
targets = [
"127.0.0.1:9002" # Node exporter
#"127.0.0.1:9115" # Blackbox exporter
"127.0.0.1:9290" # IPMI exporter for local node
"127.0.0.1:9928" # UPC Qaire custom exporter
"127.0.0.1:9929" # Meteocat custom exporter
"127.0.0.1:9999" # Nix-daemon custom exporter
];
}];
}
{
job_name = "blackbox-http";
metrics_path = "/probe";
params = { module = [ "http_2xx" ]; };
static_configs = [{
targets = [
"https://www.google.com/robots.txt"
"https://pm.bsc.es/"
"https://pm.bsc.es/gitlab/"
"https://jungle.bsc.es/"
"https://gitlab.bsc.es/"
];
}];
relabel_configs = [
{
# Takes the address and sets it in the "target=<xyz>" URL parameter
source_labels = [ "__address__" ];
target_label = "__param_target";
}
{
# Sets the "instance" label with the remote host we are querying
source_labels = [ "__param_target" ];
target_label = "instance";
}
{
# Shows the host target address instead of the blackbox address
target_label = "__address__";
replacement = "127.0.0.1:9115";
}
];
}
{
job_name = "blackbox-icmp";
metrics_path = "/probe";
params = { module = [ "icmp" ]; };
static_configs = [{
targets = [
"1.1.1.1"
"8.8.8.8"
"ssfhead"
"raccoon"
"anella-bsc.cesca.cat"
"upc-anella.cesca.cat"
"fox.ac.upc.edu"
"fox-ipmi.ac.upc.edu"
"arenys5.ac.upc.edu"
"arenys0-2.ac.upc.edu"
"epi01.bsc.es"
"axle.bsc.es"
];
}];
relabel_configs = [
{
# Takes the address and sets it in the "target=<xyz>" URL parameter
source_labels = [ "__address__" ];
target_label = "__param_target";
}
{
# Sets the "instance" label with the remote host we are querying
source_labels = [ "__param_target" ];
target_label = "instance";
}
{
# Shows the host target address instead of the blackbox address
target_label = "__address__";
replacement = "127.0.0.1:9115";
}
];
}
{
job_name = "ipmi-raccoon";
metrics_path = "/ipmi";
static_configs = [
{ targets = [ "127.0.0.1:9290" ]; }
];
params = {
target = [ "raccoon-ipmi" ];
module = [ "raccoon" ];
};
}
{
job_name = "ipmi-fox";
metrics_path = "/ipmi";
static_configs = [
{ targets = [ "127.0.0.1:9290" ]; }
];
params = {
target = [ "fox-ipmi.ac.upc.edu" ];
module = [ "fox" ];
};
}
];
};
}

73
m/tent/nginx.nix Normal file
View File

@@ -0,0 +1,73 @@
{ theFlake, pkgs, ... }:
let
website = pkgs.stdenv.mkDerivation {
name = "jungle-web";
src = theFlake;
buildInputs = [ pkgs.hugo ];
buildPhase = ''
cd web
rm -rf public/
hugo
'';
installPhase = ''
cp -r public $out
'';
# Don't mess doc/
dontFixup = true;
};
in
{
networking.firewall.allowedTCPPorts = [ 80 ];
services.nginx = {
enable = true;
virtualHosts."jungle.bsc.es" = {
root = "${website}";
listen = [
{
addr = "0.0.0.0";
port = 80;
}
];
extraConfig = ''
set_real_ip_from 127.0.0.1;
set_real_ip_from 84.88.52.107;
real_ip_recursive on;
real_ip_header X-Forwarded-For;
location /git {
rewrite ^/git$ / break;
rewrite ^/git/(.*) /$1 break;
proxy_pass http://127.0.0.1:3000;
proxy_redirect http:// $scheme://;
}
location /cache {
rewrite ^/cache/(.*) /$1 break;
proxy_pass http://127.0.0.1:5000;
proxy_redirect http:// $scheme://;
}
location /lists {
proxy_pass http://127.0.0.1:8081;
proxy_redirect http:// $scheme://;
}
location /grafana {
proxy_pass http://127.0.0.1:2342;
proxy_redirect http:// $scheme://;
proxy_set_header Host $host;
# Websockets
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
location ~ ^/~(.+?)(/.*)?$ {
alias /vault/home/$1/public_html$2;
index index.html index.htm;
autoindex on;
absolute_redirect off;
}
location /p/ {
alias /var/lib/p/;
}
'';
};
};
}

16
m/tent/nix-serve.nix Normal file
View File

@@ -0,0 +1,16 @@
{ config, ... }:
{
age.secrets.nixServe.file = ../../secrets/nix-serve.age;
services.nix-serve = {
enable = true;
# Only listen locally, as we serve it via ssh
bindAddress = "127.0.0.1";
port = 5000;
secretKeyFile = config.age.secrets.nixServe.path;
# Public key:
# jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=
};
}

View File

@@ -0,0 +1,28 @@
{ lib, ... }:
{
imports = [
../common/ssf.nix
];
# Select this using the ID to avoid mismatches
boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d5356ca";
# No swap, there is plenty of RAM
swapDevices = lib.mkForce [];
# Users with sudo access
users.groups.wheel.members = [ "abonerib" "anavarro" ];
networking = {
hostName = "weasel";
interfaces.eno1.ipv4.addresses = [ {
address = "10.0.40.6";
prefixLength = 24;
} ];
interfaces.ibp5s0.ipv4.addresses = [ {
address = "10.0.42.6";
prefixLength = 24;
} ];
};
}

12
pkgs/cudainfo/Makefile Normal file
View File

@@ -0,0 +1,12 @@
HOSTCXX ?= g++
NVCC := nvcc -ccbin $(HOSTCXX)
CXXFLAGS := -m64
# Target rules
all: cudainfo
cudainfo: cudainfo.cpp
$(NVCC) $(CXXFLAGS) -o $@ $<
clean:
rm -f cudainfo cudainfo.o

600
pkgs/cudainfo/cudainfo.cpp Normal file
View File

@@ -0,0 +1,600 @@
/*
* Copyright 1993-2015 NVIDIA Corporation. All rights reserved.
*
* Please refer to the NVIDIA end user license agreement (EULA) associated
* with this source code for terms and conditions that govern your use of
* this software. Any use, reproduction, disclosure, or distribution of
* this software and related documentation outside the terms of the EULA
* is strictly prohibited.
*
*/
/* This sample queries the properties of the CUDA devices present in the system via CUDA Runtime API. */
// Shared Utilities (QA Testing)
// std::system includes
#include <memory>
#include <iostream>
#include <cuda_runtime.h>
// This will output the proper CUDA error strings in the event that a CUDA host call returns an error
#define checkCudaErrors(val) check ( (val), #val, __FILE__, __LINE__ )
// CUDA Runtime error messages
#ifdef __DRIVER_TYPES_H__
static const char *_cudaGetErrorEnum(cudaError_t error)
{
switch (error)
{
case cudaSuccess:
return "cudaSuccess";
case cudaErrorMissingConfiguration:
return "cudaErrorMissingConfiguration";
case cudaErrorMemoryAllocation:
return "cudaErrorMemoryAllocation";
case cudaErrorInitializationError:
return "cudaErrorInitializationError";
case cudaErrorLaunchFailure:
return "cudaErrorLaunchFailure";
case cudaErrorPriorLaunchFailure:
return "cudaErrorPriorLaunchFailure";
case cudaErrorLaunchTimeout:
return "cudaErrorLaunchTimeout";
case cudaErrorLaunchOutOfResources:
return "cudaErrorLaunchOutOfResources";
case cudaErrorInvalidDeviceFunction:
return "cudaErrorInvalidDeviceFunction";
case cudaErrorInvalidConfiguration:
return "cudaErrorInvalidConfiguration";
case cudaErrorInvalidDevice:
return "cudaErrorInvalidDevice";
case cudaErrorInvalidValue:
return "cudaErrorInvalidValue";
case cudaErrorInvalidPitchValue:
return "cudaErrorInvalidPitchValue";
case cudaErrorInvalidSymbol:
return "cudaErrorInvalidSymbol";
case cudaErrorMapBufferObjectFailed:
return "cudaErrorMapBufferObjectFailed";
case cudaErrorUnmapBufferObjectFailed:
return "cudaErrorUnmapBufferObjectFailed";
case cudaErrorInvalidHostPointer:
return "cudaErrorInvalidHostPointer";
case cudaErrorInvalidDevicePointer:
return "cudaErrorInvalidDevicePointer";
case cudaErrorInvalidTexture:
return "cudaErrorInvalidTexture";
case cudaErrorInvalidTextureBinding:
return "cudaErrorInvalidTextureBinding";
case cudaErrorInvalidChannelDescriptor:
return "cudaErrorInvalidChannelDescriptor";
case cudaErrorInvalidMemcpyDirection:
return "cudaErrorInvalidMemcpyDirection";
case cudaErrorAddressOfConstant:
return "cudaErrorAddressOfConstant";
case cudaErrorTextureFetchFailed:
return "cudaErrorTextureFetchFailed";
case cudaErrorTextureNotBound:
return "cudaErrorTextureNotBound";
case cudaErrorSynchronizationError:
return "cudaErrorSynchronizationError";
case cudaErrorInvalidFilterSetting:
return "cudaErrorInvalidFilterSetting";
case cudaErrorInvalidNormSetting:
return "cudaErrorInvalidNormSetting";
case cudaErrorMixedDeviceExecution:
return "cudaErrorMixedDeviceExecution";
case cudaErrorCudartUnloading:
return "cudaErrorCudartUnloading";
case cudaErrorUnknown:
return "cudaErrorUnknown";
case cudaErrorNotYetImplemented:
return "cudaErrorNotYetImplemented";
case cudaErrorMemoryValueTooLarge:
return "cudaErrorMemoryValueTooLarge";
case cudaErrorInvalidResourceHandle:
return "cudaErrorInvalidResourceHandle";
case cudaErrorNotReady:
return "cudaErrorNotReady";
case cudaErrorInsufficientDriver:
return "cudaErrorInsufficientDriver";
case cudaErrorSetOnActiveProcess:
return "cudaErrorSetOnActiveProcess";
case cudaErrorInvalidSurface:
return "cudaErrorInvalidSurface";
case cudaErrorNoDevice:
return "cudaErrorNoDevice";
case cudaErrorECCUncorrectable:
return "cudaErrorECCUncorrectable";
case cudaErrorSharedObjectSymbolNotFound:
return "cudaErrorSharedObjectSymbolNotFound";
case cudaErrorSharedObjectInitFailed:
return "cudaErrorSharedObjectInitFailed";
case cudaErrorUnsupportedLimit:
return "cudaErrorUnsupportedLimit";
case cudaErrorDuplicateVariableName:
return "cudaErrorDuplicateVariableName";
case cudaErrorDuplicateTextureName:
return "cudaErrorDuplicateTextureName";
case cudaErrorDuplicateSurfaceName:
return "cudaErrorDuplicateSurfaceName";
case cudaErrorDevicesUnavailable:
return "cudaErrorDevicesUnavailable";
case cudaErrorInvalidKernelImage:
return "cudaErrorInvalidKernelImage";
case cudaErrorNoKernelImageForDevice:
return "cudaErrorNoKernelImageForDevice";
case cudaErrorIncompatibleDriverContext:
return "cudaErrorIncompatibleDriverContext";
case cudaErrorPeerAccessAlreadyEnabled:
return "cudaErrorPeerAccessAlreadyEnabled";
case cudaErrorPeerAccessNotEnabled:
return "cudaErrorPeerAccessNotEnabled";
case cudaErrorDeviceAlreadyInUse:
return "cudaErrorDeviceAlreadyInUse";
case cudaErrorProfilerDisabled:
return "cudaErrorProfilerDisabled";
case cudaErrorProfilerNotInitialized:
return "cudaErrorProfilerNotInitialized";
case cudaErrorProfilerAlreadyStarted:
return "cudaErrorProfilerAlreadyStarted";
case cudaErrorProfilerAlreadyStopped:
return "cudaErrorProfilerAlreadyStopped";
/* Since CUDA 4.0*/
case cudaErrorAssert:
return "cudaErrorAssert";
case cudaErrorTooManyPeers:
return "cudaErrorTooManyPeers";
case cudaErrorHostMemoryAlreadyRegistered:
return "cudaErrorHostMemoryAlreadyRegistered";
case cudaErrorHostMemoryNotRegistered:
return "cudaErrorHostMemoryNotRegistered";
/* Since CUDA 5.0 */
case cudaErrorOperatingSystem:
return "cudaErrorOperatingSystem";
case cudaErrorPeerAccessUnsupported:
return "cudaErrorPeerAccessUnsupported";
case cudaErrorLaunchMaxDepthExceeded:
return "cudaErrorLaunchMaxDepthExceeded";
case cudaErrorLaunchFileScopedTex:
return "cudaErrorLaunchFileScopedTex";
case cudaErrorLaunchFileScopedSurf:
return "cudaErrorLaunchFileScopedSurf";
case cudaErrorSyncDepthExceeded:
return "cudaErrorSyncDepthExceeded";
case cudaErrorLaunchPendingCountExceeded:
return "cudaErrorLaunchPendingCountExceeded";
case cudaErrorNotPermitted:
return "cudaErrorNotPermitted";
case cudaErrorNotSupported:
return "cudaErrorNotSupported";
/* Since CUDA 6.0 */
case cudaErrorHardwareStackError:
return "cudaErrorHardwareStackError";
case cudaErrorIllegalInstruction:
return "cudaErrorIllegalInstruction";
case cudaErrorMisalignedAddress:
return "cudaErrorMisalignedAddress";
case cudaErrorInvalidAddressSpace:
return "cudaErrorInvalidAddressSpace";
case cudaErrorInvalidPc:
return "cudaErrorInvalidPc";
case cudaErrorIllegalAddress:
return "cudaErrorIllegalAddress";
/* Since CUDA 6.5*/
case cudaErrorInvalidPtx:
return "cudaErrorInvalidPtx";
case cudaErrorInvalidGraphicsContext:
return "cudaErrorInvalidGraphicsContext";
case cudaErrorStartupFailure:
return "cudaErrorStartupFailure";
case cudaErrorApiFailureBase:
return "cudaErrorApiFailureBase";
}
return "<unknown>";
}
#endif
template< typename T >
void check(T result, char const *const func, const char *const file, int const line)
{
if (result)
{
fprintf(stderr, "CUDA error at %s:%d code=%d(%s) \"%s\" \n",
file, line, static_cast<unsigned int>(result), _cudaGetErrorEnum(result), func);
cudaDeviceReset();
// Make sure we call CUDA Device Reset before exiting
exit(EXIT_FAILURE);
}
}
int *pArgc = NULL;
char **pArgv = NULL;
#if CUDART_VERSION < 5000
// CUDA-C includes
#include <cuda.h>
// This function wraps the CUDA Driver API into a template function
template <class T>
inline void getCudaAttribute(T *attribute, CUdevice_attribute device_attribute, int device)
{
CUresult error = cuDeviceGetAttribute(attribute, device_attribute, device);
if (CUDA_SUCCESS != error) {
fprintf(stderr, "cuSafeCallNoSync() Driver API error = %04d from file <%s>, line %i.\n",
error, __FILE__, __LINE__);
// cudaDeviceReset causes the driver to clean up all state. While
// not mandatory in normal operation, it is good practice. It is also
// needed to ensure correct operation when the application is being
// profiled. Calling cudaDeviceReset causes all profile data to be
// flushed before the application exits
cudaDeviceReset();
exit(EXIT_FAILURE);
}
}
#endif /* CUDART_VERSION < 5000 */
// Beginning of GPU Architecture definitions
inline int ConvertSMVer2Cores(int major, int minor)
{
// Defines for GPU Architecture types (using the SM version to determine the # of cores per SM
typedef struct {
int SM; // 0xMm (hexidecimal notation), M = SM Major version, and m = SM minor version
int Cores;
} sSMtoCores;
sSMtoCores nGpuArchCoresPerSM[] = {
{ 0x20, 32 }, // Fermi Generation (SM 2.0) GF100 class
{ 0x21, 48 }, // Fermi Generation (SM 2.1) GF10x class
{ 0x30, 192}, // Kepler Generation (SM 3.0) GK10x class
{ 0x32, 192}, // Kepler Generation (SM 3.2) GK10x class
{ 0x35, 192}, // Kepler Generation (SM 3.5) GK11x class
{ 0x37, 192}, // Kepler Generation (SM 3.7) GK21x class
{ 0x50, 128}, // Maxwell Generation (SM 5.0) GM10x class
{ 0x52, 128}, // Maxwell Generation (SM 5.2) GM20x class
{ -1, -1 }
};
int index = 0;
while (nGpuArchCoresPerSM[index].SM != -1) {
if (nGpuArchCoresPerSM[index].SM == ((major << 4) + minor)) {
return nGpuArchCoresPerSM[index].Cores;
}
index++;
}
// If we don't find the values, we default use the previous one to run properly
printf("MapSMtoCores for SM %d.%d is undefined. Default to use %d Cores/SM\n", major, minor, nGpuArchCoresPerSM[index-1].Cores);
return nGpuArchCoresPerSM[index-1].Cores;
}
////////////////////////////////////////////////////////////////////////////////
// Program main
////////////////////////////////////////////////////////////////////////////////
int
main(int argc, char **argv)
{
pArgc = &argc;
pArgv = argv;
printf("%s Starting...\n\n", argv[0]);
printf(" CUDA Device Query (Runtime API) version (CUDART static linking)\n\n");
int deviceCount = 0;
cudaError_t error_id = cudaGetDeviceCount(&deviceCount);
if (error_id != cudaSuccess) {
printf("cudaGetDeviceCount failed: %s (%d)\n",
cudaGetErrorString(error_id), (int) error_id);
printf("Result = FAIL\n");
exit(EXIT_FAILURE);
}
// This function call returns 0 if there are no CUDA capable devices.
if (deviceCount == 0)
printf("There are no available device(s) that support CUDA\n");
else
printf("Detected %d CUDA Capable device(s)\n", deviceCount);
int dev, driverVersion = 0, runtimeVersion = 0;
for (dev = 0; dev < deviceCount; ++dev) {
cudaSetDevice(dev);
cudaDeviceProp deviceProp;
cudaGetDeviceProperties(&deviceProp, dev);
printf("\nDevice %d: \"%s\"\n", dev, deviceProp.name);
// Console log
cudaDriverGetVersion(&driverVersion);
cudaRuntimeGetVersion(&runtimeVersion);
printf(" CUDA Driver Version / Runtime Version %d.%d / %d.%d\n", driverVersion/1000, (driverVersion%100)/10, runtimeVersion/1000, (runtimeVersion%100)/10);
printf(" CUDA Capability Major/Minor version number: %d.%d\n", deviceProp.major, deviceProp.minor);
printf(" Total amount of global memory: %.0f MBytes (%llu bytes)\n",
(float)deviceProp.totalGlobalMem/1048576.0f, (unsigned long long) deviceProp.totalGlobalMem);
printf(" (%2d) Multiprocessors, (%3d) CUDA Cores/MP: %d CUDA Cores\n",
deviceProp.multiProcessorCount,
ConvertSMVer2Cores(deviceProp.major, deviceProp.minor),
ConvertSMVer2Cores(deviceProp.major, deviceProp.minor) * deviceProp.multiProcessorCount);
printf(" GPU Max Clock rate: %.0f MHz (%0.2f GHz)\n", deviceProp.clockRate * 1e-3f, deviceProp.clockRate * 1e-6f);
#if CUDART_VERSION >= 5000
// This is supported in CUDA 5.0 (runtime API device properties)
printf(" Memory Clock rate: %.0f Mhz\n", deviceProp.memoryClockRate * 1e-3f);
printf(" Memory Bus Width: %d-bit\n", deviceProp.memoryBusWidth);
if (deviceProp.l2CacheSize) {
printf(" L2 Cache Size: %d bytes\n", deviceProp.l2CacheSize);
}
#else
// This only available in CUDA 4.0-4.2 (but these were only exposed in the CUDA Driver API)
int memoryClock;
getCudaAttribute<int>(&memoryClock, CU_DEVICE_ATTRIBUTE_MEMORY_CLOCK_RATE, dev);
printf(" Memory Clock rate: %.0f Mhz\n", memoryClock * 1e-3f);
int memBusWidth;
getCudaAttribute<int>(&memBusWidth, CU_DEVICE_ATTRIBUTE_GLOBAL_MEMORY_BUS_WIDTH, dev);
printf(" Memory Bus Width: %d-bit\n", memBusWidth);
int L2CacheSize;
getCudaAttribute<int>(&L2CacheSize, CU_DEVICE_ATTRIBUTE_L2_CACHE_SIZE, dev);
if (L2CacheSize) {
printf(" L2 Cache Size: %d bytes\n", L2CacheSize);
}
#endif
printf(" Maximum Texture Dimension Size (x,y,z) 1D=(%d), 2D=(%d, %d), 3D=(%d, %d, %d)\n",
deviceProp.maxTexture1D , deviceProp.maxTexture2D[0], deviceProp.maxTexture2D[1],
deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1], deviceProp.maxTexture3D[2]);
printf(" Maximum Layered 1D Texture Size, (num) layers 1D=(%d), %d layers\n",
deviceProp.maxTexture1DLayered[0], deviceProp.maxTexture1DLayered[1]);
printf(" Maximum Layered 2D Texture Size, (num) layers 2D=(%d, %d), %d layers\n",
deviceProp.maxTexture2DLayered[0], deviceProp.maxTexture2DLayered[1], deviceProp.maxTexture2DLayered[2]);
printf(" Total amount of constant memory: %lu bytes\n", deviceProp.totalConstMem);
printf(" Total amount of shared memory per block: %lu bytes\n", deviceProp.sharedMemPerBlock);
printf(" Total number of registers available per block: %d\n", deviceProp.regsPerBlock);
printf(" Warp size: %d\n", deviceProp.warpSize);
printf(" Maximum number of threads per multiprocessor: %d\n", deviceProp.maxThreadsPerMultiProcessor);
printf(" Maximum number of threads per block: %d\n", deviceProp.maxThreadsPerBlock);
printf(" Max dimension size of a thread block (x,y,z): (%d, %d, %d)\n",
deviceProp.maxThreadsDim[0],
deviceProp.maxThreadsDim[1],
deviceProp.maxThreadsDim[2]);
printf(" Max dimension size of a grid size (x,y,z): (%d, %d, %d)\n",
deviceProp.maxGridSize[0],
deviceProp.maxGridSize[1],
deviceProp.maxGridSize[2]);
printf(" Maximum memory pitch: %lu bytes\n", deviceProp.memPitch);
printf(" Texture alignment: %lu bytes\n", deviceProp.textureAlignment);
printf(" Concurrent copy and kernel execution: %s with %d copy engine(s)\n", (deviceProp.deviceOverlap ? "Yes" : "No"), deviceProp.asyncEngineCount);
printf(" Run time limit on kernels: %s\n", deviceProp.kernelExecTimeoutEnabled ? "Yes" : "No");
printf(" Integrated GPU sharing Host Memory: %s\n", deviceProp.integrated ? "Yes" : "No");
printf(" Support host page-locked memory mapping: %s\n", deviceProp.canMapHostMemory ? "Yes" : "No");
printf(" Alignment requirement for Surfaces: %s\n", deviceProp.surfaceAlignment ? "Yes" : "No");
printf(" Device has ECC support: %s\n", deviceProp.ECCEnabled ? "Enabled" : "Disabled");
#if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
printf(" CUDA Device Driver Mode (TCC or WDDM): %s\n", deviceProp.tccDriver ? "TCC (Tesla Compute Cluster Driver)" : "WDDM (Windows Display Driver Model)");
#endif
printf(" Device supports Unified Addressing (UVA): %s\n", deviceProp.unifiedAddressing ? "Yes" : "No");
printf(" Device PCI Domain ID / Bus ID / location ID: %d / %d / %d\n", deviceProp.pciDomainID, deviceProp.pciBusID, deviceProp.pciDeviceID);
const char *sComputeMode[] = {
"Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)",
"Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device)",
"Prohibited (no host thread can use ::cudaSetDevice() with this device)",
"Exclusive Process (many threads in one process is able to use ::cudaSetDevice() with this device)",
"Unknown",
NULL
};
printf(" Compute Mode:\n");
printf(" < %s >\n", sComputeMode[deviceProp.computeMode]);
}
// If there are 2 or more GPUs, query to determine whether RDMA is supported
if (deviceCount >= 2)
{
cudaDeviceProp prop[64];
int gpuid[64]; // we want to find the first two GPU's that can support P2P
int gpu_p2p_count = 0;
for (int i=0; i < deviceCount; i++)
{
checkCudaErrors(cudaGetDeviceProperties(&prop[i], i));
// Only boards based on Fermi or later can support P2P
if ((prop[i].major >= 2)
#if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
// on Windows (64-bit), the Tesla Compute Cluster driver for windows must be enabled to supprot this
&& prop[i].tccDriver
#endif
)
{
// This is an array of P2P capable GPUs
gpuid[gpu_p2p_count++] = i;
}
}
// Show all the combinations of support P2P GPUs
int can_access_peer_0_1, can_access_peer_1_0;
if (gpu_p2p_count >= 2)
{
for (int i = 0; i < gpu_p2p_count-1; i++)
{
for (int j = 1; j < gpu_p2p_count; j++)
{
checkCudaErrors(cudaDeviceCanAccessPeer(&can_access_peer_0_1, gpuid[i], gpuid[j]));
printf("> Peer access from %s (GPU%d) -> %s (GPU%d) : %s\n", prop[gpuid[i]].name, gpuid[i],
prop[gpuid[j]].name, gpuid[j] ,
can_access_peer_0_1 ? "Yes" : "No");
}
}
for (int j = 1; j < gpu_p2p_count; j++)
{
for (int i = 0; i < gpu_p2p_count-1; i++)
{
checkCudaErrors(cudaDeviceCanAccessPeer(&can_access_peer_1_0, gpuid[j], gpuid[i]));
printf("> Peer access from %s (GPU%d) -> %s (GPU%d) : %s\n", prop[gpuid[j]].name, gpuid[j],
prop[gpuid[i]].name, gpuid[i] ,
can_access_peer_1_0 ? "Yes" : "No");
}
}
}
}
// csv masterlog info
// *****************************
// exe and CUDA driver name
printf("\n");
std::string sProfileString = "deviceQuery, CUDA Driver = CUDART";
char cTemp[128];
// driver version
sProfileString += ", CUDA Driver Version = ";
#if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
sprintf_s(cTemp, 10, "%d.%d", driverVersion/1000, (driverVersion%100)/10);
#else
sprintf(cTemp, "%d.%d", driverVersion/1000, (driverVersion%100)/10);
#endif
sProfileString += cTemp;
// Runtime version
sProfileString += ", CUDA Runtime Version = ";
#if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
sprintf_s(cTemp, 10, "%d.%d", runtimeVersion/1000, (runtimeVersion%100)/10);
#else
sprintf(cTemp, "%d.%d", runtimeVersion/1000, (runtimeVersion%100)/10);
#endif
sProfileString += cTemp;
// Device count
sProfileString += ", NumDevs = ";
#if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
sprintf_s(cTemp, 10, "%d", deviceCount);
#else
sprintf(cTemp, "%d", deviceCount);
#endif
sProfileString += cTemp;
// Print Out all device Names
for (dev = 0; dev < deviceCount; ++dev)
{
#if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
sprintf_s(cTemp, 13, ", Device%d = ", dev);
#else
sprintf(cTemp, ", Device%d = ", dev);
#endif
cudaDeviceProp deviceProp;
cudaGetDeviceProperties(&deviceProp, dev);
sProfileString += cTemp;
sProfileString += deviceProp.name;
}
sProfileString += "\n";
printf("%s", sProfileString.c_str());
printf("Result = PASS\n");
// finish
// cudaDeviceReset causes the driver to clean up all state. While
// not mandatory in normal operation, it is good practice. It is also
// needed to ensure correct operation when the application is being
// profiled. Calling cudaDeviceReset causes all profile data to be
// flushed before the application exits
cudaDeviceReset();
return 0;
}

43
pkgs/cudainfo/default.nix Normal file
View File

@@ -0,0 +1,43 @@
{
stdenv
, cudatoolkit
, cudaPackages
, autoAddDriverRunpath
, strace
}:
stdenv.mkDerivation (finalAttrs: {
name = "cudainfo";
src = ./.;
buildInputs = [
cudatoolkit # Required for nvcc
cudaPackages.cuda_cudart.static # Required for -lcudart_static
autoAddDriverRunpath
];
installPhase = ''
mkdir -p $out/bin
cp -a cudainfo $out/bin
'';
passthru.gpuCheck = stdenv.mkDerivation {
name = "cudainfo-test";
requiredSystemFeatures = [ "cuda" ];
dontBuild = true;
nativeCheckInputs = [
finalAttrs.finalPackage # The cudainfo package from above
strace # When it fails, it will show the trace
];
dontUnpack = true;
doCheck = true;
checkPhase = ''
if ! cudainfo; then
set -x
cudainfo=$(command -v cudainfo)
ldd $cudainfo
readelf -d $cudainfo
strace -f $cudainfo
set +x
fi
'';
installPhase = "touch $out";
};
})

View File

@@ -0,0 +1,25 @@
{ python3Packages, lib }:
python3Packages.buildPythonApplication rec {
pname = "meteocat-exporter";
version = "1.0";
src = ./.;
doCheck = false;
build-system = with python3Packages; [
setuptools
];
dependencies = with python3Packages; [
beautifulsoup4
lxml
prometheus-client
];
meta = with lib; {
description = "MeteoCat Prometheus Exporter";
platforms = platforms.linux;
};
}

View File

@@ -0,0 +1,54 @@
#!/usr/bin/env python3
import time
from prometheus_client import start_http_server, Gauge
from bs4 import BeautifulSoup
from urllib import request
# Configuration -------------------------------------------
meteo_station = "X8" # Barcelona - Zona Universitària
listening_port = 9929
update_period = 60 * 5 # Each 5 min
# ---------------------------------------------------------
metric_tmin = Gauge('meteocat_temp_min', 'Min temperature')
metric_tmax = Gauge('meteocat_temp_max', 'Max temperature')
metric_tavg = Gauge('meteocat_temp_avg', 'Average temperature')
metric_srad = Gauge('meteocat_solar_radiation', 'Solar radiation')
def update(st):
url = 'https://www.meteo.cat/observacions/xema/dades?codi=' + st
response = request.urlopen(url)
data = response.read()
soup = BeautifulSoup(data, 'lxml')
table = soup.find("table", {"class" : "tblperiode"})
rows = table.find_all('tr')
row = rows[-1] # Take the last row
row_data = []
header = row.find('th')
header_text = header.text.strip()
row_data.append(header_text)
for col in row.find_all('td'):
row_data.append(col.text)
try:
# Sometimes it will return '(s/d)' and fail to parse
metric_tavg.set(float(row_data[1]))
metric_tmax.set(float(row_data[2]))
metric_tmin.set(float(row_data[3]))
metric_srad.set(float(row_data[10]))
#print("ok: temp_avg={}".format(float(row_data[1])))
except:
print("cannot parse row: {}".format(row))
metric_tavg.set(float("nan"))
metric_tmax.set(float("nan"))
metric_tmin.set(float("nan"))
metric_srad.set(float("nan"))
if __name__ == '__main__':
start_http_server(port=listening_port, addr="localhost")
while True:
try:
update(meteo_station)
except:
print("update failed")
time.sleep(update_period)

View File

@@ -0,0 +1,11 @@
#!/usr/bin/env python
from setuptools import setup, find_packages
setup(name='meteocat-exporter',
version='1.0',
# Modules to import from other scripts:
packages=find_packages(),
# Executables
scripts=["meteocat-exporter"],
)

View File

@@ -1,36 +0,0 @@
diff --git a/src/util/mpir_hwtopo.c b/src/util/mpir_hwtopo.c
index 33e88bc..ee3641c 100644
--- a/src/util/mpir_hwtopo.c
+++ b/src/util/mpir_hwtopo.c
@@ -200,18 +200,6 @@ int MPII_hwtopo_init(void)
#ifdef HAVE_HWLOC
bindset = hwloc_bitmap_alloc();
hwloc_topology_init(&hwloc_topology);
- char *xmlfile = MPIR_pmi_get_jobattr("PMI_hwloc_xmlfile");
- if (xmlfile != NULL) {
- int rc;
- rc = hwloc_topology_set_xml(hwloc_topology, xmlfile);
- if (rc == 0) {
- /* To have hwloc still actually call OS-specific hooks, the
- * HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM has to be set to assert that the loaded
- * file is really the underlying system. */
- hwloc_topology_set_flags(hwloc_topology, HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM);
- }
- MPL_free(xmlfile);
- }
hwloc_topology_set_io_types_filter(hwloc_topology, HWLOC_TYPE_FILTER_KEEP_ALL);
if (!hwloc_topology_load(hwloc_topology))
--- a/src/mpi/init/local_proc_attrs.c
+++ b/src/mpi/init/local_proc_attrs.c
@@ -79,10 +79,6 @@ int MPII_init_local_proc_attrs(int *p_thread_required)
/* Set the number of tag bits. The device may override this value. */
MPIR_Process.tag_bits = MPIR_TAG_BITS_DEFAULT;
- char *requested_kinds = MPIR_pmi_get_jobattr("PMI_mpi_memory_alloc_kinds");
- MPIR_get_supported_memory_kinds(requested_kinds, &MPIR_Process.memory_alloc_kinds);
- MPL_free(requested_kinds);
-
return mpi_errno;
}

View File

@@ -11,10 +11,6 @@ final: prev:
paths = [ pmix.dev pmix.out ];
};
in prev.mpich.overrideAttrs (old: {
patches = (old.patches or []) ++ [
# See https://github.com/pmodels/mpich/issues/6946
./mpich-fix-hwtopo.patch
];
buildInput = old.buildInputs ++ [
libfabric
pmixAll
@@ -54,4 +50,7 @@ final: prev:
});
prometheus-slurm-exporter = prev.callPackage ./slurm-exporter.nix { };
meteocat-exporter = prev.callPackage ./meteocat-exporter/default.nix { };
upc-qaire-exporter = prev.callPackage ./upc-qaire-exporter/default.nix { };
cudainfo = prev.callPackage ./cudainfo/default.nix { };
}

View File

@@ -0,0 +1,24 @@
{ python3Packages, lib }:
python3Packages.buildPythonApplication rec {
pname = "upc-qaire-exporter";
version = "1.0";
src = ./.;
doCheck = false;
build-system = with python3Packages; [
setuptools
];
dependencies = with python3Packages; [
prometheus-client
requests
];
meta = with lib; {
description = "UPC Qaire Prometheus Exporter";
platforms = platforms.linux;
};
}

View File

@@ -0,0 +1,11 @@
#!/usr/bin/env python
from setuptools import setup, find_packages
setup(name='upc-qaire-exporter',
version='1.0',
# Modules to import from other scripts:
packages=find_packages(),
# Executables
scripts=["upc-qaire-exporter"],
)

View File

@@ -0,0 +1,74 @@
#!/usr/bin/env python3
import time
from prometheus_client import start_http_server, Gauge
import requests, json
from datetime import datetime, timedelta
# Configuration -------------------------------------------
listening_port = 9928
update_period = 60 * 5 # Each 5 min
# ---------------------------------------------------------
metric_temp = Gauge('upc_c6_s302_temp', 'UPC C6 S302 temperature sensor')
def genparams():
d = {}
d['topic'] = 'TEMPERATURE'
d['shift_dates_to'] = ''
d['datapoints'] = 301
d['devicesAndColors'] = '1148418@@@#40ACB6'
now = datetime.now()
d['fromDate'] = now.strftime('%d/%m/%Y')
d['toDate'] = now.strftime('%d/%m/%Y')
d['serviceFrequency'] = 'NONE'
# WTF!
for i in range(7):
for j in range(48):
key = 'week.days[{}].hours[{}].value'.format(i, j)
d[key] = 'OPEN'
return d
def measure():
# First we need to load session
s = requests.Session()
r = s.get("https://upc.edu/sirena")
if r.status_code != 200:
print("bad HTTP status code on new session: {}".format(r.status_code))
return
if s.cookies.get("JSESSIONID") is None:
print("cannot get JSESSIONID")
return
# Now we can pull the data
url = "https://upcsirena.app.dexma.com/l_12535/analysis/by_datapoints/data.json"
r = s.post(url, data=genparams())
if r.status_code != 200:
print("bad HTTP status code on data: {}".format(r.status_code))
return
#print(r.text)
j = json.loads(r.content)
# Just take the last one
last = j['data']['chartElementList'][-1]
temp = last['values']['1148418-Temperatura']
return temp
if __name__ == '__main__':
start_http_server(port=listening_port, addr="localhost")
while True:
try:
metric_temp.set(measure())
except:
print("measure failed")
metric_temp.set(float("nan"))
time.sleep(update_period)

Binary file not shown.

View File

@@ -1,9 +1,11 @@
age-encryption.org/v1
-> ssh-ed25519 HY2yRg eRVX5yndWDLg9hw7sY1Iu8pJFy47luHvdL+zZGK2u1s
e1nXXiMW0ywkZYh2s6c7/quGMfBOJOaRhNQDjCD2Iyc
-> ssh-ed25519 CAWG4Q gYG7GRxRpJ0/5Wz0Z0J2wfLfkMFNmcy81dQEewM7gUA
lamdUdx+xOFWF1lmUM4x9TT0cJtKu9Sp7w9JHwm13u0
-> ssh-ed25519 MSF3dg HEzfpR8alG6WPzhaEjAmmjOFoFcMSQUldx46dBsXri4
OAD5H/zZGhfevYrFJzJrbNKPomKZDOS9Qx5tmTp78Jo
--- A0sMSiNXWaEIgRXR0x6UAIaluuVH6Zlv4CJ9sI0NXOw
<EFBFBD><EFBFBD>6<EFBFBD>ph<EFBFBD><EFBFBD><EFBFBD>{<7B>><3E>F|<7C>i<EFBFBD>v <0B><>E}{<7B>ru<72><75>Ʒ<EFBFBD><C6B7><1A><EFBFBD><7F>}^<5E><>><3E>c6<06><14>j<> <09>g<EFBFBD>GW<47><57>:<3A>J3<19>|<7C>|<7C>Z<EFBFBD>
-> ssh-ed25519 HY2yRg d7+nvfAcdC3GjJxipXFrsfGGyP5jAY+gRWRV+4FVYAM
CG7r0bRGgnUWcdfDnpe7HwZ3L/y7b5iuJuqvf15b3/Y
-> ssh-ed25519 CAWG4Q X0vITOErz4wkR3VQYOcVlnrkHtwe+ytdZz1Hcrs4vVs
6IWYOhXLQ+BnML9YfLLHJYEO2CZ/uEc9IBqhoWvjDHI
-> ssh-ed25519 xA739A p5e/0AJtZ0+zbRvkB/usLuxusY8xXRx9Ksi/LQlcIHw
M4S/qlzT9POyJx4gY9lmycstUcdwG2cinN4OlV22zzo
-> ssh-ed25519 MSF3dg Ydl7uBWzBx6sAaxbzC3x8qiaU3ysGqV4rUFLpHCEV30
/1AUHBhCNOs9i7LJbmzwQDHsu+ybzYf6+coztKk5E3U
--- kYt15WxClpT7PXD1oFe9GqJU+OswjH7y9wIc8/GzZ7M
<EFBFBD><EFBFBD>h<>ߓ<><DF93><EFBFBD>`<60><><EFBFBD>V4F<34><46>_k)^<5E>m$uj:ѳ<><D1B3><17><><EFBFBD>}<7D>Z]$U]<12>u<EFBFBD> <20>0<EFBFBD><30><EFBFBD>v8<76>?<3F>X<EFBFBD>P<EFBFBD>g%d<>#<23>d9{rAi<41><69>

View File

@@ -1,11 +1,11 @@
age-encryption.org/v1
-> ssh-ed25519 HY2yRg WSdjyQPzBJ4JbzQpGeq1AAYpWKoXmLI1ZtmNmM5QOzs
qGDlDT31DQF1DdHen0+5+52DdsQlabJdA2pOB5O1I6g
-> ssh-ed25519 CAWG4Q wioWMDxQjN+d4JdIbCwZg0DLQu1OH2mV6gukRprjuAs
670fE61hidOEh20hHiQAhP0+CjDF0WMBNzgwkGT8Yqg
-> ssh-ed25519 MSF3dg DN19uvAEtqq4708P6HpuX9i/o/qAvHX6dj69dCF2H1o
4Lu9GnjiFLMeXJ2C7aVPJsCHCQVlhylNWJi896Av92s
--- 7cKBwOYNOUZ2h3/kAY09aSMASZSxX7hZIT4kvlIiT6w
<EFBFBD>6<EFBFBD><02><><EFBFBD><06>fQF5=<3D>bX+<2B>v e`<60>7/<2F><05>A~P<><50>Ѧ7<15><>
<EFBFBD><EFBFBD>A<EFBFBD>)<29>h<05><>=oZ<6F>$<24> ^<5E>V0<56><>r
k<EFBFBD>u<EFBFBD>bĶ:R<><52>>^g<><67><06>ik_*% <0B>a7<61>KG<4B><47><EFBFBD><EFBFBD><EFBFBD><EFBFBD>&<26>PI<50><49>n
-> ssh-ed25519 HY2yRg 2jO/raAo/qSdOmENknW2ksMgGQDmn9re+X//ekjgRGg
+Wo4K69UucwxYtbMSn3dDB6M5/1y+PY+p071vgPOsuE
-> ssh-ed25519 CAWG4Q WKIgDc1Bv/iHas0D7IHmbru5Jy/Iy4abUsIuTDC8+Wc
kNbE/CBggRmgUk3OewptR3oNo/t/uOQuf8nZDfJUoLA
-> ssh-ed25519 xA739A FaaHnwsRGA+PZgVjyqvY6ctw/frYFKQzRl92+mTXZwM
ivABFIbzu4pQrz3yp2IiyTXFFmb/aWO+MaBwz8Bsabc
-> ssh-ed25519 MSF3dg HHwnX2dzb4PVdwKUROo8eQS2iyVc5QCUIXanObh71FU
dM0qRA18DdN3QFEr5etAjtCNn2U2dmkqbUkykEp4eLg
--- x+ixidWejvGMWNz6qzelFPz+9sJ15T7oofx3G9O1ZZE
<EFBFBD><0E>VFh-=B<><42>9<EFBFBD>)<29>o<EFBFBD>E<<3C><>q(P<><1E><14>(/<2F>Y qsU<73><55>X<EFBFBD>e<EFBFBD>䉐)

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@@ -2,6 +2,8 @@ let
keys = import ../keys.nix;
adminsKeys = builtins.attrValues keys.admins;
hut = [ keys.hosts.hut ] ++ adminsKeys;
mon = [ keys.hosts.hut keys.hosts.tent ] ++ adminsKeys;
tent = [ keys.hosts.tent ] ++ adminsKeys;
# Only expose ceph keys to safe nodes and admins
safe = keys.hostGroup.safe ++ adminsKeys;
in
@@ -10,9 +12,15 @@ in
"gitlab-runner-docker-token.age".publicKeys = hut;
"gitlab-runner-shell-token.age".publicKeys = hut;
"gitlab-bsc-docker-token.age".publicKeys = hut;
"nix-serve.age".publicKeys = hut;
"jungle-robot-password.age".publicKeys = hut;
"ipmi.yml.age".publicKeys = hut;
"nix-serve.age".publicKeys = mon;
"jungle-robot-password.age".publicKeys = mon;
"ipmi.yml.age".publicKeys = mon;
"tent-gitlab-runner-pm-docker-token.age".publicKeys = tent;
"tent-gitlab-runner-pm-shell-token.age".publicKeys = tent;
"tent-gitlab-runner-bsc-docker-token.age".publicKeys = tent;
"vpn-dac-login.age".publicKeys = tent;
"vpn-dac-client-key.age".publicKeys = tent;
"ceph-user.age".publicKeys = safe;
"munge-key.age".publicKeys = safe;

View File

@@ -0,0 +1,11 @@
age-encryption.org/v1
-> ssh-ed25519 G5LX5w HlQ4V8lBd3im5j8KHEuQZBTuztvPj1QoWdv6FL6qzGI
Jpt91X1UIIVFQt1X6Q//kALn+Cetp/LqBZZvTuhFthw
-> ssh-ed25519 CAWG4Q StnngJAcuAwUnTrXDR3nJ2KFN0jNdTqSz+/1TfmWkzA
CR4AQ6fqaJVY1mdUIX1gzaZwRs1sU8F8hHztnkN8vN0
-> ssh-ed25519 xA739A xya5A5t63Owx+VrGgUfV/lIP8b/xV1cerMpuZBLaDVM
w+pA583yUnFq2AvGBGzWbQIGQEY9WqW0CSLQ9v+SG0c
-> ssh-ed25519 MSF3dg aXkLxCyYdOwVopHHmpXEI6WlAIizKdJi4IO0KEdhS3s
WKXkTszZN66+QZdSDJ4D9q7xgYWMfliOLCubIF2Dqkc
--- uVWoU2lMkqQ/9Z0BqKRCeUpsKi8lwmHukT/FV8wYMbg
<EFBFBD><EFBFBD>1G+<2B>6<EFBFBD><36>g[|x]2T<32>й<EFBFBD><D0B9><EFBFBD> <20>CKu)<29><><EFBFBD>]<5D><><38><D693><EFBFBD><EFBFBD>l<EFBFBD><6C>S<EFBFBD><53><EFBFBD>Q<EFBFBD><07><>x<EFBFBD><78><EFBFBD><EFBFBD>#7r<37>k{*<2A><>3ս~C<>b<EFBFBD><62><EFBFBD><EFBFBD><EFBFBD><EFBFBD>ڵ<EFBFBD>Np<1E><05>]J]h<>je+d%Е<>#<23>m<EFBFBD>?=6}<7D>

View File

@@ -0,0 +1,11 @@
age-encryption.org/v1
-> ssh-ed25519 G5LX5w sg9SmahxBg35MDIxhrp4oHkaTaxsKoVQju2eNhCt0BM
CZ64dEGqz2tbkG8KtimZvLUEMrQpVVBJP7Fu46WTMgc
-> ssh-ed25519 CAWG4Q jzS1R14W1CWxdziMLG/yCGPLWSkiyE+9lqyCVe491ng
acJo/nhKq3pSPoFEPaFLN1fzHHbEzstNoLtohWAHKiM
-> ssh-ed25519 xA739A qeGJoLeSIQwLU2Yg+Gi2bikHJ3HscLfyo1msqL3JwHw
tTwaxRBKTl/SoyY/LnxR/j/5WvCNX5VeZLKi018YMrY
-> ssh-ed25519 MSF3dg Wym7Uyf1XvH1H6mNDERkO8opkMiN0zzXm2PjXftEOWs
Uw8ZwwKIB5UqgVuoSLE2QajNDJZkH7/Y3Nsy+WFl7Xs
--- 94hGVbYiCGZdMEJesCMLh7IZi+w5l/Kr1lZJHQgrc0o
j5j磛<6A><04><>J<EFBFBD><4A><EFBFBD>a<EFBFBD>]<5D>a%dr<64><72>FDT<44><54>^<5E><>Q<EFBFBD>s/<2F>kwB<77>$<24><>$<24><>H<EFBFBD>'<27><><EFBFBD><EFBFBD><EFBFBD>w<14><?^|<7C><07>h$<24>ؗ<EFBFBD>GI<47>ĕsT2RU<52><55>*/O<>7<EFBFBD><37><EFBFBD>G<EFBFBD><70>4<EFBFBD><34><EFBFBD>M9<4D>j<><06>

View File

@@ -0,0 +1,12 @@
age-encryption.org/v1
-> ssh-ed25519 G5LX5w 5K0mzfJGvAB2LGmoQ9ZLbWooVEX6F4+fQdo1JUoB3FM
AKGa507bUrYjXFaMQ1MXTDBFYsdS6zbs+flmxYN0UNo
-> ssh-ed25519 CAWG4Q 8KzLc949on8iN1pK8q11OpCIeO71t6b0zxCLHhcQ6ns
uy7z6RdIuoUes+Uap3k5eoFFuu/DcSrEBwq4V4C/ygc
-> ssh-ed25519 xA739A SLx5cKo0fdAHj+cLpJ4FYTWTUTyDsCqKQOufDu3xnGo
VnS/WsiSaf6RpXuhgfij4pYu4p9hlJl1oXrfYY9rKlQ
-> ssh-ed25519 MSF3dg c5ZXvdNxNfZU3HeWsttuhy+UC5JxWN/IFuCuCGbksn4
vcKlIirf+VvERX71YpmwW6zp6ClhlG2PR4R8LIN7cQo
--- pJKICDaYAlxqNnvHIuzB3Yk7tv0ZNYflGTQD+Zk/8+4
<EFBFBD>h/\J<>J
<EFBFBD>0?<3F> <20>p<EFBFBD><70><EFBFBD>@܉7<DC89><37>3<EFBFBD><33><EFBFBD><EFBFBD>z<EFBFBD><7A><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>a<EFBFBD><61>'<27>,ka<6B>I<EFBFBD>XXOZ<4F>I\<5C><><EFBFBD><EFBFBD><EFBFBD> <09>BP<42><50>/cUɿ~B<><42>S' Q<><51><EFBFBD><EFBFBD>f<06><><EFBFBD>er<65><72><EFBFBD><EFBFBD>^<5E><><EFBFBD><EFBFBD>8l<38><6C>V<EFBFBD>E<EFBFBD><45><EFBFBD>

Binary file not shown.

12
secrets/vpn-dac-login.age Normal file
View File

@@ -0,0 +1,12 @@
age-encryption.org/v1
-> ssh-ed25519 G5LX5w /RF8uZ/KahUqjEFILbF3+Jin+U0SQdoQChcc9RJ9axc
aEmPk++86nBR6d2BIa/oaUdyiLS6cH8TUoYJE3bxba4
-> ssh-ed25519 CAWG4Q qHyh9nQi8c3z/KHby9y5vhzN0Dwz0zca98ebjJmXrzs
ZbmwNzrSSQ3RvskE8SqcBa0vMy8pzm/HPGHLm5zuPGQ
-> ssh-ed25519 xA739A FlGbfS4bUxA3gVDzb3yPjp4hV8a7aiNBLUctnN3bGEY
3fI6SyVjVhh2M8uc/XV3blpdQMPMYi2qzaHNXvx0bvM
-> ssh-ed25519 MSF3dg 0Bs/aW0nNISS+93It75o6hKZWa7S+LF5bF5ApsJ2fQ8
y7o0KYDHEen13ndIxg/mYil3eMxxzvYF2pWqhMb+rBU
--- Iqo75G4+02Y9nc1OOkcEx+iQlKnGYCekAx76tRH53wA
<10>
<EFBFBD>X<EFBFBD><EFBFBD>%f <0C><><12>hX <0B><>R<>c<EFBFBD>+z<><7A>eg<65>& <20>d<EFBFBD><64><EFBFBD>ק<06><>A<EFBFBD><41><EFBFBD>чXM<58>1<EFBFBD>

View File

@@ -5,18 +5,12 @@ description: "Request access to the machines"
![Cave](./cave.jpg)
Before requesting access to the jungle machines, you must be able to access the
`ssfhead.bsc.es` node (only available via the intranet or VPN). You can request
access to the login machine using a resource petition in the BSC intranet.
Then, to request access to the machines we will need some information about you:
To request access to the machines we will need some information:
1. Which machines you want access to ([hut](/hut), [fox](/fox), owl1, owl2, eudy, koro...)
1. Your user name and user id (to match the NFS permissions)
1. Your user name (make sure it matches the one you use for the BSC intranet)
1. Your real name and surname (for identification purposes)
1. The salted hash of your login password, generated with `mkpasswd -m sha-512`
1. An SSH public key of type Ed25519 (can be generated with `ssh-keygen -t ed25519`)
Send an email to <jungle@bsc.es> with the details, or directly open a
merge request in the [jungle
repository](https://pm.bsc.es/gitlab/rarias/jungle/).
Send an email to <jungle@bsc.es> with the details.

View File

@@ -38,8 +38,8 @@ enable it for all builds in the system.
```nix
{ ... }: {
nix.settings = {
substituters = [ "https://jungle.bsc.es/cache" ];
trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
extra-substituters = [ "https://jungle.bsc.es/cache" ];
extra-trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
};
}
```
@@ -60,8 +60,8 @@ Note: you'll have to be a trusted user.
If using nix outside of NixOS, you'll have to update `/etc/nix/nix.conf`
```
# echo "substituters = https://jungle.bsc.es/cache" >> /etc/nix/nix.conf
# echo "trusted-public-keys = jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" >> /etc/nix/nix.conf
# echo "extra-substituters = https://jungle.bsc.es/cache" >> /etc/nix/nix.conf
# echo "extra-trusted-public-keys = jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" >> /etc/nix/nix.conf
```
### Hint in flakes

View File

@@ -5,13 +5,13 @@ author: "Rodrigo Arias Mallo"
date: 2024-09-20
---
The hut machine provides a paste service using the program `p` (as in paste).
The tent machine provides a paste service using the program `p` (as in paste).
You can use it directly from the hut machine or remotely if you have [SSH
access](/access) to hut using the following alias:
You can use it directly from the tent machine or remotely if you have [SSH
access](/access) to tent using the following alias:
```
alias p="ssh hut p"
alias p="ssh tent p"
```
You can add it to bashrc or zshrc for persistent installation.
@@ -19,7 +19,7 @@ You can add it to bashrc or zshrc for persistent installation.
## Usage
The `p` command reads from the standard input, uploads the content to a file
in the ceph filesystem and prints the URL to access it. It only accepts an
in the local filesystem and prints the URL to access it. It only accepts an
optional argument, which is the extension of the file that will be stored on
disk (without the dot). By default it uses the `txt` extension, so plain text
can be read in the browser directly.
@@ -28,21 +28,21 @@ can be read in the browser directly.
p [extension]
```
To remove files, go to `/ceph/p/$USER` and remove them manually.
To remove files, go to `/var/lib/p/$USER` and remove them manually.
## Examples
Share a text file, in this case the source of p itself:
```
hut% p < m/hut/p.nix
tent% p < m/tent/p.nix
https://jungle.bsc.es/p/rarias/okbtG130.txt
```
Paste the last dmesg lines directly from a pipe:
```
hut% dmesg | tail -5 | p
tent% dmesg | tail -5 | p
https://jungle.bsc.es/p/rarias/luX4STm9.txt
```