Keep compute nodes off when power comes back

When the power comes back, we don't know whether the AC unit will be working properly or whether the room will be at a safe temperature. So, instead of powering all the machines back on, configure only the login node to power on, so we can check the state of the room and then power on the rest of the machines.
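
As a rough sketch of what opting a machine back in looks like, a host that should come up on its own after a power cut imports the `power.policy` module from this change and selects `always-on`. The file path in the comment is a hypothetical host, not one of the files touched here; the import path and option values are the ones used in the diff.

```nix
# Sketch only: m/<some-host>/configuration.nix is a hypothetical host file.
{ ... }:

{
  imports = [
    ../module/power-policy.nix
  ];

  # Accepted values are "always-on", "previous" and "always-off".
  # The module's own default is null, which leaves the policy stored in
  # the BMC untouched.
  power.policy = "always-on";
}
```

If another imported module already sets `power.policy` to a different value at the same priority, the NixOS module system reports conflicting definitions. In that case, wrapping the host value in `lib.mkForce` (or the shared default in `lib.mkDefault`) is the usual way to resolve it.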
Move StartLimit* options to unit section
2025-07-31 17:39:21 +02:00 · 2025-07-24 14:32:46 +02:00 · 2025-07-24 11:22:38 +02:00 · 2025-07-24 11:22:36 +02:00 · 2025-07-24 11:22:33 +02:00 · 2025-07-24 11:22:10 +02:00
44 changed files with 1023 additions and 372 deletions
--- a/flake.lock
+++ b/flake.lock
@@ -10,11 +10,11 @@
        "systems": "systems"
      },
      "locked": {
-        "lastModified": 1723293904,
-        "narHash": "sha256-b+uqzj+Wa6xgMS9aNbX4I+sXeb5biPDi39VgvSFqFvU=",
+        "lastModified": 1750173260,
+        "narHash": "sha256-9P1FziAwl5+3edkfFcr5HeGtQUtrSdk/MksX39GieoA=",
        "owner": "ryantm",
        "repo": "agenix",
-        "rev": "f6291c5935fdc4e0bef208cfc0dcab7e3f7a1c41",
+        "rev": "531beac616433bac6f9e2a19feb8e99a22a66baf",
        "type": "github"
      },
      "original": {
@@ -30,11 +30,11 @@
        ]
      },
      "locked": {
-        "lastModified": 1732868163,
-        "narHash": "sha256-qck4h298AgcNI6BnGhEwl26MTLXjumuJVr+9kak7uPo=",
+        "lastModified": 1749650500,
+        "narHash": "sha256-2MHfVPV6RA7qPSCtXh4+KK0F0UjN+J4z8//+n6NK7Xs=",
        "ref": "refs/heads/master",
-        "rev": "6782fc6c5b5a29e84a7f2c2d1064f4bcb1288c0f",
-        "revCount": 952,
+        "rev": "9d1944c658929b6f98b3f3803fead4d1b91c4405",
+        "revCount": 961,
        "type": "git",
        "url": "https://git.sr.ht/~rodarima/bscpkgs"
      },
@@ -51,11 +51,11 @@
        ]
      },
      "locked": {
-        "lastModified": 1700795494,
-        "narHash": "sha256-gzGLZSiOhf155FW7262kdHo2YDeugp3VuIFb4/GGng0=",
+        "lastModified": 1744478979,
+        "narHash": "sha256-dyN+teG9G82G+m+PX/aSAagkC+vUv0SgUw3XkPhQodQ=",
        "owner": "lnl7",
        "repo": "nix-darwin",
-        "rev": "4b9b83d5a92e8c1fbfd8eb27eda375908c11ec4d",
+        "rev": "43975d782b418ebf4969e9ccba82466728c2851b",
        "type": "github"
      },
      "original": {
@@ -73,11 +73,11 @@
        ]
      },
      "locked": {
-        "lastModified": 1703113217,
-        "narHash": "sha256-7ulcXOk63TIT2lVDSExj7XzFx09LpdSAPtvgtM7yQPE=",
+        "lastModified": 1745494811,
+        "narHash": "sha256-YZCh2o9Ua1n9uCvrvi5pRxtuVNml8X2a03qIFfRKpFs=",
        "owner": "nix-community",
        "repo": "home-manager",
-        "rev": "3bfaacf46133c037bb356193bd2f1765d9dc82c1",
+        "rev": "abfad3d2958c9e6300a883bd443512c55dfeb1be",
        "type": "github"
      },
      "original": {
@@ -88,16 +88,16 @@
    },
    "nixpkgs": {
      "locked": {
-        "lastModified": 1736867362,
-        "narHash": "sha256-i/UJ5I7HoqmFMwZEH6vAvBxOrjjOJNU739lnZnhUln8=",
+        "lastModified": 1752436162,
+        "narHash": "sha256-Kt1UIPi7kZqkSc5HVj6UY5YLHHEzPBkgpNUByuyxtlw=",
        "owner": "NixOS",
        "repo": "nixpkgs",
-        "rev": "9c6b49aeac36e2ed73a8c472f1546f6d9cf1addc",
+        "rev": "dfcd5b901dbab46c9c6e80b265648481aafb01f8",
        "type": "github"
      },
      "original": {
        "owner": "NixOS",
-        "ref": "nixos-24.11",
+        "ref": "nixos-25.05",
        "repo": "nixpkgs",
        "type": "github"
      }
--- a/flake.nix
+++ b/flake.nix
@@ -1,6 +1,6 @@
 {
  inputs = {
-    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
+    nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.05";
    agenix.url = "github:ryantm/agenix";
    agenix.inputs.nixpkgs.follows = "nixpkgs";
    bscpkgs.url = "git+https://git.sr.ht/~rodarima/bscpkgs";
@@ -27,6 +27,8 @@ in
      lake2   = mkConf "lake2";
      raccoon = mkConf "raccoon";
      fox     = mkConf "fox";
+      apex    = mkConf "apex";
+      weasel  = mkConf "weasel";
    };

    packages.x86_64-linux = self.nixosConfigurations.hut.pkgs // {
--- a/keys.nix
+++ b/keys.nix
@@ -2,25 +2,28 @@
 # here all the public keys
 rec {
  hosts = {
-    hut   = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICO7jIp6JRnRWTMDsTB/aiaICJCl4x8qmKMPSs4lCqP1 hut";
-    owl1  = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMqMEXO0ApVsBA6yjmb0xP2kWyoPDIWxBB0Q3+QbHVhv owl1";
-    owl2  = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHurEYpQzNHqWYF6B9Pd7W8UPgF3BxEg0BvSbsA7BAdK owl2";
-    eudy  = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIL+WYPRRvZupqLAG0USKmd/juEPmisyyJaP8hAgYwXsG eudy";
-    koro  = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIImiTFDbxyUYPumvm8C4mEnHfuvtBY1H8undtd6oDd67 koro";
-    bay   = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICvGBzpRQKuQYHdlUQeAk6jmdbkrhmdLwTBqf3el7IgU bay";
-    lake2 = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINo66//S1yatpQHE/BuYD/Gfq64TY7ZN5XOGXmNchiO0 lake2";
-    fox   = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDwItIk5uOJcQEVPoy/CVGRzfmE1ojrdDcI06FrU4NFT fox";
-    tent  = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFAtTpHtdYoelbknD/IcfBlThwLKJv/dSmylOgpg3FRM tent";
+    hut    = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICO7jIp6JRnRWTMDsTB/aiaICJCl4x8qmKMPSs4lCqP1 hut";
+    owl1   = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMqMEXO0ApVsBA6yjmb0xP2kWyoPDIWxBB0Q3+QbHVhv owl1";
+    owl2   = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHurEYpQzNHqWYF6B9Pd7W8UPgF3BxEg0BvSbsA7BAdK owl2";
+    eudy   = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIL+WYPRRvZupqLAG0USKmd/juEPmisyyJaP8hAgYwXsG eudy";
+    koro   = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIImiTFDbxyUYPumvm8C4mEnHfuvtBY1H8undtd6oDd67 koro";
+    bay    = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICvGBzpRQKuQYHdlUQeAk6jmdbkrhmdLwTBqf3el7IgU bay";
+    lake2  = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAINo66//S1yatpQHE/BuYD/Gfq64TY7ZN5XOGXmNchiO0 lake2";
+    fox    = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDwItIk5uOJcQEVPoy/CVGRzfmE1ojrdDcI06FrU4NFT fox";
+    tent   = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFAtTpHtdYoelbknD/IcfBlThwLKJv/dSmylOgpg3FRM tent";
+    apex   = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBvUFjSfoxXnKwXhEFXx5ckRKJ0oewJ82mRitSMNMKjh apex";
+    weasel = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFLJrQ8BF6KcweQV8pLkSbFT+tbDxSG9qxrdQE65zJZp weasel";
  };

  hostGroup = with hosts; rec {
    untrusted  = [ fox ];
    compute    = [ owl1 owl2 ];
-    playground = [ eudy koro ];
+    playground = [ eudy koro weasel ];
    storage    = [ bay lake2 ];
    monitor    = [ hut ];
+    login      = [ apex ];

-    system     = storage ++ monitor;
+    system     = storage ++ monitor ++ login;
    safe       = system ++ compute;
    all        = safe ++ playground;
  };
--- a/m/apex/configuration.nix
+++ b/m/apex/configuration.nix
@@ -0,0 +1,86 @@
+{ lib, config, pkgs, ... }:
+
+{
+  imports = [
+    ../common/xeon.nix
+    ../common/ssf/hosts.nix
+    ../module/ceph.nix
+    ../module/power-policy.nix
+    ./nfs.nix
+  ];
+
+  power.policy = "always-on";
+
+  # Don't install grub MBR for now
+  boot.loader.grub.device = "nodev";
+
+  boot.initrd.kernelModules = [
+    "megaraid_sas" # For HW RAID
+  ];
+
+  environment.systemPackages = with pkgs; [
+    storcli # To manage HW RAID
+  ];
+
+  fileSystems."/home" = {
+    device = "/dev/disk/by-label/home";
+    fsType = "ext4";
+  };
+
+  # No swap, there is plenty of RAM
+  swapDevices = lib.mkForce [];
+
+  networking = {
+    hostName = "apex";
+    defaultGateway = "84.88.53.233";
+    nameservers = [ "8.8.8.8" ];
+
+    # Public facing interface
+    interfaces.eno1.ipv4.addresses = [ {
+      address = "84.88.53.236";
+      prefixLength = 29;
+    } ];
+
+    # Internal LAN to our Ethernet switch
+    interfaces.eno2.ipv4.addresses = [ {
+      address = "10.0.40.30";
+      prefixLength = 24;
+    } ];
+
+    # Infiniband over Omnipath switch (disconnected for now)
+    # interfaces.ibp5s0 = {};
+
+    nat = {
+      enable = true;
+      internalInterfaces = [ "eno2" ];
+      externalInterface = "eno1";
+    };
+  };
+
+  # Use SSH tunnel to reach internal hosts
+  programs.ssh.extraConfig = ''
+    Host bscpm04.bsc.es gitlab-internal.bsc.es knights3.bsc.es
+      ProxyCommand nc -X connect -x localhost:23080 %h %p
+    Host raccoon
+      HostName knights3.bsc.es
+      ProxyCommand nc -X connect -x localhost:23080 %h %p
+    Host tent
+      ProxyJump raccoon
+  '';
+
+  networking.firewall = {
+    extraCommands = ''
+      # Blackhole BSC vulnerability scanner (OpenVAS) as it is spamming our
+      # logs. Insert as first position so we also protect SSH.
+      iptables -I nixos-fw 1 -p tcp -s 192.168.8.16 -j nixos-fw-refuse
+      # Same with opsmonweb01.bsc.es which seems to be trying to access via SSH
+      iptables -I nixos-fw 2 -p tcp -s 84.88.52.176 -j nixos-fw-refuse
+    '';
+  };
+
+  # Use tent for cache
+  nix.settings = {
+    extra-substituters = [ "https://jungle.bsc.es/cache" ];
+    extra-trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
+  };
+}
--- a/m/apex/nfs.nix
+++ b/m/apex/nfs.nix
@@ -0,0 +1,32 @@
+{ ... }:
+
+{
+  services.nfs.server = {
+    enable = true;
+    lockdPort = 4001;
+    mountdPort = 4002;
+    statdPort = 4000;
+    exports = ''
+      /home 10.0.40.0/24(rw,async,no_subtree_check,no_root_squash)
+    '';
+  };
+  networking.firewall = {
+    # Check with `rpcinfo -p`
+    extraCommands = ''
+      # Accept NFS traffic from compute nodes but not from the outside
+      iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 111   -j nixos-fw-accept
+      iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 2049  -j nixos-fw-accept
+      iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 4000  -j nixos-fw-accept
+      iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 4001  -j nixos-fw-accept
+      iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 4002  -j nixos-fw-accept
+      iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 20048 -j nixos-fw-accept
+      # Same but UDP
+      iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 111   -j nixos-fw-accept
+      iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 2049  -j nixos-fw-accept
+      iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 4000  -j nixos-fw-accept
+      iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 4001  -j nixos-fw-accept
+      iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 4002  -j nixos-fw-accept
+      iptables -A nixos-fw -p udp -s 10.0.40.0/24 --dport 20048 -j nixos-fw-accept
+    '';
+  };
+}
--- a/m/common/base.nix
+++ b/m/common/base.nix
@@ -3,6 +3,7 @@
  # Includes the basic configuration for an Intel server.
  imports = [
    ./base/agenix.nix
+    ./base/always-power-on.nix
    ./base/august-shutdown.nix
    ./base/boot.nix
    ./base/env.nix
--- a/m/common/base/august-shutdown.nix
+++ b/m/common/base/august-shutdown.nix
@@ -1,12 +1,12 @@
 {
-  # Shutdown all machines on August 2nd at 11:00 AM, so we can protect the
+  # Shutdown all machines on August 3rd at 22:00, so we can protect the
  # hardware from spurious electrical peaks on the yearly electrical cut for
  # manteinance that starts on August 4th.
  systemd.timers.august-shutdown = {
-    description = "Shutdown on August 2nd for maintenance";
+    description = "Shutdown on August 3rd for maintenance";
    wantedBy = [ "timers.target" ];
    timerConfig = {
-      OnCalendar = "*-08-02 11:00:00";
+      OnCalendar = "*-08-03 22:00:00";
      RandomizedDelaySec = "10min";
      Unit = "systemd-poweroff.service";
    };
--- a/m/common/base/env.nix
+++ b/m/common/base/env.nix
@@ -3,8 +3,8 @@
 {
  environment.systemPackages = with pkgs; [
    vim wget git htop tmux pciutils tcpdump ripgrep nix-index nixos-option
-    nix-diff ipmitool freeipmi ethtool lm_sensors ix cmake gnumake file tree
-    ncdu config.boot.kernelPackages.perf ldns
+    nix-diff ipmitool freeipmi ethtool lm_sensors cmake gnumake file tree
+    ncdu config.boot.kernelPackages.perf ldns pv
    # From bsckgs overlay
    osumb
  ];
--- a/m/common/base/net.nix
+++ b/m/common/base/net.nix
@@ -1,4 +1,4 @@
-{ pkgs, ... }:
+{ pkgs, lib, ... }:

 {
  networking = {
@@ -10,8 +10,11 @@
      allowedTCPPorts = [ 22 ];
    };

+    # Make sure we use iptables
+    nftables.enable = lib.mkForce false;
+
    hosts = {
-      "84.88.53.236" = [ "ssfhead.bsc.es" "ssfhead" ];
+      "84.88.53.236" = [ "apex" "ssfhead.bsc.es" "ssfhead" ];
      "84.88.51.152" = [ "raccoon" ];
      "84.88.51.142" = [ "raccoon-ipmi" ];
    };
--- a/m/common/base/nix.nix
+++ b/m/common/base/nix.nix
@@ -6,6 +6,8 @@
    (import ../../../pkgs/overlay.nix)
  ];

+  nixpkgs.config.allowUnfree = true;
+
  nix = {
    nixPath = [
      "nixpkgs=${nixpkgs}"
--- a/m/common/base/power-policy.nix
+++ b/m/common/base/power-policy.nix
@@ -0,0 +1,9 @@
+{
+  imports = [
+    ../../module/power-policy.nix
+  ];
+
+  # By default, keep the machines off as we don't know if the AC will be working
+  # once the electricity comes back.
+  power.policy = "always-off";
+}
--- a/m/common/base/users.nix
+++ b/m/common/base/users.nix
@@ -56,7 +56,7 @@
        home = "/home/Computational/rpenacob";
        description = "Raúl Peñacoba";
        group = "Computational";
-        hosts = [ "owl1" "owl2" "hut" "tent" "fox" ];
+        hosts = [ "apex" "owl1" "owl2" "hut" "tent" "fox" ];
        hashedPassword = "$6$TZm3bDIFyPrMhj1E$uEDXoYYd1z2Wd5mMPfh3DZAjP7ztVjJ4ezIcn82C0ImqafPA.AnTmcVftHEzLB3tbe2O4SxDyPSDEQgJ4GOtj/";
        openssh.authorizedKeys.keys = [
          "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFYfXg37mauGeurqsLpedgA2XQ9d4Nm0ZGo/hI1f7wwH rpenacob@bsc"
@@ -69,10 +69,10 @@
        home = "/home/Computational/anavarro";
        description = "Antoni Navarro";
        group = "Computational";
-        hosts = [ "hut" "tent" "raccoon" "fox" ];
-        hashedPassword = "$6$QdNDsuLehoZTYZlb$CDhCouYDPrhoiB7/seu7RF.Gqg4zMQz0n5sA4U1KDgHaZOxy2as9pbIGeF8tOHJKRoZajk5GiaZv0rZMn7Oq31";
+        hosts = [ "apex" "hut" "tent" "raccoon" "fox" "weasel" ];
+        hashedPassword = "$6$EgturvVYXlKgP43g$gTN78LLHIhaF8hsrCXD.O6mKnZSASWSJmCyndTX8QBWT6wTlUhcWVAKz65lFJPXjlJA4u7G1ydYQ0GG6Wk07b1";
        openssh.authorizedKeys.keys = [
-          "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILWjRSlKgzBPZQhIeEtk6Lvws2XNcYwHcwPv4osSgst5 anavarro@ssfhead"
+          "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMsbM21uepnJwPrRe6jYFz8zrZ6AYMtSEvvt4c9spmFP toni@delltoni"
        ];
      };

@@ -82,7 +82,7 @@
        home = "/home/Computational/abonerib";
        description = "Aleix Boné";
        group = "Computational";
-        hosts = [ "owl1" "owl2" "hut" "tent" "raccoon" "fox" ];
+        hosts = [ "apex" "owl1" "owl2" "hut" "tent" "raccoon" "fox" "weasel" ];
        hashedPassword = "$6$V1EQWJr474whv7XJ$OfJ0wueM2l.dgiJiiah0Tip9ITcJ7S7qDvtSycsiQ43QBFyP4lU0e0HaXWps85nqB4TypttYR4hNLoz3bz662/";
        openssh.authorizedKeys.keys = [
          "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIIFiqXqt88VuUfyANkZyLJNiuroIITaGlOOTMhVDKjf abonerib@bsc"
@@ -95,7 +95,7 @@
        home = "/home/Computational/vlopez";
        description = "Victor López";
        group = "Computational";
-        hosts = [ "koro" ];
+        hosts = [ "apex" "koro" ];
        hashedPassword = "$6$0ZBkgIYE/renVqtt$1uWlJsb0FEezRVNoETTzZMx4X2SvWiOsKvi0ppWCRqI66S6TqMBXBdP4fcQyvRRBt0e4Z7opZIvvITBsEtO0f0";
        openssh.authorizedKeys.keys = [
          "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGMwlUZRf9jfG666Qa5Sb+KtEhXqkiMlBV2su3x/dXHq victor@arch"
@@ -108,7 +108,7 @@
        home = "/home/Computational/dbautist";
        description = "Dylan Bautista Cases";
        group = "Computational";
-        hosts = [ "hut" "tent" "raccoon" ];
+        hosts = [ "apex" "hut" "tent" "raccoon" ];
        hashedPassword = "$6$a2lpzMRVkG9nSgIm$12G6.ka0sFX1YimqJkBAjbvhRKZ.Hl090B27pdbnQOW0wzyxVWySWhyDDCILjQELky.HKYl9gqOeVXW49nW7q/";
        openssh.authorizedKeys.keys = [
          "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAb+EQBoS98zrCwnGKkHKwMLdYABMTqv7q9E0+T0QmkS dbautist@bsc-848818791"
@@ -121,7 +121,7 @@
        home = "/home/Computational/dalvare1";
        description = "David Álvarez";
        group = "Computational";
-        hosts = [ "hut" "tent" "fox" ];
+        hosts = [ "apex" "hut" "tent" "fox" ];
        hashedPassword = "$6$mpyIsV3mdq.rK8$FvfZdRH5OcEkUt5PnIUijWyUYZvB1SgeqxpJ2p91TTe.3eQIDTcLEQ5rxeg.e5IEXAZHHQ/aMsR5kPEujEghx0";
        openssh.authorizedKeys.keys = [
          "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGEfy6F4rF80r4Cpo2H5xaWqhuUZzUsVsILSKGJzt5jF dalvare1@ssfhead"
@@ -134,7 +134,7 @@
        home = "/home/Computational/varcila";
        description = "Vincent Arcila";
        group = "Computational";
-        hosts = [ "hut" "tent" "fox" ];
+        hosts = [ "apex" "hut" "tent" "fox" ];
        hashedPassword = "$6$oB0Tcn99DcM4Ch$Vn1A0ulLTn/8B2oFPi9wWl/NOsJzaFAWjqekwcuC9sMC7cgxEVb.Nk5XSzQ2xzYcNe5MLtmzkVYnRS1CqP39Y0";
        openssh.authorizedKeys.keys = [
          "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKGt0ESYxekBiHJQowmKpfdouw0hVm3N7tUMtAaeLejK vincent@varch"
--- a/m/common/ssf.nix
+++ b/m/common/ssf.nix
@@ -3,6 +3,7 @@
  imports = [
    ./xeon.nix
    ./ssf/fs.nix
+    ./ssf/hosts.nix
    ./ssf/net.nix
    ./ssf/ssh.nix
  ];
--- a/m/common/ssf/hosts.nix
+++ b/m/common/ssf/hosts.nix
@@ -0,0 +1,23 @@
+{ pkgs, ... }:
+
+{
+  networking.hosts = {
+    # Login
+    "10.0.40.30" = [ "apex" ];
+
+    # Storage
+    "10.0.40.40" = [ "bay" ];   "10.0.42.40" = [ "bay-ib" ];    "10.0.40.141" = [ "bay-ipmi" ];
+    "10.0.40.41" = [ "oss01" ]; "10.0.42.41" = [ "oss01-ib0" ]; "10.0.40.142" = [ "oss01-ipmi" ];
+    "10.0.40.42" = [ "lake2" ]; "10.0.42.42" = [ "lake2-ib" ];  "10.0.40.143" = [ "lake2-ipmi" ];
+
+    # Xeon compute
+    "10.0.40.1" = [ "owl1" ];   "10.0.42.1" = [ "owl1-ib" ];   "10.0.40.101" = [ "owl1-ipmi" ];
+    "10.0.40.2" = [ "owl2" ];   "10.0.42.2" = [ "owl2-ib" ];   "10.0.40.102" = [ "owl2-ipmi" ];
+    "10.0.40.3" = [ "xeon03" ]; "10.0.42.3" = [ "xeon03-ib" ]; "10.0.40.103" = [ "xeon03-ipmi" ];
+    #"10.0.40.4" = [ "tent" ];   "10.0.42.4" = [ "tent-ib" ];   "10.0.40.104" = [ "tent-ipmi" ];
+    "10.0.40.5" = [ "koro" ];   "10.0.42.5" = [ "koro-ib" ];   "10.0.40.105" = [ "koro-ipmi" ];
+    "10.0.40.6" = [ "weasel" ]; "10.0.42.6" = [ "weasel-ib" ]; "10.0.40.106" = [ "weasel-ipmi" ];
+    "10.0.40.7" = [ "hut" ];    "10.0.42.7" = [ "hut-ib" ];    "10.0.40.107" = [ "hut-ipmi" ];
+    "10.0.40.8" = [ "eudy" ];   "10.0.42.8" = [ "eudy-ib" ];   "10.0.40.108" = [ "eudy-ipmi" ];
+  };
+}
--- a/m/common/ssf/net.nix
+++ b/m/common/ssf/net.nix
@@ -9,14 +9,6 @@
    defaultGateway = "10.0.40.30";
    nameservers = ["8.8.8.8"];

-    proxy = {
-      default = "http://hut:23080/";
-      noProxy = "127.0.0.1,localhost,internal.domain,10.0.40.40,hut";
-      # Don't set all_proxy as go complains and breaks the gitlab runner, see:
-      # https://github.com/golang/go/issues/16715
-      allProxy = null;
-    };
-
    firewall = {
      extraCommands = ''
        # Prevent ssfhead from contacting our slurmd daemon
@@ -27,64 +19,5 @@
        iptables -A nixos-fw -p tcp -s 10.0.40.0/24 --dport 60000:61000 -j nixos-fw-accept
      '';
    };
-
-    extraHosts = ''
-      10.0.40.30              ssfhead
-      
-      # Node Entry for node: mds01 (ID=72)
-      10.0.40.40              bay mds01 mds01-eth0
-      10.0.42.40              bay-ib mds01-ib0
-      10.0.40.141             bay-ipmi mds01-ipmi0 mds01-ipmi
-      
-      # Node Entry for node: oss01 (ID=73)
-      10.0.40.41              oss01 oss01-eth0
-      10.0.42.41              oss01-ib0
-      10.0.40.142             oss01-ipmi0 oss01-ipmi
-      
-      # Node Entry for node: oss02 (ID=74)
-      10.0.40.42              lake2 oss02 oss02-eth0
-      10.0.42.42              lake2-ib oss02-ib0
-      10.0.40.143             lake2-ipmi oss02-ipmi0 oss02-ipmi
-      
-      # Node Entry for node: xeon01 (ID=15)
-      10.0.40.1               owl1 xeon01 xeon01-eth0
-      10.0.42.1               owl1-ib xeon01-ib0
-      10.0.40.101             owl1-ipmi xeon01-ipmi0 xeon01-ipmi
-      
-      # Node Entry for node: xeon02 (ID=16)
-      10.0.40.2               owl2 xeon02 xeon02-eth0
-      10.0.42.2               owl2-ib xeon02-ib0
-      10.0.40.102             owl2-ipmi xeon02-ipmi0 xeon02-ipmi
-      
-      # Node Entry for node: xeon03 (ID=17)
-      10.0.40.3               xeon03 xeon03-eth0
-      10.0.42.3               xeon03-ib0
-      10.0.40.103             xeon03-ipmi0 xeon03-ipmi
-      
-      # Node Entry for node: xeon04 (ID=18)
-      10.0.40.4               xeon04 xeon04-eth0
-      10.0.42.4               xeon04-ib0
-      10.0.40.104             xeon04-ipmi0 xeon04-ipmi
-      
-      # Node Entry for node: xeon05 (ID=19)
-      10.0.40.5               koro xeon05 xeon05-eth0
-      10.0.42.5               koro-ib xeon05-ib0
-      10.0.40.105             koro-ipmi xeon05-ipmi0
-      
-      # Node Entry for node: xeon06 (ID=20)
-      10.0.40.6               xeon06 xeon06-eth0
-      10.0.42.6               xeon06-ib0
-      10.0.40.106             xeon06-ipmi0 xeon06-ipmi
-      
-      # Node Entry for node: xeon07 (ID=21)
-      10.0.40.7               hut xeon07 xeon07-eth0
-      10.0.42.7               hut-ib xeon07-ib0
-      10.0.40.107             hut-ipmi xeon07-ipmi0 xeon07-ipmi
-      
-      # Node Entry for node: xeon08 (ID=22)
-      10.0.40.8               eudy xeon08 xeon08-eth0
-      10.0.42.8               eudy-ib xeon08-ib0
-      10.0.40.108             eudy-ipmi xeon08-ipmi0 xeon08-ipmi
-    '';
  };
 }
--- a/m/common/ssf/ssh.nix
+++ b/m/common/ssf/ssh.nix
@@ -1,8 +1,16 @@
 {
-  # Connect to intranet git hosts via proxy
+  # Use SSH tunnel to apex to reach internal hosts
  programs.ssh.extraConfig = ''
-    # Connect to BSC machines via hut proxy too
-    Host amdlogin1.bsc.es armlogin1.bsc.es hualogin1.bsc.es glogin1.bsc.es glogin2.bsc.es fpgalogin1.bsc.es
-      ProxyCommand nc -X connect -x hut:23080 %h %p
+    Host tent
+      ProxyJump raccoon
+
+    # Access raccoon via the HTTP proxy
+    Host raccoon knights3.bsc.es
+      HostName knights3.bsc.es
+      ProxyCommand=ssh apex 'nc -X connect -x localhost:23080 %h %p'
+
+    # Make sure we can reach gitlab even if we don't have SSH access to raccoon
+    Host bscpm04.bsc.es gitlab-internal.bsc.es
+      ProxyCommand=ssh apex 'nc -X connect -x localhost:23080 %h %p'
  '';
 }
--- a/m/fox/configuration.nix
+++ b/m/fox/configuration.nix
@@ -5,8 +5,16 @@
    ../common/base.nix
    ../common/xeon/console.nix
    ../module/emulation.nix
+    ../module/nvidia.nix
+    ../module/power-policy.nix
  ];

+  power.policy = "always-on";
+
+  # Don't turn off in August as UPC has different dates.
+  # Fox works fine on power cuts.
+  systemd.timers.august-shutdown.enable = false;
+
  # Select the this using the ID to avoid mismatches
  boot.loader.grub.device = "/dev/disk/by-id/wwn-0x500a07514b0c1103";

@@ -53,12 +61,8 @@
    extra-trusted-public-keys = [ "jungle.bsc.es:pEc7MlAT0HEwLQYPtpkPLwRsGf80ZI26aj29zMw/HH0=" ];
  };

-  # Configure Nvidia driver to use with CUDA
-  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production;
-  hardware.graphics.enable = true;
-  nixpkgs.config.allowUnfree = true;
-  nixpkgs.config.nvidia.acceptLicense = true;
-  services.xserver.videoDrivers = [ "nvidia" ];
+  # Recommended for new graphics cards
+  hardware.nvidia.open = true;

  # Mount NVME disks
  fileSystems."/nvme0" = { device = "/dev/disk/by-label/nvme0"; fsType = "ext4"; };
--- a/m/hut/blackbox.yml
+++ b/m/hut/blackbox.yml
@@ -3,160 +3,12 @@ modules:
    prober: http
    timeout: 5s
    http:
-      proxy_url: "http://127.0.0.1:23080"
-      skip_resolve_phase_with_proxy: true
-      follow_redirects: true
-      valid_status_codes: []  # Defaults to 2xx
-      method: GET
-  http_with_proxy:
-    prober: http
-    http:
-      proxy_url: "http://127.0.0.1:3128"
-      skip_resolve_phase_with_proxy: true
-  http_with_proxy_and_headers:
-    prober: http
-    http:
-      proxy_url: "http://127.0.0.1:3128"
-      proxy_connect_header:
-        Proxy-Authorization:
-          - Bearer token
-  http_post_2xx:
-    prober: http
-    timeout: 5s
-    http:
-      method: POST
-      headers:
-        Content-Type: application/json
-      body: '{}'
-  http_post_body_file:
-    prober: http
-    timeout: 5s
-    http:
-      method: POST
-      body_file: "/files/body.txt"
-  http_basic_auth_example:
-    prober: http
-    timeout: 5s
-    http:
-      method: POST
-      headers:
-        Host: "login.example.com"
-      basic_auth:
-        username: "username"
-        password: "mysecret"
-  http_2xx_oauth_client_credentials:
-    prober: http
-    timeout: 5s
-    http:
-      valid_http_versions: ["HTTP/1.1", "HTTP/2"]
      follow_redirects: true
      preferred_ip_protocol: "ip4"
-      valid_status_codes:
-        - 200
-        - 201
-      oauth2:
-        client_id: "client_id"
-        client_secret: "client_secret"
-        token_url: "https://api.example.com/token"
-        endpoint_params:
-          grant_type: "client_credentials"
-  http_custom_ca_example:
-    prober: http
-    http:
+      valid_status_codes: []  # Defaults to 2xx
      method: GET
-      tls_config:
-        ca_file: "/certs/my_cert.crt"
-  http_gzip:
-    prober: http
-    http:
-      method: GET
-      compression: gzip
-  http_gzip_with_accept_encoding:
-    prober: http
-    http:
-      method: GET
-      compression: gzip
-      headers:
-        Accept-Encoding: gzip
-  tls_connect:
-    prober: tcp
-    timeout: 5s
-    tcp:
-      tls: true
-  tcp_connect_example:
-    prober: tcp
-    timeout: 5s
-  imap_starttls:
-    prober: tcp
-    timeout: 5s
-    tcp:
-      query_response:
-        - expect: "OK.*STARTTLS"
-        - send: ". STARTTLS"
-        - expect: "OK"
-        - starttls: true
-        - send: ". capability"
-        - expect: "CAPABILITY IMAP4rev1"
-  smtp_starttls:
-    prober: tcp
-    timeout: 5s
-    tcp:
-      query_response:
-        - expect: "^220 ([^ ]+) ESMTP (.+)$"
-        - send: "EHLO prober\r"
-        - expect: "^250-STARTTLS"
-        - send: "STARTTLS\r"
-        - expect: "^220"
-        - starttls: true
-        - send: "EHLO prober\r"
-        - expect: "^250-AUTH"
-        - send: "QUIT\r"
-  irc_banner_example:
-    prober: tcp
-    timeout: 5s
-    tcp:
-      query_response:
-        - send: "NICK prober"
-        - send: "USER prober prober prober :prober"
-        - expect: "PING :([^ ]+)"
-          send: "PONG ${1}"
-        - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"
-  dns_udp_example:
-    prober: dns
-    timeout: 5s
-    dns:
-      query_name: "www.prometheus.io"
-      query_type: "A"
-      valid_rcodes:
-        - NOERROR
-      validate_answer_rrs:
-        fail_if_matches_regexp:
-          - ".*127.0.0.1"
-        fail_if_all_match_regexp:
-          - ".*127.0.0.1"
-        fail_if_not_matches_regexp:
-          - "www.prometheus.io.\t300\tIN\tA\t127.0.0.1"
-        fail_if_none_matches_regexp:
-          - "127.0.0.1"
-      validate_authority_rrs:
-        fail_if_matches_regexp:
-          - ".*127.0.0.1"
-      validate_additional_rrs:
-        fail_if_matches_regexp:
-          - ".*127.0.0.1"
-  dns_soa:
-    prober: dns
-    dns:
-      query_name: "prometheus.io"
-      query_type: "SOA"
-  dns_tcp_example:
-    prober: dns
-    dns:
-      transport_protocol: "tcp" # defaults to "udp"
-      preferred_ip_protocol: "ip4" # defaults to "ip6"
-      query_name: "www.prometheus.io"
--- a/m/hut/targets.yml
+++ b/m/hut/targets.yml
@@ -4,7 +4,7 @@
  - xeon03-ipmi
  - xeon04-ipmi
  - koro-ipmi
-  - xeon06-ipmi
+  - weasel-ipmi
  - hut-ipmi
  - eudy-ipmi
  # Storage
--- a/m/map.nix
+++ b/m/map.nix
@@ -6,7 +6,7 @@
    switch-opa = { pos=41; size=1; };

    # SSF login
-    ssfhead = { pos=39; size=2; label="SSFHEAD"; board="R2208WTTYSR"; contact="operations@bsc.es"; };
+    apex = { pos=39; size=2; label="SSFHEAD"; board="R2208WTTYSR"; contact="rodrigo.arias@bsc.es"; };

    # Storage
    bay   = { pos=38; size=1; label="MDS01"; board="S2600WT2R"; sn="BQWL64850303"; contact="rodrigo.arias@bsc.es"; };
@@ -19,7 +19,7 @@
    xeon03 = { pos=33; size=1; label="SSF-XEON03"; board="S2600WTTR"; sn="BQWL64750826"; contact="rodrigo.arias@bsc.es"; };
    # Slot 34 empty
    koro   = { pos=31; size=1; label="SSF-XEON05"; board="S2600WTTR"; sn="BQWL64954293"; contact="rodrigo.arias@bsc.es"; };
-    xeon06 = { pos=30; size=1; label="SSF-XEON06"; board="S2600WTTR"; sn="BQWL64750846"; contact="antoni.navarro@bsc.es"; };
+    weasel = { pos=30; size=1; label="SSF-XEON06"; board="S2600WTTR"; sn="BQWL64750846"; contact="antoni.navarro@bsc.es"; };
    hut    = { pos=29; size=1; label="SSF-XEON07"; board="S2600WTTR"; sn="BQWL64751184"; contact="rodrigo.arias@bsc.es"; };
    eudy   = { pos=28; size=1; label="SSF-XEON08"; board="S2600WTTR"; sn="BQWL64756586"; contact="aleix.rocanonell@bsc.es"; };

--- a/m/module/nvidia.nix
+++ b/m/module/nvidia.nix
@@ -0,0 +1,20 @@
+{ lib, config, pkgs, ... }:
+{
+  # Configure Nvidia driver to use with CUDA
+  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production;
+  hardware.nvidia.open = lib.mkDefault (builtins.abort "hardware.nvidia.open not set");
+  hardware.graphics.enable = true;
+  nixpkgs.config.nvidia.acceptLicense = true;
+  services.xserver.videoDrivers = [ "nvidia" ];
+
+  # enable support for derivations which require nvidia-gpu to be available
+  # > requiredSystemFeatures = [ "cuda" ];
+  programs.nix-required-mounts.enable = true;
+  programs.nix-required-mounts.presets.nvidia-gpu.enable = true;
+  # They forgot to add the symlink
+  programs.nix-required-mounts.allowedPatterns.nvidia-gpu.paths = [
+    config.systemd.tmpfiles.settings.graphics-driver."/run/opengl-driver"."L+".argument
+  ];
+
+  environment.systemPackages = [ pkgs.cudainfo ];
+}
--- a/m/module/power-policy.nix
+++ b/m/module/power-policy.nix
@@ -0,0 +1,33 @@
+{ config, lib, pkgs, ... }:
+
+with lib;
+
+let
+  cfg = config.power.policy;
+in
+{
+  options = {
+    power.policy = mkOption {
+      type = types.nullOr (types.enum [ "always-on" "previous" "always-off" ]);
+      default = null;
+      description = "Set power policy to use via IPMI.";
+    };
+  };
+
+  config = mkIf (cfg != null) {
+    systemd.services."power-policy" = {
+      description = "Set power policy to use via IPMI";
+      wantedBy = [ "multi-user.target" ];
+      unitConfig = {
+        StartLimitBurst = "10";
+        StartLimitIntervalSec = "10m";
+      };
+      serviceConfig = {
+        ExecStart = "${pkgs.ipmitool}/bin/ipmitool chassis policy ${cfg}";
+        Type = "oneshot";
+        Restart = "on-failure";
+        RestartSec = "5s";
+      };
+    };
+  };
+}
--- a/m/module/ssh-hut-extern.nix
+++ b/m/module/ssh-hut-extern.nix
@@ -1,9 +1,8 @@
 {
  programs.ssh.extraConfig = ''
-    Host ssfhead
+    Host apex ssfhead
      HostName ssflogin.bsc.es
    Host hut
-      ProxyJump ssfhead
-      HostName xeon07
+      ProxyJump apex
  '';
 }
--- a/m/raccoon/configuration.nix
+++ b/m/raccoon/configuration.nix
@@ -6,9 +6,13 @@
    ../module/emulation.nix
    ../module/debuginfod.nix
    ../module/ssh-hut-extern.nix
+    ../module/nvidia.nix
+    ../module/power-policy.nix
    ../eudy/kernel/perf.nix
  ];

+  power.policy = "always-on";
+
  # Don't install Grub on the disk yet
  boot.loader.grub.device = "nodev";

@@ -49,15 +53,7 @@
  # Enable performance governor
  powerManagement.cpuFreqGovernor = "performance";

-  # Configure Nvidia driver to use with CUDA
-  hardware.nvidia.package = config.boot.kernelPackages.nvidiaPackages.production;
-  hardware.graphics.enable = true;
-  nixpkgs.config.allowUnfree = true;
-  nixpkgs.config.nvidia.acceptLicense = true;
-  services.xserver.videoDrivers = [ "nvidia" ];
-
-  # Disable garbage collection for now
-  nix.gc.automatic = lib.mkForce false;
+  hardware.nvidia.open = false; # Maxwell is older than Turing architecture

  services.openssh.settings.X11Forwarding = true;

--- a/m/weasel/configuration.nix
+++ b/m/weasel/configuration.nix
@@ -0,0 +1,28 @@
+{ lib, ... }:
+
+{
+  imports = [
+    ../common/ssf.nix
+  ];
+
+  # Select this using the ID to avoid mismatches
+  boot.loader.grub.device = "/dev/disk/by-id/wwn-0x55cd2e414d5356ca";
+
+  # No swap, there is plenty of RAM
+  swapDevices = lib.mkForce [];
+
+  # Users with sudo access
+  users.groups.wheel.members = [ "abonerib" "anavarro" ];
+
+  networking = {
+    hostName = "weasel";
+    interfaces.eno1.ipv4.addresses = [ {
+      address = "10.0.40.6";
+      prefixLength = 24;
+    } ];
+    interfaces.ibp5s0.ipv4.addresses = [ {
+      address = "10.0.42.6";
+      prefixLength = 24;
+    } ];
+  };
+}
--- a/pkgs/cudainfo/Makefile
+++ b/pkgs/cudainfo/Makefile
@@ -0,0 +1,12 @@
+HOSTCXX  ?= g++
+NVCC     := nvcc -ccbin $(HOSTCXX)
+CXXFLAGS := -m64
+
+# Target rules
+all: cudainfo
+
+cudainfo: cudainfo.cpp
+	$(NVCC) $(CXXFLAGS) -o $@ $<
+
+clean:
+	rm -f cudainfo cudainfo.o
--- a/pkgs/cudainfo/cudainfo.cpp
+++ b/pkgs/cudainfo/cudainfo.cpp
@@ -0,0 +1,600 @@
+/*
+ * Copyright 1993-2015 NVIDIA Corporation.  All rights reserved.
+ *
+ * Please refer to the NVIDIA end user license agreement (EULA) associated
+ * with this source code for terms and conditions that govern your use of
+ * this software. Any use, reproduction, disclosure, or distribution of
+ * this software and related documentation outside the terms of the EULA
+ * is strictly prohibited.
+ *
+ */
+/* This sample queries the properties of the CUDA devices present in the system via CUDA Runtime API. */
+
+// Shared Utilities (QA Testing)
+
+// std::system includes
+#include <memory>
+#include <iostream>
+
+#include <cuda_runtime.h>
+
+// This will output the proper CUDA error strings in the event that a CUDA host call returns an error
+#define checkCudaErrors(val)           check ( (val), #val, __FILE__, __LINE__ )
+
+// CUDA Runtime error messages
+#ifdef __DRIVER_TYPES_H__
+static const char *_cudaGetErrorEnum(cudaError_t error)
+{
+    switch (error)
+    {
+        case cudaSuccess:
+            return "cudaSuccess";
+
+        case cudaErrorMissingConfiguration:
+            return "cudaErrorMissingConfiguration";
+
+        case cudaErrorMemoryAllocation:
+            return "cudaErrorMemoryAllocation";
+
+        case cudaErrorInitializationError:
+            return "cudaErrorInitializationError";
+
+        case cudaErrorLaunchFailure:
+            return "cudaErrorLaunchFailure";
+
+        case cudaErrorPriorLaunchFailure:
+            return "cudaErrorPriorLaunchFailure";
+
+        case cudaErrorLaunchTimeout:
+            return "cudaErrorLaunchTimeout";
+
+        case cudaErrorLaunchOutOfResources:
+            return "cudaErrorLaunchOutOfResources";
+
+        case cudaErrorInvalidDeviceFunction:
+            return "cudaErrorInvalidDeviceFunction";
+
+        case cudaErrorInvalidConfiguration:
+            return "cudaErrorInvalidConfiguration";
+
+        case cudaErrorInvalidDevice:
+            return "cudaErrorInvalidDevice";
+
+        case cudaErrorInvalidValue:
+            return "cudaErrorInvalidValue";
+
+        case cudaErrorInvalidPitchValue:
+            return "cudaErrorInvalidPitchValue";
+
+        case cudaErrorInvalidSymbol:
+            return "cudaErrorInvalidSymbol";
+
+        case cudaErrorMapBufferObjectFailed:
+            return "cudaErrorMapBufferObjectFailed";
+
+        case cudaErrorUnmapBufferObjectFailed:
+            return "cudaErrorUnmapBufferObjectFailed";
+
+        case cudaErrorInvalidHostPointer:
+            return "cudaErrorInvalidHostPointer";
+
+        case cudaErrorInvalidDevicePointer:
+            return "cudaErrorInvalidDevicePointer";
+
+        case cudaErrorInvalidTexture:
+            return "cudaErrorInvalidTexture";
+
+        case cudaErrorInvalidTextureBinding:
+            return "cudaErrorInvalidTextureBinding";
+
+        case cudaErrorInvalidChannelDescriptor:
+            return "cudaErrorInvalidChannelDescriptor";
+
+        case cudaErrorInvalidMemcpyDirection:
+            return "cudaErrorInvalidMemcpyDirection";
+
+        case cudaErrorAddressOfConstant:
+            return "cudaErrorAddressOfConstant";
+
+        case cudaErrorTextureFetchFailed:
+            return "cudaErrorTextureFetchFailed";
+
+        case cudaErrorTextureNotBound:
+            return "cudaErrorTextureNotBound";
+
+        case cudaErrorSynchronizationError:
+            return "cudaErrorSynchronizationError";
+
+        case cudaErrorInvalidFilterSetting:
+            return "cudaErrorInvalidFilterSetting";
+
+        case cudaErrorInvalidNormSetting:
+            return "cudaErrorInvalidNormSetting";
+
+        case cudaErrorMixedDeviceExecution:
+            return "cudaErrorMixedDeviceExecution";
+
+        case cudaErrorCudartUnloading:
+            return "cudaErrorCudartUnloading";
+
+        case cudaErrorUnknown:
+            return "cudaErrorUnknown";
+
+        case cudaErrorNotYetImplemented:
+            return "cudaErrorNotYetImplemented";
+
+        case cudaErrorMemoryValueTooLarge:
+            return "cudaErrorMemoryValueTooLarge";
+
+        case cudaErrorInvalidResourceHandle:
+            return "cudaErrorInvalidResourceHandle";
+
+        case cudaErrorNotReady:
+            return "cudaErrorNotReady";
+
+        case cudaErrorInsufficientDriver:
+            return "cudaErrorInsufficientDriver";
+
+        case cudaErrorSetOnActiveProcess:
+            return "cudaErrorSetOnActiveProcess";
+
+        case cudaErrorInvalidSurface:
+            return "cudaErrorInvalidSurface";
+
+        case cudaErrorNoDevice:
+            return "cudaErrorNoDevice";
+
+        case cudaErrorECCUncorrectable:
+            return "cudaErrorECCUncorrectable";
+
+        case cudaErrorSharedObjectSymbolNotFound:
+            return "cudaErrorSharedObjectSymbolNotFound";
+
+        case cudaErrorSharedObjectInitFailed:
+            return "cudaErrorSharedObjectInitFailed";
+
+        case cudaErrorUnsupportedLimit:
+            return "cudaErrorUnsupportedLimit";
+
+        case cudaErrorDuplicateVariableName:
+            return "cudaErrorDuplicateVariableName";
+
+        case cudaErrorDuplicateTextureName:
+            return "cudaErrorDuplicateTextureName";
+
+        case cudaErrorDuplicateSurfaceName:
+            return "cudaErrorDuplicateSurfaceName";
+
+        case cudaErrorDevicesUnavailable:
+            return "cudaErrorDevicesUnavailable";
+
+        case cudaErrorInvalidKernelImage:
+            return "cudaErrorInvalidKernelImage";
+
+        case cudaErrorNoKernelImageForDevice:
+            return "cudaErrorNoKernelImageForDevice";
+
+        case cudaErrorIncompatibleDriverContext:
+            return "cudaErrorIncompatibleDriverContext";
+
+        case cudaErrorPeerAccessAlreadyEnabled:
+            return "cudaErrorPeerAccessAlreadyEnabled";
+
+        case cudaErrorPeerAccessNotEnabled:
+            return "cudaErrorPeerAccessNotEnabled";
+
+        case cudaErrorDeviceAlreadyInUse:
+            return "cudaErrorDeviceAlreadyInUse";
+
+        case cudaErrorProfilerDisabled:
+            return "cudaErrorProfilerDisabled";
+
+        case cudaErrorProfilerNotInitialized:
+            return "cudaErrorProfilerNotInitialized";
+
+        case cudaErrorProfilerAlreadyStarted:
+            return "cudaErrorProfilerAlreadyStarted";
+
+        case cudaErrorProfilerAlreadyStopped:
+            return "cudaErrorProfilerAlreadyStopped";
+
+        /* Since CUDA 4.0*/
+        case cudaErrorAssert:
+            return "cudaErrorAssert";
+
+        case cudaErrorTooManyPeers:
+            return "cudaErrorTooManyPeers";
+
+        case cudaErrorHostMemoryAlreadyRegistered:
+            return "cudaErrorHostMemoryAlreadyRegistered";
+
+        case cudaErrorHostMemoryNotRegistered:
+            return "cudaErrorHostMemoryNotRegistered";
+
+        /* Since CUDA 5.0 */
+        case cudaErrorOperatingSystem:
+            return "cudaErrorOperatingSystem";
+
+        case cudaErrorPeerAccessUnsupported:
+            return "cudaErrorPeerAccessUnsupported";
+
+        case cudaErrorLaunchMaxDepthExceeded:
+            return "cudaErrorLaunchMaxDepthExceeded";
+
+        case cudaErrorLaunchFileScopedTex:
+            return "cudaErrorLaunchFileScopedTex";
+
+        case cudaErrorLaunchFileScopedSurf:
+            return "cudaErrorLaunchFileScopedSurf";
+
+        case cudaErrorSyncDepthExceeded:
+            return "cudaErrorSyncDepthExceeded";
+
+        case cudaErrorLaunchPendingCountExceeded:
+            return "cudaErrorLaunchPendingCountExceeded";
+
+        case cudaErrorNotPermitted:
+            return "cudaErrorNotPermitted";
+
+        case cudaErrorNotSupported:
+            return "cudaErrorNotSupported";
+
+        /* Since CUDA 6.0 */
+        case cudaErrorHardwareStackError:
+            return "cudaErrorHardwareStackError";
+
+        case cudaErrorIllegalInstruction:
+            return "cudaErrorIllegalInstruction";
+
+        case cudaErrorMisalignedAddress:
+            return "cudaErrorMisalignedAddress";
+
+        case cudaErrorInvalidAddressSpace:
+            return "cudaErrorInvalidAddressSpace";
+
+        case cudaErrorInvalidPc:
+            return "cudaErrorInvalidPc";
+
+        case cudaErrorIllegalAddress:
+            return "cudaErrorIllegalAddress";
+
+        /* Since CUDA 6.5*/
+        case cudaErrorInvalidPtx:
+            return "cudaErrorInvalidPtx";
+
+        case cudaErrorInvalidGraphicsContext:
+            return "cudaErrorInvalidGraphicsContext";
+
+        case cudaErrorStartupFailure:
+            return "cudaErrorStartupFailure";
+
+        case cudaErrorApiFailureBase:
+            return "cudaErrorApiFailureBase";
+    }
+
+    return "<unknown>";
+}
+#endif
+
+template< typename T >
+void check(T result, char const *const func, const char *const file, int const line)
+{
+    if (result)
+    {
+        fprintf(stderr, "CUDA error at %s:%d code=%d(%s) \"%s\" \n",
+                file, line, static_cast<unsigned int>(result), _cudaGetErrorEnum(result), func);
+        cudaDeviceReset();
+        // Make sure we call CUDA Device Reset before exiting
+        exit(EXIT_FAILURE);
+    }
+}
+
+int *pArgc = NULL;
+char **pArgv = NULL;
+
+#if CUDART_VERSION < 5000
+
+// CUDA-C includes
+#include <cuda.h>
+
+// This function wraps the CUDA Driver API into a template function
+template <class T>
+inline void getCudaAttribute(T *attribute, CUdevice_attribute device_attribute, int device)
+{
+    CUresult error =    cuDeviceGetAttribute(attribute, device_attribute, device);
+
+    if (CUDA_SUCCESS != error) {
+        fprintf(stderr, "cuSafeCallNoSync() Driver API error = %04d from file <%s>, line %i.\n",
+                error, __FILE__, __LINE__);
+
+        // cudaDeviceReset causes the driver to clean up all state. While
+        // not mandatory in normal operation, it is good practice.  It is also
+        // needed to ensure correct operation when the application is being
+        // profiled. Calling cudaDeviceReset causes all profile data to be
+        // flushed before the application exits
+        cudaDeviceReset();
+        exit(EXIT_FAILURE);
+    }
+}
+
+#endif /* CUDART_VERSION < 5000 */
+
+// Beginning of GPU Architecture definitions
+inline int ConvertSMVer2Cores(int major, int minor)
+{
+    // Defines for GPU Architecture types (using the SM version to determine the # of cores per SM)
+    typedef struct {
+        int SM; // 0xMm (hexadecimal notation), M = SM Major version, and m = SM minor version
+        int Cores;
+    } sSMtoCores;
+
+    sSMtoCores nGpuArchCoresPerSM[] = {
+        { 0x20, 32 }, // Fermi Generation (SM 2.0) GF100 class
+        { 0x21, 48 }, // Fermi Generation (SM 2.1) GF10x class
+        { 0x30, 192}, // Kepler Generation (SM 3.0) GK10x class
+        { 0x32, 192}, // Kepler Generation (SM 3.2) GK10x class
+        { 0x35, 192}, // Kepler Generation (SM 3.5) GK11x class
+        { 0x37, 192}, // Kepler Generation (SM 3.7) GK21x class
+        { 0x50, 128}, // Maxwell Generation (SM 5.0) GM10x class
+        { 0x52, 128}, // Maxwell Generation (SM 5.2) GM20x class
+        {   -1, -1 }
+    };
+
+    int index = 0;
+
+    while (nGpuArchCoresPerSM[index].SM != -1) {
+        if (nGpuArchCoresPerSM[index].SM == ((major << 4) + minor)) {
+            return nGpuArchCoresPerSM[index].Cores;
+        }
+
+        index++;
+    }
+
+    // If the SM version is not in the table, fall back to the last known entry so the program still runs
+    printf("MapSMtoCores for SM %d.%d is undefined.  Default to use %d Cores/SM\n", major, minor, nGpuArchCoresPerSM[index-1].Cores);
+    return nGpuArchCoresPerSM[index-1].Cores;
+}
+
+////////////////////////////////////////////////////////////////////////////////
+// Program main
+////////////////////////////////////////////////////////////////////////////////
+int
+main(int argc, char **argv)
+{
+    pArgc = &argc;
+    pArgv = argv;
+
+    printf("%s Starting...\n\n", argv[0]);
+    printf(" CUDA Device Query (Runtime API) version (CUDART static linking)\n\n");
+
+    int deviceCount = 0;
+    cudaError_t error_id = cudaGetDeviceCount(&deviceCount);
+
+    if (error_id != cudaSuccess) {
+        printf("cudaGetDeviceCount failed: %s (%d)\n",
+			cudaGetErrorString(error_id), (int) error_id);
+        printf("Result = FAIL\n");
+        exit(EXIT_FAILURE);
+    }
+
+    // This function call returns 0 if there are no CUDA capable devices.
+    if (deviceCount == 0)
+        printf("There are no available device(s) that support CUDA\n");
+    else
+        printf("Detected %d CUDA Capable device(s)\n", deviceCount);
+
+    int dev, driverVersion = 0, runtimeVersion = 0;
+
+    for (dev = 0; dev < deviceCount; ++dev) {
+        cudaSetDevice(dev);
+        cudaDeviceProp deviceProp;
+        cudaGetDeviceProperties(&deviceProp, dev);
+
+        printf("\nDevice %d: \"%s\"\n", dev, deviceProp.name);
+
+        // Console log
+        cudaDriverGetVersion(&driverVersion);
+        cudaRuntimeGetVersion(&runtimeVersion);
+        printf("  CUDA Driver Version / Runtime Version          %d.%d / %d.%d\n", driverVersion/1000, (driverVersion%100)/10, runtimeVersion/1000, (runtimeVersion%100)/10);
+        printf("  CUDA Capability Major/Minor version number:    %d.%d\n", deviceProp.major, deviceProp.minor);
+
+        printf("  Total amount of global memory:                 %.0f MBytes (%llu bytes)\n",
+                (float)deviceProp.totalGlobalMem/1048576.0f, (unsigned long long) deviceProp.totalGlobalMem);
+
+        printf("  (%2d) Multiprocessors, (%3d) CUDA Cores/MP:     %d CUDA Cores\n",
+               deviceProp.multiProcessorCount,
+               ConvertSMVer2Cores(deviceProp.major, deviceProp.minor),
+               ConvertSMVer2Cores(deviceProp.major, deviceProp.minor) * deviceProp.multiProcessorCount);
+        printf("  GPU Max Clock rate:                            %.0f MHz (%0.2f GHz)\n", deviceProp.clockRate * 1e-3f, deviceProp.clockRate * 1e-6f);
+
+
+#if CUDART_VERSION >= 5000
+        // This is supported in CUDA 5.0 (runtime API device properties)
+        printf("  Memory Clock rate:                             %.0f MHz\n", deviceProp.memoryClockRate * 1e-3f);
+        printf("  Memory Bus Width:                              %d-bit\n",   deviceProp.memoryBusWidth);
+
+        if (deviceProp.l2CacheSize) {
+            printf("  L2 Cache Size:                                 %d bytes\n", deviceProp.l2CacheSize);
+        }
+
+#else
+        // In CUDA 4.0-4.2 these attributes are only exposed through the CUDA Driver API
+        int memoryClock;
+        getCudaAttribute<int>(&memoryClock, CU_DEVICE_ATTRIBUTE_MEMORY_CLOCK_RATE, dev);
+        printf("  Memory Clock rate:                             %.0f MHz\n", memoryClock * 1e-3f);
+        int memBusWidth;
+        getCudaAttribute<int>(&memBusWidth, CU_DEVICE_ATTRIBUTE_GLOBAL_MEMORY_BUS_WIDTH, dev);
+        printf("  Memory Bus Width:                              %d-bit\n", memBusWidth);
+        int L2CacheSize;
+        getCudaAttribute<int>(&L2CacheSize, CU_DEVICE_ATTRIBUTE_L2_CACHE_SIZE, dev);
+
+        if (L2CacheSize) {
+            printf("  L2 Cache Size:                                 %d bytes\n", L2CacheSize);
+        }
+
+#endif
+
+        printf("  Maximum Texture Dimension Size (x,y,z)         1D=(%d), 2D=(%d, %d), 3D=(%d, %d, %d)\n",
+               deviceProp.maxTexture1D   , deviceProp.maxTexture2D[0], deviceProp.maxTexture2D[1],
+               deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1], deviceProp.maxTexture3D[2]);
+        printf("  Maximum Layered 1D Texture Size, (num) layers  1D=(%d), %d layers\n",
+               deviceProp.maxTexture1DLayered[0], deviceProp.maxTexture1DLayered[1]);
+        printf("  Maximum Layered 2D Texture Size, (num) layers  2D=(%d, %d), %d layers\n",
+               deviceProp.maxTexture2DLayered[0], deviceProp.maxTexture2DLayered[1], deviceProp.maxTexture2DLayered[2]);
+
+
+        printf("  Total amount of constant memory:               %lu bytes\n", deviceProp.totalConstMem);
+        printf("  Total amount of shared memory per block:       %lu bytes\n", deviceProp.sharedMemPerBlock);
+        printf("  Total number of registers available per block: %d\n", deviceProp.regsPerBlock);
+        printf("  Warp size:                                     %d\n", deviceProp.warpSize);
+        printf("  Maximum number of threads per multiprocessor:  %d\n", deviceProp.maxThreadsPerMultiProcessor);
+        printf("  Maximum number of threads per block:           %d\n", deviceProp.maxThreadsPerBlock);
+        printf("  Max dimension size of a thread block (x,y,z): (%d, %d, %d)\n",
+               deviceProp.maxThreadsDim[0],
+               deviceProp.maxThreadsDim[1],
+               deviceProp.maxThreadsDim[2]);
+        printf("  Max dimension size of a grid size    (x,y,z): (%d, %d, %d)\n",
+               deviceProp.maxGridSize[0],
+               deviceProp.maxGridSize[1],
+               deviceProp.maxGridSize[2]);
+        printf("  Maximum memory pitch:                          %lu bytes\n", deviceProp.memPitch);
+        printf("  Texture alignment:                             %lu bytes\n", deviceProp.textureAlignment);
+        printf("  Concurrent copy and kernel execution:          %s with %d copy engine(s)\n", (deviceProp.deviceOverlap ? "Yes" : "No"), deviceProp.asyncEngineCount);
+        printf("  Run time limit on kernels:                     %s\n", deviceProp.kernelExecTimeoutEnabled ? "Yes" : "No");
+        printf("  Integrated GPU sharing Host Memory:            %s\n", deviceProp.integrated ? "Yes" : "No");
+        printf("  Support host page-locked memory mapping:       %s\n", deviceProp.canMapHostMemory ? "Yes" : "No");
+        printf("  Alignment requirement for Surfaces:            %s\n", deviceProp.surfaceAlignment ? "Yes" : "No");
+        printf("  Device has ECC support:                        %s\n", deviceProp.ECCEnabled ? "Enabled" : "Disabled");
+#if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
+        printf("  CUDA Device Driver Mode (TCC or WDDM):         %s\n", deviceProp.tccDriver ? "TCC (Tesla Compute Cluster Driver)" : "WDDM (Windows Display Driver Model)");
+#endif
+        printf("  Device supports Unified Addressing (UVA):      %s\n", deviceProp.unifiedAddressing ? "Yes" : "No");
+        printf("  Device PCI Domain ID / Bus ID / location ID:   %d / %d / %d\n", deviceProp.pciDomainID, deviceProp.pciBusID, deviceProp.pciDeviceID);
+
+        const char *sComputeMode[] = {
+            "Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)",
+            "Exclusive (only one host thread in one process is able to use ::cudaSetDevice() with this device)",
+            "Prohibited (no host thread can use ::cudaSetDevice() with this device)",
+            "Exclusive Process (many threads in one process are able to use ::cudaSetDevice() with this device)",
+            "Unknown",
+            NULL
+        };
+        printf("  Compute Mode:\n");
+        printf("     < %s >\n", sComputeMode[deviceProp.computeMode]);
+    }
+
+    // If there are 2 or more GPUs, query to determine whether RDMA is supported
+    if (deviceCount >= 2)
+    {
+        cudaDeviceProp prop[64];
+        int gpuid[64]; // we want to find the first two GPUs that can support P2P
+        int gpu_p2p_count = 0;
+
+        for (int i=0; i < deviceCount; i++)
+        {
+            checkCudaErrors(cudaGetDeviceProperties(&prop[i], i));
+
+            // Only boards based on Fermi or later can support P2P
+            if ((prop[i].major >= 2)
+#if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
+                // on Windows (64-bit), the Tesla Compute Cluster driver for Windows must be enabled to support this
+                && prop[i].tccDriver
+#endif
+               )
+            {
+                // This is an array of P2P capable GPUs
+                gpuid[gpu_p2p_count++] = i;
+            }
+        }
+
+        // Show all the combinations of P2P-capable GPUs
+        int can_access_peer_0_1, can_access_peer_1_0;
+
+        if (gpu_p2p_count >= 2)
+        {
+            for (int i = 0; i < gpu_p2p_count-1; i++)
+            {
+                for (int j = 1; j < gpu_p2p_count; j++)
+                {
+                    checkCudaErrors(cudaDeviceCanAccessPeer(&can_access_peer_0_1, gpuid[i], gpuid[j]));
+                    printf("> Peer access from %s (GPU%d) -> %s (GPU%d) : %s\n", prop[gpuid[i]].name, gpuid[i],
+                           prop[gpuid[j]].name, gpuid[j] ,
+                           can_access_peer_0_1 ? "Yes" : "No");
+                }
+            }
+
+            for (int j = 1; j < gpu_p2p_count; j++)
+            {
+                for (int i = 0; i < gpu_p2p_count-1; i++)
+                {
+                    checkCudaErrors(cudaDeviceCanAccessPeer(&can_access_peer_1_0, gpuid[j], gpuid[i]));
+                    printf("> Peer access from %s (GPU%d) -> %s (GPU%d) : %s\n", prop[gpuid[j]].name, gpuid[j],
+                           prop[gpuid[i]].name, gpuid[i] ,
+                           can_access_peer_1_0 ? "Yes" : "No");
+                }
+            }
+        }
+    }
+
+    // csv masterlog info
+    // *****************************
+    // exe and CUDA driver name
+    printf("\n");
+    std::string sProfileString = "deviceQuery, CUDA Driver = CUDART";
+    char cTemp[128];
+
+    // driver version
+    sProfileString += ", CUDA Driver Version = ";
+#if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
+    sprintf_s(cTemp, 10, "%d.%d", driverVersion/1000, (driverVersion%100)/10);
+#else
+    sprintf(cTemp, "%d.%d", driverVersion/1000, (driverVersion%100)/10);
+#endif
+    sProfileString +=  cTemp;
+
+    // Runtime version
+    sProfileString += ", CUDA Runtime Version = ";
+#if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
+    sprintf_s(cTemp, 10, "%d.%d", runtimeVersion/1000, (runtimeVersion%100)/10);
+#else
+    sprintf(cTemp, "%d.%d", runtimeVersion/1000, (runtimeVersion%100)/10);
+#endif
+    sProfileString +=  cTemp;
+
+    // Device count
+    sProfileString += ", NumDevs = ";
+#if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
+    sprintf_s(cTemp, 10, "%d", deviceCount);
+#else
+    sprintf(cTemp, "%d", deviceCount);
+#endif
+    sProfileString += cTemp;
+
+    // Print Out all device Names
+    for (dev = 0; dev < deviceCount; ++dev)
+    {
+#if defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64)
+        sprintf_s(cTemp, 13, ", Device%d = ", dev);
+#else
+        sprintf(cTemp, ", Device%d = ", dev);
+#endif
+        cudaDeviceProp deviceProp;
+        cudaGetDeviceProperties(&deviceProp, dev);
+        sProfileString += cTemp;
+        sProfileString += deviceProp.name;
+    }
+
+    sProfileString += "\n";
+    printf("%s", sProfileString.c_str());
+
+    printf("Result = PASS\n");
+
+    // finish
+    // cudaDeviceReset causes the driver to clean up all state. While
+    // not mandatory in normal operation, it is good practice.  It is also
+    // needed to ensure correct operation when the application is being
+    // profiled. Calling cudaDeviceReset causes all profile data to be
+    // flushed before the application exits
+    cudaDeviceReset();
+    return 0;
+}
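The version numbers printed by cudainfo come from `cudaDriverGetVersion` and `cudaRuntimeGetVersion`, which each return a single integer encoded as 1000 * major + 10 * minor. The sample decodes the minor part with `(version % 100) / 10`, which gives the same result as long as the minor version stays below 10. A minimal standalone sketch of the decoding and of the 0xMm key used by the `nGpuArchCoresPerSM` table follows; the helper names `formatCudaVersion` and `smKey` are hypothetical and do not appear in cudainfo.cpp.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of cudainfo.cpp): turn a CUDA version
// integer, encoded as 1000 * major + 10 * minor, into "major.minor".
std::string formatCudaVersion(int version)
{
    int major = version / 1000;
    int minor = (version % 1000) / 10; // also valid for minor >= 10
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%d.%d", major, minor);
    return std::string(buf);
}

// Hypothetical helper: pack an SM major/minor pair into the 0xMm key
// that ConvertSMVer2Cores compares against each table entry.
int smKey(int major, int minor)
{
    return (major << 4) + minor;
}
```

For example, `formatCudaVersion(12040)` yields "12.4", and `smKey(5, 2)` yields 0x52, the Maxwell GM20x entry in the table.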
--- a/pkgs/cudainfo/default.nix
+++ b/pkgs/cudainfo/default.nix
@@ -0,0 +1,43 @@
+{
+  stdenv
+, cudatoolkit
+, cudaPackages
+, autoAddDriverRunpath
+, strace
+}:
+
+stdenv.mkDerivation (finalAttrs: {
+  name = "cudainfo";
+  src = ./.;
+  buildInputs = [
+    cudatoolkit # Required for nvcc
+    cudaPackages.cuda_cudart.static # Required for -lcudart_static
+    autoAddDriverRunpath
+  ];
+  installPhase = ''
+    mkdir -p $out/bin
+    cp -a cudainfo $out/bin
+  '';
+  passthru.gpuCheck = stdenv.mkDerivation {
+    name = "cudainfo-test";
+    requiredSystemFeatures = [ "cuda" ];
+    dontBuild = true;
+    nativeCheckInputs = [
+      finalAttrs.finalPackage # The cudainfo package from above
+      strace # When it fails, it will show the trace
+    ];
+    dontUnpack = true;
+    doCheck = true;
+    checkPhase = ''
+      if ! cudainfo; then
+        set -x
+        cudainfo=$(command -v cudainfo)
+        ldd $cudainfo
+        readelf -d $cudainfo
+        strace -f $cudainfo
+        set +x
+      fi
+    '';
+    installPhase = "touch $out";
+  };
+})
--- a/pkgs/mpich-fix-hwtopo.patch
+++ b/pkgs/mpich-fix-hwtopo.patch
@@ -1,36 +0,0 @@
-diff --git a/src/util/mpir_hwtopo.c b/src/util/mpir_hwtopo.c
-index 33e88bc..ee3641c 100644
--- a/src/util/mpir_hwtopo.c
-+++ b/src/util/mpir_hwtopo.c
-@@ -200,18 +200,6 @@ int MPII_hwtopo_init(void)
- #ifdef HAVE_HWLOC
-     bindset = hwloc_bitmap_alloc();
-     hwloc_topology_init(&hwloc_topology);
-    char *xmlfile = MPIR_pmi_get_jobattr("PMI_hwloc_xmlfile");
-    if (xmlfile != NULL) {
-        int rc;
-        rc = hwloc_topology_set_xml(hwloc_topology, xmlfile);
-        if (rc == 0) {
-            /* To have hwloc still actually call OS-specific hooks, the
-             * HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM has to be set to assert that the loaded
-             * file is really the underlying system. */
-            hwloc_topology_set_flags(hwloc_topology, HWLOC_TOPOLOGY_FLAG_IS_THISSYSTEM);
-        }
-        MPL_free(xmlfile);
-    }
-
-     hwloc_topology_set_io_types_filter(hwloc_topology, HWLOC_TYPE_FILTER_KEEP_ALL);
-     if (!hwloc_topology_load(hwloc_topology)) 
-
--- a/src/mpi/init/local_proc_attrs.c
-+++ b/src/mpi/init/local_proc_attrs.c
-@@ -79,10 +79,6 @@ int MPII_init_local_proc_attrs(int *p_thread_required)
-     /* Set the number of tag bits. The device may override this value. */
-     MPIR_Process.tag_bits = MPIR_TAG_BITS_DEFAULT;
-
-    char *requested_kinds = MPIR_pmi_get_jobattr("PMI_mpi_memory_alloc_kinds");
-    MPIR_get_supported_memory_kinds(requested_kinds, &MPIR_Process.memory_alloc_kinds);
-    MPL_free(requested_kinds);
-
-     return mpi_errno;
- }
--- a/pkgs/overlay.nix
+++ b/pkgs/overlay.nix
@@ -11,10 +11,6 @@ final: prev:
      paths = [ pmix.dev pmix.out ];
    };
  in prev.mpich.overrideAttrs (old: {
-    patches = (old.patches or []) ++ [
-      # See https://github.com/pmodels/mpich/issues/6946
-      ./mpich-fix-hwtopo.patch
-    ];
    buildInput = old.buildInputs ++ [
      libfabric
      pmixAll
@@ -56,4 +52,5 @@ final: prev:
  prometheus-slurm-exporter = prev.callPackage ./slurm-exporter.nix { };
  meteocat-exporter = prev.callPackage ./meteocat-exporter/default.nix { };
  upc-qaire-exporter = prev.callPackage ./upc-qaire-exporter/default.nix { };
+  cudainfo = prev.callPackage ./cudainfo/default.nix { };
 }
--- a/secrets/ceph-user.age
+++ b/secrets/ceph-user.age
--- a/secrets/gitea-runner-token.age
+++ b/secrets/gitea-runner-token.age
@@ -1,11 +1,11 @@
 age-encryption.org/v1
-> ssh-ed25519 HY2yRg WUMWvyagPalsy7u1RaEFAwJvFowso1/quNBo+nAkxhQ
-OHcebB7koPKhy58A6qngEVNWckkWChyEK3dwgy8EL5o
-> ssh-ed25519 CAWG4Q Yx/HLIryUNE2BaqTl84FrNRy4XLCY2TRkRgbA9k3qU4
-LZljfuLS5yMVVK6N57iC6cKEaFP6Hh2OkvWJjuFg8q0
-> ssh-ed25519 xA739A DOXjPRttSWz51Sr7KfjgKfAtaIYMo3foB1Ywqw9HYDY
-CA5puXK/1HDOitA2XHBI3OdKmZ7BzHst4DyuWGMC6hE
-> ssh-ed25519 MSF3dg +2LetdIiIZUk7wtHNS1tYsLo4ypwqZ9gpg77RQrnzHU
-yIUu8BVbF3dhUx3531RR50/cJQd9gd8VfKUQzEeT/iQ
--- oY/wQ+RjZO2CmKZtbQ0yOVZ5fv2+AlvvkRu1UDfCNAA
-_8`G<>=C7@x&<26><>\Ft<46>)<EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>cPe<EFBFBD><EFBFBD>%<25>ֽ[zX-0<>[<11><><EFBFBD>ɲ<><C9B2>tz<74><7A>;%<25><><EFBFBD><EFBFBD><EFBFBD>~<7E>H0<48>؃*XD<58>;<EFBFBD><EFBFBD>
+-> ssh-ed25519 HY2yRg d7+nvfAcdC3GjJxipXFrsfGGyP5jAY+gRWRV+4FVYAM
+CG7r0bRGgnUWcdfDnpe7HwZ3L/y7b5iuJuqvf15b3/Y
+-> ssh-ed25519 CAWG4Q X0vITOErz4wkR3VQYOcVlnrkHtwe+ytdZz1Hcrs4vVs
+6IWYOhXLQ+BnML9YfLLHJYEO2CZ/uEc9IBqhoWvjDHI
+-> ssh-ed25519 xA739A p5e/0AJtZ0+zbRvkB/usLuxusY8xXRx9Ksi/LQlcIHw
+M4S/qlzT9POyJx4gY9lmycstUcdwG2cinN4OlV22zzo
+-> ssh-ed25519 MSF3dg Ydl7uBWzBx6sAaxbzC3x8qiaU3ysGqV4rUFLpHCEV30
+/1AUHBhCNOs9i7LJbmzwQDHsu+ybzYf6+coztKk5E3U
+--- kYt15WxClpT7PXD1oFe9GqJU+OswjH7y9wIc8/GzZ7M
+<EFBFBD><EFBFBD>h<>ߓ<><EFBFBD><EFBFBD>`<EFBFBD><EFBFBD><EFBFBD>V4F<EFBFBD><EFBFBD>_k)^<5E>m$uj:ѳ<><D1B3><17><><EFBFBD>}<7D>Z]$U]<12>u<EFBFBD> <20>0<EFBFBD><30><EFBFBD>v8<76>?<3F>X<EFBFBD>P<EFBFBD>g%d<>#<23>d9{rAi<EFBFBD><EFBFBD>
--- a/secrets/gitlab-bsc-docker-token.age
+++ b/secrets/gitlab-bsc-docker-token.age
--- a/secrets/gitlab-runner-docker-token.age
+++ b/secrets/gitlab-runner-docker-token.age
--- a/secrets/gitlab-runner-shell-token.age
+++ b/secrets/gitlab-runner-shell-token.age
--- a/secrets/ipmi.yml.age
+++ b/secrets/ipmi.yml.age
--- a/secrets/jungle-robot-password.age
+++ b/secrets/jungle-robot-password.age
@@ -1,13 +1,13 @@
 age-encryption.org/v1
-> ssh-ed25519 HY2yRg 0tpCZ5yI339pgPKGh3HJ8cnkhKlMoyYiKR1mo1cvkm0
-EVVpJ8nyw/W9B65Tw59IjJC5Pb4uQX5LGnzPcf/hUs0
-> ssh-ed25519 G5LX5w YaDAKeAAunommW6q6+hTjrjaadmB17OG89t1Dx/T5z4
-tJXdciiBTz9V+0nf1sGAk4vSlOgfeEgrKr+oDJ/4ays
-> ssh-ed25519 CAWG4Q i/cpMcOaZpH7aqwsR/fZiVL9CreL9dkk5F5S9dXrQBY
-uU8G51pMH00ywaIVY+AzjpiqzanUYpn9ANRabugSXbE
-> ssh-ed25519 xA739A DTiXqnCz1zNgyLt8VvnOkVLDwfa0qJpUBQw9Ms/qHHA
-wKjSYYOUEJkPisxT6MNW1eoYk++ECrs1ib9uEYXsAQY
-> ssh-ed25519 MSF3dg JmvJsExWPW4b6RT62mz4Wscx7EsyDPVf91A9ps9+shM
-67jZYnxJpQAhnRWnTOXs+Cu445dRCpDzIGGp1xYuF3s
--- QmdvzR7QqRPxS1fHc8rR/PDZxN8u+BVKAVvE8cMLhqc
-<EFBFBD><EFBFBD><02><>EG<45><0B>Q<<3C><><EFBFBD><EFBFBD><EFBFBD>Kl<4B>U,<11><><EFBFBD><EFBFBD>[-<2D><>º<1A><Uc<55>e<EFBFBD><65><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>)<06><><05><>"f<>˖<EFBFBD><CB96><19>Z<EFBFBD>'C
+-> ssh-ed25519 HY2yRg rsbyYULV9S/kz4OzBLQIVfyotgKrzPzvjPNVw69coTo
+i9fgGAYTPxJ4Ulft3xzwNPF8v85Ae9ePMNWp593vSfA
+-> ssh-ed25519 G5LX5w mhB3iiqV2e+tT31FCREX2Bqq2F2g+vTYvjCuyGSeJxs
+Ep9zZykCGFW841S2mnllEi0oPnRiRuYIGtv6ckp+IBg
+-> ssh-ed25519 CAWG4Q M0AJEZuiC6FnRy8rAJQ9T9dCXfIfLXGk0uBGhYOxRSg
+5jSRNTi0c6we/oLBdUy1am5saH/5Nh1fmVqYajXFbGc
+-> ssh-ed25519 xA739A Zf9tUKg4S4UuWMGEtAWVg0pa6vTzKIl2Ty39IjEG2mE
+RCSkVFyO2ZuDlAHung9bTeM91aTXxNRJ779kE0C6pK4
+-> ssh-ed25519 MSF3dg QLiG9s3mgfO6HnQ8/ReizkGllsjYebIL5ZthSVcD7Ao
+YdzcodBarrdg6R96Ys01aEPoeYygbT56yz90BMFfr0U
+--- fS/rGOP3IGG8b3bCDy26nBL0P1rtqC70CmKOGDsg8Tw
+;Y<><59>M<EFBFBD>_<EFBFBD><5F><EFBFBD>Zꙺ:<3A>]Ez89ze<><65><EFBFBD>	<09><>D<01>X<EFBFBD><19><><EFBFBD><EFBFBD><EFBFBD><EFBFBD>9{<7B>x^<5E><><EFBFBD>L<>l<EFBFBD><6C><EFBFBD><0B>㦑9R<39>VhWs
--- a/secrets/munge-key.age
+++ b/secrets/munge-key.age
--- a/secrets/nix-serve.age
+++ b/secrets/nix-serve.age
@@ -1,13 +1,14 @@
 age-encryption.org/v1
-> ssh-ed25519 HY2yRg T/Qom1qxE0M+FuvsXD/KZ6Usfp6v3Xwx043kDgxbCz4
-6GRg0QjuHd2+d6lJfZqqPMPMjS91HEcJ/W0KRV6Et50
-> ssh-ed25519 G5LX5w pzg0wK+Q6KZP67CkyZNYbNcahlq9SIuFN18H85ARykU
-aDSrO49tg/a3GOAJR96lh803bXoZqp/G6VMiSvf91vw
-> ssh-ed25519 CAWG4Q X+F/6LF8VUUoV72iCLzKKpYGRDoUHuBy1E+yr29RKEo
-c779vpt/fiN7n0kGAc5jA9fWkzCPrthlNZdN4p6csrk
-> ssh-ed25519 xA739A sbg087VKj/gcycV9JrBNCoCfB4kRMDSVo3EtfpRVDyg
-Lv5ges1KmxGwvz4UPZCD0v4YN2ms2Q3wmrJ14XCKYsQ
-> ssh-ed25519 MSF3dg pCLeyeWYbnNWQwwlGcsKz0KZ4BaaYKCGjo0XOPpo+no
-IsNxFoB2nTxyThJxtAxSA6gauXHGQJnVefs/K2MZ+DM
--- tgB3F+k1/PQt+r5Cz+FqH31hCZFvr0Y8uZVKkdA80yo
-[binary age payload omitted: not representable as text]
+-> ssh-ed25519 HY2yRg tdVrzL3EryCEDJSiAjHfr3AC6rhyKLLe9ZaKKa/fyEk
+kIbJjp/odUkQ9E2fXpk4zratLieyMNdNLHYGQt8+860
+-> ssh-ed25519 G5LX5w A0wBDwowrQyByfinVVrypH5VyvyKk3O3O8+2JnVgcCI
+kLiXfQkC+8QycLyyM/6dAKEE6SGxSZJS7PuOTQr10XE
+-> ssh-ed25519 CAWG4Q HkbFgDtrbuv+KCwULZppiy88ZHl3kHcdlTVTOfMKTzM
+KMGdQl8Gl51gUp1bxEa41a0VBBiHWD81/9C75NX/pzA
+-> ssh-ed25519 xA739A XfYFE5jPFvcoTMXtwJgs3+HPLQxRmvz1W7yqE7jSYGE
+497iDMqiIx1u+cBu8KZDNF2SPpGCrVqjGdUPD8kEjE4
+-> ssh-ed25519 MSF3dg Vbxxsmfoywpi4W9WUMzgay3Nd1UBigliYHD7Wew9AHM
+aLt5GN8jJWbbrHfs1321tQz44lBaATe0BipT/EGc80I
+--- JHESkz0eGNPo3ZEGALVH4xsQ4p1O/6ShlfOw58fjH1k
+
+[binary age payload omitted: not representable as text]
--- a/secrets/tent-gitlab-runner-bsc-docker-token.age
+++ b/secrets/tent-gitlab-runner-bsc-docker-token.age
--- a/secrets/tent-gitlab-runner-pm-docker-token.age
+++ b/secrets/tent-gitlab-runner-pm-docker-token.age
--- a/secrets/tent-gitlab-runner-pm-shell-token.age
+++ b/secrets/tent-gitlab-runner-pm-shell-token.age
@@ -1,13 +1,12 @@
 age-encryption.org/v1
-> ssh-ed25519 G5LX5w V9bHLoGuY4stRwbzVS9Qa0L9yoY+UoCoXc+dJJQW/Ag
-2ut9GfdJ3KBCqZRaloZCQsl8MLfaZAZxqj6JtPJzu2k
-> ssh-ed25519 CAWG4Q OAqnIfMECpKglZ7aF9tv/PQinG1Ou2+IEZ+nf4dtQjg
-dANdMLe4iI0d6Xd/dIMpZK+mgw2+VmJFQScHaIxD7WI
-> ssh-ed25519 xA739A nVNF4Y6VSa5PP6FFBJpVmoFYYseoFx5F2wJU+Pwk+Xk
-A5CiuTSNlX9Y76qhYgblBdJl3zPhtjWho2oL5/sIKu0
-> ssh-ed25519 MSF3dg /WMsGnBGzquIMyw06gHKpSS4OUxheulT59kxi+/pxxU
-ppwcv7RLzUbQUM7j0Tb9rRVT9XyPMhqYr2fr4S0nTJY
--- zOe0Ko0oxArbmxePMPDVAT0pDju7IeOAih7sNrDcoVs
-[binary age payload omitted: not representable as text, line 1 of 3]
-[binary age payload omitted: not representable as text, line 2 of 3]
-[binary age payload omitted: not representable as text, line 3 of 3]
+-> ssh-ed25519 G5LX5w 5K0mzfJGvAB2LGmoQ9ZLbWooVEX6F4+fQdo1JUoB3FM
+AKGa507bUrYjXFaMQ1MXTDBFYsdS6zbs+flmxYN0UNo
+-> ssh-ed25519 CAWG4Q 8KzLc949on8iN1pK8q11OpCIeO71t6b0zxCLHhcQ6ns
+uy7z6RdIuoUes+Uap3k5eoFFuu/DcSrEBwq4V4C/ygc
+-> ssh-ed25519 xA739A SLx5cKo0fdAHj+cLpJ4FYTWTUTyDsCqKQOufDu3xnGo
+VnS/WsiSaf6RpXuhgfij4pYu4p9hlJl1oXrfYY9rKlQ
+-> ssh-ed25519 MSF3dg c5ZXvdNxNfZU3HeWsttuhy+UC5JxWN/IFuCuCGbksn4
+vcKlIirf+VvERX71YpmwW6zp6ClhlG2PR4R8LIN7cQo
+--- pJKICDaYAlxqNnvHIuzB3Yk7tv0ZNYflGTQD+Zk/8+4
+[binary age payload omitted: not representable as text, line 1 of 2]
+[binary age payload omitted: not representable as text, line 2 of 2]
--- a/secrets/vpn-dac-client-key.age
+++ b/secrets/vpn-dac-client-key.age
--- a/secrets/vpn-dac-login.age
+++ b/secrets/vpn-dac-login.age
Author	SHA1	Message	Date
Rodrigo Arias Mallo	e57b1b5fd8	Keep compute nodes off when power comes back When the power comes back, we don't know if the AC unit will be operating properly or if the room will be at a safe temperature. So, instead of powering all the machines back, only configure the login to power on, so we can check the state of the room and power the rest of the machines.	2025-07-31 17:39:21 +02:00
Rodrigo Arias Mallo	d948f8b752	Move StartLimit* options to unit section The StartLimitBurst and StartLimitIntervalSec options belong to the [Unit] section, otherwise they are ignored in [Service]: > Unknown key 'StartLimitIntervalSec' in section [Service], ignoring. When using [Unit], the limits are properly set: apex% systemctl show power-policy.service | grep StartLimit StartLimitIntervalUSec=10min StartLimitBurst=10 StartLimitAction=none Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-24 14:32:46 +02:00
Rodrigo Arias Mallo	8f7787e217	Set power policy to always turn on In all machines, as soon as we recover the power, turn the machine back on. We cannot rely on the previous state as we will shut them down before the power is cut to prevent damage on the power supply monitoring circuit. Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-24 11:22:38 +02:00
Rodrigo Arias Mallo	30b9b23112	Add NixOS module to control power policy Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-24 11:22:36 +02:00
Rodrigo Arias Mallo	9a056737de	Move August shutdown to 3rd at 22h Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-24 11:22:33 +02:00
Rodrigo Arias Mallo	ac700d34a5	Disable automatic August shutdown for Fox The UPC has different dates for the yearly power cut, and Fox can recover properly from a power loss, so we don't need to have it turned off before the power cut. Simply disabling the timer is enough. Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es> Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-24 11:22:10 +02:00
Rodrigo Arias Mallo	9b681ab7ce	Add cudainfo program to test CUDA The cudainfo program checks that we can initialize the CUDA RT library and communicate with the driver. It can be used as standalone program or built with cudainfo.gpuCheck so it is executed inside the build sandbox to see if it also works fine. It uses the autoAddDriverRunpath hook to inject in the runpath the location of the library directory for CUDA libraries. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-23 11:52:09 +02:00
Rodrigo Arias Mallo	9ce394bffd	Add missing symlink in cuda sandbox Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-23 11:51:47 +02:00
Aleix Boné	8cd7b713ca	Enable cuda systemFeature in raccoon and fox This allows running derivations which depend on cuda runtime without breaking the sandbox. We only need to add `requiredSystemFeatures = [ "cuda" ];` to the derivation. Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2025-07-22 17:07:13 +02:00
Aleix Boné	8eed90d2bd	Move shared nvidia settings to a separate module Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2025-07-22 17:06:45 +02:00
Aleix Boné	aee54ef39f	Replace xeon07 by hut in ssh config The xeon07 machine has been renamed to hut. Reviewed-by: Rodrigo Arias Mallo <rodrigo.arias@bsc.es>	2025-07-21 18:10:08 +02:00
Rodrigo Arias Mallo	69f7ab701b	Enable automatic Nix GC in raccoon Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-21 17:58:26 +02:00
Rodrigo Arias Mallo	4c9bcebcdc	Select proprietary NVIDIA driver in raccoon The NVIDIA GTX 960 from 2016 has the Maxwell architecture, and NixOS suggests using the proprietary driver for older than Turing: > It is suggested to use the open source kernel modules on Turing or > later GPUs (RTX series, GTX 16xx), and the closed source modules > otherwise. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-21 17:58:21 +02:00
Rodrigo Arias Mallo	86e7c72b9b	Enable open source NVidia driver in fox It is recommended for newer versions. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-18 09:57:38 +02:00
Rodrigo Arias Mallo	a7dffc33b5	Remove option allowUnfree from fox and raccoon It is already set to true for all machines. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-18 09:57:21 +02:00
Rodrigo Arias Mallo	6765dba3e4	Ban another scanner trying to connect via SSH It is constantly spamming out logs: apex# journalctl | grep 'Connection closed by 84.88.52.176' | wc -l 2255 Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-18 09:51:49 +02:00
Rodrigo Arias Mallo	0acfb7a8e0	Update weasel IPMI hostname for monitoring Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-18 09:51:21 +02:00
Rodrigo Arias Mallo	dfbb21a5bd	Remove merged MPICH patch Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>	2025-07-16 13:07:12 +02:00
Rodrigo Arias Mallo	2bb3b2fc4a	Remove package ix as it is gone Fails with: "error: ix has been removed from Nixpkgs, as the ix.io pastebin has been offline since Dec. 2023". Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>	2025-07-16 13:07:06 +02:00
Rodrigo Arias Mallo	3270fe50a2	flake.lock: Update Flake lock file updates: • Updated input 'agenix': 'github:ryantm/agenix/f6291c5935fdc4e0bef208cfc0dcab7e3f7a1c41?narHash=sha256-b%2Buqzj%2BWa6xgMS9aNbX4I%2BsXeb5biPDi39VgvSFqFvU%3D' (2024-08-10) → 'github:ryantm/agenix/531beac616433bac6f9e2a19feb8e99a22a66baf?narHash=sha256-9P1FziAwl5%2B3edkfFcr5HeGtQUtrSdk/MksX39GieoA%3D' (2025-06-17) • Updated input 'agenix/darwin': 'github:lnl7/nix-darwin/4b9b83d5a92e8c1fbfd8eb27eda375908c11ec4d?narHash=sha256-gzGLZSiOhf155FW7262kdHo2YDeugp3VuIFb4/GGng0%3D' (2023-11-24) → 'github:lnl7/nix-darwin/43975d782b418ebf4969e9ccba82466728c2851b?narHash=sha256-dyN%2BteG9G82G%2Bm%2BPX/aSAagkC%2BvUv0SgUw3XkPhQodQ%3D' (2025-04-12) • Updated input 'agenix/home-manager': 'github:nix-community/home-manager/3bfaacf46133c037bb356193bd2f1765d9dc82c1?narHash=sha256-7ulcXOk63TIT2lVDSExj7XzFx09LpdSAPtvgtM7yQPE%3D' (2023-12-20) → 'github:nix-community/home-manager/abfad3d2958c9e6300a883bd443512c55dfeb1be?narHash=sha256-YZCh2o9Ua1n9uCvrvi5pRxtuVNml8X2a03qIFfRKpFs%3D' (2025-04-24) • Updated input 'bscpkgs': 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=6782fc6c5b5a29e84a7f2c2d1064f4bcb1288c0f' (2024-11-29) → 'git+https://git.sr.ht/~rodarima/bscpkgs?ref=refs/heads/master&rev=9d1944c658929b6f98b3f3803fead4d1b91c4405' (2025-06-11) • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/9c6b49aeac36e2ed73a8c472f1546f6d9cf1addc?narHash=sha256-i/UJ5I7HoqmFMwZEH6vAvBxOrjjOJNU739lnZnhUln8%3D' (2025-01-14) → 'github:NixOS/nixpkgs/dfcd5b901dbab46c9c6e80b265648481aafb01f8?narHash=sha256-Kt1UIPi7kZqkSc5HVj6UY5YLHHEzPBkgpNUByuyxtlw%3D' (2025-07-13) Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>	2025-07-16 13:07:01 +02:00
Rodrigo Arias Mallo	499112cdad	Upgrade nixpkgs to nixos 25.05 Reviewed-by: Aleix Roca Nonell <aleix.rocanonell@bsc.es>	2025-07-16 13:06:40 +02:00
Rodrigo Arias Mallo	a6698e6a6b	Silently ban OpenVAS BSC scanner from apex It is spamming our logs with refused connection lines: apex% sudo journalctl -b0 | grep 'refused connection.*SRC=192.168.8.16' | wc -l 13945 Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 17:40:41 +02:00
Rodrigo Arias Mallo	b394c5a8f4	Rotate anavarro password and SSH key Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 17:24:41 +02:00
Rodrigo Arias Mallo	3d5b845057	Add weasel machine configuration Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 17:24:38 +02:00
Rodrigo Arias Mallo	9e83565977	Remove extra flush commands on firewall stop They are not needed as they are already flushed when the firewall starts or stops. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 11:18:45 +02:00
Rodrigo Arias Mallo	ce2cda1c41	Prevent accidental use of nftables Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 11:18:42 +02:00
Rodrigo Arias Mallo	e6aef2cbd0	Add proxy configuration for internal hosts Access internal hosts via apex proxy. From the compute nodes we first open an SSH connection to apex, and then tunnel it through the HTTP proxy with netcat. This way we allow reaching internal GitLab repositories without requiring the user to have credentials in the remote host, while we can use multiple remotes to provide redundancy. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 11:18:36 +02:00
Rodrigo Arias Mallo	b7603053fa	Remove unused blackbox configuration modules Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 11:18:30 +02:00
Rodrigo Arias Mallo	3ca55acfdf	Use IPv4 in blackbox probes Otherwise they simply fail as IPv6 doesn't work. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 11:18:26 +02:00
Rodrigo Arias Mallo	e505a952af	Make NFS mount async to improve latency Don't wait to flush writes, as we don't care about consistency on a crash: > This option allows the NFS server to violate the NFS protocol and > reply to requests before any changes made by that request have been > committed to stable storage (e.g. disc drive). > > Using this option usually improves performance, but at the cost that > an unclean server restart (i.e. a crash) can cause data to be lost or > corrupted. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 11:18:20 +02:00
Rodrigo Arias Mallo	3ad9452637	Disable root_squash from NFS Allows root to read files in the NFS export, so we can directly run `nixos-rebuild switch` from /home. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 11:18:16 +02:00
Rodrigo Arias Mallo	fdd21d0dd0	Remove SSH proxy to access BSC clusters We now have direct connection to them. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 11:18:13 +02:00
Rodrigo Arias Mallo	c40871bbfe	Add users to apex machine They need to be able to login to apex to access any other machine from the SSF rack. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 11:18:09 +02:00
Rodrigo Arias Mallo	e8f5ce735e	Remove proxy from hut HTTP probes Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 11:18:04 +02:00
Rodrigo Arias Mallo	4a25056897	Remove proxy configuration from environment All machines have now direct connection with the outside world. Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 11:18:00 +02:00
Rodrigo Arias Mallo	89e0c0df28	Add storcli utility to apex Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 11:17:57 +02:00
Rodrigo Arias Mallo	1b731a756a	Add new configuration for apex Reviewed-by: Aleix Boné <abonerib@bsc.es>	2025-07-15 11:17:43 +02:00