

                          bscpkgs: User guide


ABSTRACT

  This repository contains a set of nix packages used in the Barcelona
  Supercomputing Center by the Programming Models group.

  The current setup uses the xeon07 machine to build packages, which are 
  automatically uploaded to MareNostrum4, due to lack of permissions in 
  the latter to perform the build safely.

  Some preliminary steps must be done manually to be able to build and 
  install packages (derivations in nix jargon).

1. Introduction

  To easily connect to xeon07 in one step, set up the SSH configuration
  file ~/.ssh/config (OpenSSH 7.3 or later is needed for the ProxyJump
  option) by adding these lines:

    Host cobi
          HostName ssflogin.bsc.es
          User your-username-here

    Host xeon07
          ProxyJump cobi
          HostName xeon07
          User your-username-here

  You should be able to connect with:

    laptop$ ssh xeon07

1.1 Network access

  To use nix you need to be able to download sources from the Internet.
  Usually the download requires ports 22, 80 and 443 to be open for
  outgoing traffic.

  Check that you have network access in xeon07 provided by the 
  environment variables "http_proxy" and "https_proxy". Try to fetch a 
  webpage with curl, to ensure the proxy is working:

    xeon07$ curl x.com
    x
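
  Before trying curl it can help to look at the two proxy variables
  themselves. The following is a minimal sketch and not part of
  bscpkgs: the function name check_proxy_env is made up for
  illustration, and it only tests that the variables are non-empty, not
  that the proxy actually works.

```shell
# Report whether the proxy variables named in this section are set.
# Returns 0 only when both http_proxy and https_proxy are non-empty.
check_proxy_env() {
  status=0
  for name in http_proxy https_proxy; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name is set"
    else
      echo "$name is NOT set"
      status=1
    fi
  done
  return "$status"
}
```

  Running check_proxy_env in xeon07 prints one line per variable and
  returns a non-zero status if any of them is missing.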

1.2 SSH keys

  Package sources are usually downloaded directly from the git server, 
  so you must be able to access all repositories without a password 
  prompt.

  Most repositories at https://pm.bsc.es/gitlab are open to read for
  logged in users, but there are some exceptions (for example the nanos6
  repository) where you must have been explicitly granted read access.

  If you don't have an SSH key at ~/.ssh/*.pub in xeon07, create a new
  one without password protection by running:

    xeon07$ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (~/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in ~/.ssh/id_rsa.
    Your public key has been saved in ~/.ssh/id_rsa.pub.
    ...

  By default it will create the private key at ~/.ssh/id_rsa. Copy the 
  contents of your public ssh key in ~/.ssh/id_rsa.pub and paste it in 
  GitLab at:

    https://pm.bsc.es/gitlab/profile/keys

  Then, configure it for use in the ~/.ssh/config file, adding:

    Host bscpm03.bsc.es
      IdentityFile ~/.ssh/id_rsa

  Finally verify the SSH connection to the server works and you get a 
  greeting from the GitLab server with your username:

    xeon07$ ssh git@bscpm03.bsc.es
    PTY allocation request failed on channel 0
    Welcome to GitLab, @rarias!
    Connection to bscpm03.bsc.es closed.

  Verify that you can access the nanos6/nanos6 repository (otherwise
  you first need to ask to be granted read access), at:

    https://pm.bsc.es/gitlab/nanos6/nanos6
  
  Finally, you should be able to download the nanos6/nanos6 git 
  repository without any password interaction by running:

    xeon07$ git clone git@bscpm03.bsc.es:nanos6/nanos6.git

  You will also need to access MareNostrum 4 from the xeon07 node, in 
  order to submit experiments. Add the following lines as well to the 
  ~/.ssh/config file and set your user name:

    Host mn0 mn1 mn2
            User your-mn4-username
            IdentityFile ~/.ssh/id_rsa

  Then copy the key to MareNostrum 4 (it will ask you the first time for 
  your password):

    xeon07$ ssh-copy-id -i ~/.ssh/id_rsa.pub mn1

  And ensure that you can connect without a password:

    xeon07$ ssh mn1
    ...
    login1$
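
  The passwordless login can also be checked non-interactively. The
  sketch below is an illustration only: the function name
  check_passwordless_ssh is hypothetical, and it assumes the remote host
  allows running a plain command such as "true" (so it fits the mn1
  login node, not the GitLab server). It relies on the OpenSSH option
  BatchMode=yes, which makes ssh fail instead of prompting for a
  password.

```shell
# Succeeds only if key-based login to the given host works without any
# prompt. ConnectTimeout avoids hanging when the host is unreachable.
check_passwordless_ssh() {
  if ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true; then
    echo "$1: passwordless login OK"
  else
    echo "$1: passwordless login FAILED"
    return 1
  fi
}
```

  For example, "check_passwordless_ssh mn1" run from xeon07 should
  report OK once the key has been copied with ssh-copy-id.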

1.3 The bscpkgs repo

  Once you have Internet access and have been granted access to the PM
  GitLab repositories you can begin building software with nix. First
  ensure that the nix binaries are available from your shell in xeon07:

    xeon07$ nix --version
    nix (Nix) 2.3.6

  Now you are ready to build and install packages with nix. Clone the 
  bscpkgs repository:

    xeon07$ git clone git@bscpm03.bsc.es:rarias/bscpkgs.git

  Nix looks in the current folder for a file named "default.nix" for 
  packages, so go to the repo directory:

    xeon07$ cd bscpkgs

  Now you should be able to build nanos6:

    xeon07$ nix-build -A bsc.nanos6
    ..
    /nix/store/3i0qkdywm9xjv2cm1ldx9smb552sf6r1-nanos6-2.4-6f10a32

  The installation is placed in the nix store (with the path stated in 
  the last line of the build process), with the "result" symbolic link 
  pointing to the same location:

    xeon07$ readlink result
    /nix/store/3i0qkdywm9xjv2cm1ldx9smb552sf6r1-nanos6-2.4-6f10a32

1.4 Configuration of mn4 (MareNostrum 4)

  In order to execute the programs built at xeon07, you first need to
  enter the nix environment. To do so, add the following line to the end
  of the file ~/.bashrc in mn4:

    export PATH=/gpfs/projects/bsc15/nix/bin:$PATH

  Then log out and log in again (or source the ~/.bashrc file) and you
  will have the `nix-setup` command available. This command executes
  a new shell where the /nix store is available. To execute it:

    mn4$ nix-setup

  Now you will see a new shell, where you can access the nix store:

    nix|mn4$ ls /nix
    gcroots  profiles  store  var

  The last build of nanos6 can also be found in mn4 at the same
  location:

    /nix/store/3i0qkdywm9xjv2cm1ldx9smb552sf6r1-nanos6-2.4-6f10a32

  Remember to enter the nix environment by running `nix-setup` when you 
  need something from the nix store.

  You cannot perform any build operations from mn4: to do so use the 
  xeon07 machine.
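
  Scripts that need something from the nix store can test for it first
  and fail with a clear message. This is only a sketch under
  assumptions: the function names have_nix_store and require_nix_store
  are invented for this example, and the test is simply whether the
  /nix/store directory is visible from the current shell.

```shell
# Returns 0 when the nix store directory is visible from this shell.
# The optional argument overrides the default /nix/store path.
have_nix_store() {
  [ -d "${1:-/nix/store}" ]
}

# Same check, but prints a hint on stderr when the store is missing.
require_nix_store() {
  if ! have_nix_store "$@"; then
    echo "nix store not available: run nix-setup first" >&2
    return 1
  fi
}
```

  A job script in mn4 could call "require_nix_store || exit 1" before
  using any path under /nix.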

2. Basic usage of nix

  Nix is a package manager which easily handles the reproducibility and
  configuration of packages and dependencies. See more info here:

    https://nixos.org/nix/manual/

  We will only cover the basic usage of nix for the BSC packages.

2.1 The user environment

  All nix packages are stored under the /nix directory. When you need to 
  "install" some binary from nix, a symlink is added to a folder 
  included in the $PATH variable. In particular, you should have 
  something similar added to your $PATH:

    xeon07$ echo $PATH | sed 's/:/\n/g' | grep nix
    /home/Computational/rarias/.nix-profile/bin
    /nix/var/nix/profiles/default/bin

  The first one is your custom installation of packages that are stored 
  in your home directory and the second one is the default installation 
  which contains the nix tools (which are installed in the /nix 
  directory as well).
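
  The same listing can be wrapped in a small helper. This is a sketch,
  with the hypothetical name nix_path_entries; it uses tr to split the
  variable, which avoids relying on sed interpreting "\n" in the
  replacement text, and it accepts an optional argument so it can be
  tried on any PATH-like string.

```shell
# Print, one per line, the entries of a PATH-like string that mention
# nix. Without an argument the current $PATH is inspected.
nix_path_entries() {
  printf '%s\n' "${1:-$PATH}" | tr ':' '\n' | grep nix
}
```

  The function returns a non-zero status when no entry matches, which
  suggests the nix profile is not loaded in that shell.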

  Use `nix search` to look for official packages in the "nixpkgs" 
  channel (the default repository of packages):

    xeon07$ nix search cowsay
    warning: using cached results; pass '-u' to update the cache
    * cowsay (cowsay)
      A program which generates ASCII pictures of a cow with a message

    * neo-cowsay (neo-cowsay)
      Cowsay reborn, written in Go

    * ponysay (ponysay-3.0.3)
      Cowsay reimplemention for ponies

    * tewisay (tewisay-unstable-2017-04-14)
      Cowsay replacement with unicode and partial ansi escape support

  When you need a program that is not available in your environment,
  much like when you use "module load ...", you can use nix-env to
  modify what is currently loaded. For example:

    xeon07$ nix-env -iA nixpkgs.cowsay

  Notice that you should specify the prefix "nixpkgs." before the
  package name. The command will download (if not already found in the
  nix store), compile (if necessary) and load the program `cowsay` from
  the nixpkgs repository in the environment. You should be able to run
  it as:

    xeon07$ cowsay "hello world"
     _____________
    < hello world >
     -------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

  You can now inspect the ~/.nix-profile/bin folder, and see that a new 
  symlink was added to the actual installation of the binary:

    xeon07$ file ~/.nix-profile/bin/cowsay
    /home/Computational/rarias/.nix-profile/bin/cowsay: symbolic link to 
    `/nix/store/673gczmhr5b449521srz2n7g1klykz6n-cowsay-3.03+dfsg2/bin/cowsay'

  You can list the current packages installed in your environment by 
  running:

    xeon07$ nix-env -q
    cowsay-3.03+dfsg2
    nix-2.3.6

  Notice that this setup only affects your user environment. The change
  is permanent: it applies to any new session until you modify the
  environment again. It is also immediate: all your open sessions see
  the new environment instantaneously.

  You can remove any package from the environment using:

    xeon07$ nix-env -e cowsay

  See the manual with `nix-env --help` if you want to know more details.

2.2 Building packages

  Usually, all official packages are already compiled and distributed 
  from a cache server so you don't need to rebuild them again. However, 
  BSC packages are distributed only in source code form as we don't have 
  any binary cache server yet.
  
  Nix will handle the build process without any user interaction (with a
  few exceptions which you shouldn't have to worry about). If any other
  user has already built the package then the build process is not
  needed, and the package is used as is.

  In order to build a BSC package go to the `bscpkgs` directory, and 
  run:

    xeon07$ nix-build -A bsc.dummy

  Notice the "bsc." prefix for BSC packages. The package will be built
  and installed in the /nix directory, then a symlink named "result"
  pointing to the installation is placed in the current directory:

    xeon07$ find result/
    result/
    result/bin
    result/bin/dummy

  The way in which nix handles the packages and dependencies ensures 
  that the environment of the build process of any package is exactly 
  the same, so the generated output should be the same if the builds are 
  deterministic.
  
  You can check the reproducibility of the build by adding the "--check" 
  flag, which will rebuild the package and compare the checksum of every 
  file with the ones previously built:

    xeon07$ nix-build -A bsc.dummy --check
    ...
    xeon07$ echo $?
    0

  A return code of zero indicates the output is bit by bit identical to
  the one installed. Some packages include non-deterministic information
  in the build process (such as the current timestamp), which will
  produce an error. Those packages must be patched to ensure the output
  is deterministic.
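
  To illustrate what comparing the checksum of every file means, the
  sketch below compares two directory trees by hashing each regular
  file. It is not how nix implements "--check" (nix performs its own
  comparison internally); the function names tree_digest and same_tree
  are hypothetical, sha256sum from coreutils is assumed to be
  available, and file names containing newlines are not handled.

```shell
# Print one digest summarising the contents and relative names of all
# regular files under the given directory.
tree_digest() {
  (cd "$1" && find . -type f | LC_ALL=C sort | while IFS= read -r f; do
     sha256sum "$f"
   done) | sha256sum | cut -d ' ' -f 1
}

# Returns 0 when both trees contain the same files with the same
# contents, non-zero otherwise.
same_tree() {
  [ "$(tree_digest "$1")" = "$(tree_digest "$2")" ]
}
```

  A file that embeds the build time would make same_tree fail for two
  builds of the same package, which is the kind of difference that
  "--check" reports as an error.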

  Notice that if you "cd" into the "result/" directory you will be
  inside the /nix directory (as you have followed the symlink) where you
  don't have write permission. Therefore if your program attempts to
  write to the current directory it will fail. It is recommended to
  instead run your program from the top directory:

    xeon07$ result/bin/dummy
    Hello world!

  Or you can install it in the environment:

    xeon07$ nix-env -i ./result

  And "cd" into any directory where you want to output some files and 
  just run it by the name:

    xeon07$ cd /tmp
    xeon07$ dummy
    Hello world!

  Finally, you can remove it from the environment if you don't need it:

    xeon07$ nix-env -e dummy

  If you want to know more details use "nix-build --help" to see the 
  manual.

2.3 The build process

  Each package is built following a programmable configuration 
  description in the nix language. Builds in nix are performed under 
  very strict conditions. No access to any file in the file system is 
  allowed, unless stated in the dependencies, which are in the /nix 
  store only.

  There is no network access in the build process and other restrictions 
  are enforced so that the build environment is reproducible. See more 
  details here:

    https://nixos.wiki/wiki/Nix#Sandboxing

  The top level "default.nix" file of the bscpkgs repository serves as
  an index of all BSC packages. You can see the definition for each
  package, for example the nbody app:

    nbody = callPackage ./bsc/apps/nbody/default.nix {
      stdenv = pkgs.gcc9Stdenv;
      mpi = intel-mpi;
      icc = icc;
      tampi = tampi;
      nanos6 = nanos6-git;
    };

  The compilation details are specified in the 
  "bsc/apps/nbody/default.nix" file.  You can configure the package by 
  changing the inputs, for example, what specific implementation of 
  nanos6 or MPI you want to use. To change the MPI implementation to the 
  official MPICH package use:

    nbody = callPackage ./bsc/apps/nbody/default.nix {
      stdenv = pkgs.gcc9Stdenv;
      mpi = pkgs.mpich; # Notice pkgs prefix for official packages
      icc = icc;
      tampi = tampi;
      nanos6 = nanos6-git;
    };

  Then you can rebuild the nbody package:

    xeon07$ nix-build -A bsc.nbody
    ...

  And verify that the binary is indeed linked to MPICH now:

    xeon07$ ldd result/bin/nbody_mpi.N2.2048.exe | grep mpi
        libmpi.so.12 => /nix/store/dwkkcv78a5bs8smflpx9ppp3klhz3i98-mpich-3.3.2/lib/libmpi.so.12 (0x00007f6be0f07000)

  If you modify a package which another package requires as a 
  dependency, nix will rebuild all required packages to propagate your 
  changes on demand.

  However, if you go back to the original configuration, the package
  will still be in the /nix store (unless the garbage collector was
  manually run and removed your old build), so you don't need to rebuild
  it again.

  For example if nbody is configured back to use Intel MPI:

    nbody = callPackage ./bsc/apps/nbody/default.nix {
      stdenv = pkgs.gcc9Stdenv;
      mpi = intel-mpi;
      icc = icc;
      tampi = tampi;
      nanos6 = nanos6-git;
    };

  The build process is now not required:

    xeon07$ nix-build -A bsc.nbody
    /nix/store/rbq7wrjcmg6fzd6yhrlnkfvzcavdbdpc-nbody
    xeon07$ ldd result/bin/nbody_mpi.N2.2048.exe | grep mpi
        libmpifort.so.12 => /nix/store/jvsjvxj2a08340fpdrqbqix9z3mpp3bd-intel-mpi-2019.7.217/lib/libmpifort.so.12 (0x00007f3a00402000)
        libmpi.so.12 => /nix/store/jvsjvxj2a08340fpdrqbqix9z3mpp3bd-intel-mpi-2019.7.217/lib/libmpi.so.12 (0x00007f39fed34000)

  Take a look at the different package description files in the 
  bscpkgs repository if you want to understand more details. Also 
  the nix pills are a very good reference:

    https://nixos.org/nixos/nix-pills/

2.4 Debugging the build process

  It may happen that the build process fails in an unexpected way. Most 
  problems are related to missing dependencies and can be easily found 
  by looking at the error messages.

  Other build problems are more subtle and require more debugging time.  
  One way of inspecting a build problem is by adding the breakpointHook 
  hook to the nativeBuildInputs array in a nix derivation (see
  https://nixos.org/nixpkgs/manual/#ssec-setup-hooks for more info), 
  which will stop the build process and allow a shell to be attached to 
  the sandbox.

    xeon07$ nix-build -A bsc.nbody
    ...
    /nix/store/gvqm2yc9xx4vh3nglgckz8siya66jnkx-stdenv-linux/setup: line 
    83: fake-missing-command: command not found
    build failed in buildPhase with exit code 127
    To attach install cntr and run the following command as root:

      cntr attach -t command \
        cntr-/nix/store/sk2nsj7xfr62cjk6m3725ydfyswqz7n1-nbody

  The command must run as the root user, so you can use `sudo -i` to run
  it (the -i option is required to load the shell profile which provides
  the nix path containing the cntr tool):

    xeon07$ sudo -i cntr attach -t command \
      cntr-/nix/store/sk2nsj7xfr62cjk6m3725ydfyswqz7n1-nbody
    nixbld@localhost:/var/lib/cntr> ls
    bin  build  dev  etc  nix  proc  tmp  var

  Then you can inspect the build environment to see why the build 
  failed. Source the build/env-vars file to get the same environment 
  variables (which include the $PATH) of the build process.
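
  Sourcing the file directly changes the variables of the shell you are
  working in. A possible alternative, shown here only as a sketch, is to
  load the file in a subshell and run a single command there. The
  function name run_with_build_env is hypothetical, and since the
  env-vars file is expected to be written in bash syntax, running this
  from bash is assumed.

```shell
# Run a command with the variables from a nix "env-vars" file loaded.
# The subshell keeps those variables out of the calling shell.
run_with_build_env() {
  env_file=$1
  shift
  (
    # Load the build environment; give up if the file cannot be read.
    . "$env_file" || exit 1
    "$@"
  )
}
```

  For instance, from the attached shell, "run_with_build_env
  ./build/env-vars env" would print the environment seen by the build
  without altering your own.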

/* vim: set ts=2 sw=2 tw=72 fo=watqc expandtab spell autoindent: */