Rodrigo Arias Mallo
bscpkgs: User guide

ABSTRACT

This repository contains a set of nix packages used in the Barcelona
Supercomputing Center by the Programming Models group.

The current setup uses the xeon07 machine to build packages, which are
automatically uploaded to MareNostrum4, due to lack of permissions in
the latter to perform the build safely. Some preliminary steps must be
done manually to be able to build and install packages (derivations in
nix jargon).

1. Introduction

To easily connect to xeon07 in one step, set up the SSH (for version
7.3 and upwards) configuration file in ~/.ssh/config adding these
lines:

    Host cobi
        HostName ssflogin.bsc.es
        User your-username-here

    Host xeon07
        ProxyJump cobi
        HostName xeon07
        User your-username-here

You should be able to connect with:

    laptop$ ssh xeon07

1.1 Network access

In order to use nix you need to be able to download the sources from
the Internet. Usually the download requires the ports 22, 80 and 443 to
be open for outgoing traffic.

Check that you have network access in xeon07 provided by the
environment variables "http_proxy" and "https_proxy". Try to fetch a
webpage with curl, to ensure the proxy is working:

    xeon07$ curl x.com
    x

1.2 SSH keys

Package sources are usually downloaded directly from the git server, so
you must be able to access all repositories without a password prompt.

Most repositories at https://pm.bsc.es/gitlab are open to read for
logged in users, but there are some exceptions (for example the nanos6
repository) where you must have been explicitly granted read access.

If you don't have an SSH key at ~/.ssh/*.pub in xeon07, create a new
one without password protection by running:

    xeon07$ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (~/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in ~/.ssh/id_rsa.
    Your public key has been saved in ~/.ssh/id_rsa.pub.
    ...

By default it will create the private key at ~/.ssh/id_rsa. Copy the
contents of your public SSH key in ~/.ssh/id_rsa.pub and paste it in
GitLab at:

    https://pm.bsc.es/gitlab/profile/keys

Then, configure it for use in the ~/.ssh/config file, adding:

    Host bscpm03.bsc.es
        IdentityFile ~/.ssh/id_rsa

Next, verify that the SSH connection to the server works and that you
get a greeting from the GitLab server with your username:

    xeon07$ ssh git@bscpm03.bsc.es
    PTY allocation request failed on channel 0
    Welcome to GitLab, @rarias!
    Connection to bscpm03.bsc.es closed.

Verify that you can access the nanos6/nanos6 repository (otherwise you
first need to ask to be granted read access), at:

    https://pm.bsc.es/gitlab/nanos6/nanos6

Finally, you should be able to download the nanos6/nanos6 git
repository without any password interaction by running:

    xeon07$ git clone git@bscpm03.bsc.es:nanos6/nanos6.git

You will also need to access MareNostrum 4 from the xeon07 node, in
order to submit experiments. Add the following lines as well to the
~/.ssh/config file and set your user name:

    Host mn0 mn1 mn2
        User your-mn4-username
        IdentityFile ~/.ssh/id_rsa

Then copy the key to MareNostrum 4 (it will ask you the first time for
your password):

    xeon07$ ssh-copy-id -i ~/.ssh/id_rsa.pub mn1

And ensure that you can connect without a password:

    xeon07$ ssh mn1
    ...
    login1$

1.3 The bscpkgs repo

Once you have Internet access and you have been granted access to the
PM GitLab repositories you can begin building software with nix. First
ensure that the nix binaries are available from your shell in xeon07:

    xeon07$ nix --version
    nix (Nix) 2.3.6

Now you are ready to build and install packages with nix. Clone the
bscpkgs repository:

    xeon07$ git clone git@bscpm03.bsc.es:rarias/bscpkgs.git

Nix looks in the current folder for a file named "default.nix" for
packages, so go to the repo directory:

    xeon07$ cd bscpkgs

Now you should be able to build nanos6:

    xeon07$ nix-build -A bsc.nanos6
    ..
    /nix/store/3i0qkdywm9xjv2cm1ldx9smb552sf6r1-nanos6-2.4-6f10a32

The installation is placed in the nix store (with the path stated in
the last line of the build process), with the "result" symbolic link
pointing to the same location:

    xeon07$ readlink result
    /nix/store/3i0qkdywm9xjv2cm1ldx9smb552sf6r1-nanos6-2.4-6f10a32

1.4 Configuration of mn4 (MareNostrum 4)

In order to execute the programs built at xeon07, you first need to
enter the nix environment. To do so, add the following line to the end
of the file ~/.bashrc in mn4:

    export PATH=/gpfs/projects/bsc15/nix/bin:$PATH

Then log out and log in again (or source the ~/.bashrc file) and you
will now have the `nix-setup` command available. This command executes
a new shell where the /nix store is available. To execute it:

    mn4$ nix-setup

Now you will see a new shell, where you can access the nix store:

    nix|mn4$ ls /nix
    gcroots  profiles  store  var

The last build of nanos6 can also be found in mn4 at the same location:

    /nix/store/3i0qkdywm9xjv2cm1ldx9smb552sf6r1-nanos6-2.4-6f10a32

Remember to enter the nix environment by running `nix-setup` when you
need something from the nix store.

You cannot perform any build operations from mn4: to do so use the
xeon07 machine.

2. Basic usage of nix

Nix is a package manager which easily handles reproducibility and
configuration of packages and dependencies. See more info here:

    https://nixos.org/nix/manual/

We will only cover the basic usage of nix for the BSC packages.

2.1 The user environment

All nix packages are stored under the /nix directory. When you need to
"install" some binary from nix, a symlink is added to a folder included
in the $PATH variable.
In particular, you should have something similar added to your $PATH:

    xeon07$ echo $PATH | sed 's/:/\n/g' | grep nix
    /home/Computational/rarias/.nix-profile/bin
    /nix/var/nix/profiles/default/bin

The first one is your custom installation of packages, stored in your
home directory, and the second one is the default installation which
contains the nix tools (which are installed in the /nix directory as
well).

Use `nix search` to look for official packages in the "nixpkgs" channel
(the default repository of packages):

    xeon07$ nix search cowsay
    warning: using cached results; pass '-u' to update the cache
    * cowsay (cowsay)
      A program which generates ASCII pictures of a cow with a message

    * neo-cowsay (neo-cowsay)
      Cowsay reborn, written in Go

    * ponysay (ponysay-3.0.3)
      Cowsay reimplemention for ponies

    * tewisay (tewisay-unstable-2017-04-14)
      Cowsay replacement with unicode and partial ansi escape support

When you need a program that is not available in your environment, much
like when you use "module load ...", you can use nix-env to modify what
is currently loaded. For example:

    xeon07$ nix-env -iA nixpkgs.cowsay

Notice that you should specify the "nixpkgs." prefix before the package
name. The command will download (if not found already in the nix
store), compile (if necessary) and load the program `cowsay` from the
nixpkgs repository in the environment.
You should be able to run it as:

    xeon07$ cowsay "hello world"
     _____________
    < hello world >
     -------------
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

You can now inspect the ~/.nix-profile/bin folder, and see that a new
symlink was added to the actual installation of the binary:

    xeon07$ file ~/.nix-profile/bin/cowsay
    /home/Computational/rarias/.nix-profile/bin/cowsay: symbolic link to `/nix/store/673gczmhr5b449521srz2n7g1klykz6n-cowsay-3.03+dfsg2/bin/cowsay'

You can list the current packages installed in your environment by
running:

    xeon07$ nix-env -q
    cowsay-3.03+dfsg2
    nix-2.3.6

Notice that this setup only affects your user environment. The change
is permanent for any new session until you modify the environment
again, and it is immediate: all sessions see the new environment
instantaneously.

You can remove any package from the environment using:

    xeon07$ nix-env -e cowsay

See the manual with `nix-env --help` if you want to know more details.

2.2 Building packages

Usually, all official packages are already compiled and distributed
from a cache server so you don't need to rebuild them again. However,
BSC packages are distributed only in source code form as we don't have
any binary cache server yet.

Nix will handle the build process without any user interaction (with a
few exceptions which you shouldn't have to worry about). If any other
user has already built the package then the build process is not
needed, and the package is used as is.

In order to build a BSC package go to the `bscpkgs` directory, and run:

    xeon07$ nix-build -A bsc.dummy

Notice the "bsc." prefix for BSC packages.
The package will be built and installed in the /nix directory, then a
"result" symlink pointing to it is placed in the current directory:

    xeon07$ find result/
    result/
    result/bin
    result/bin/dummy

The way in which nix handles the packages and dependencies ensures that
the environment of the build process of any package is exactly the
same, so the generated output should be the same if the builds are
deterministic. You can check the reproducibility of the build by adding
the "--check" flag, which will rebuild the package and compare the
checksum of every file with the ones previously built:

    xeon07$ nix-build -A bsc.dummy --check
    ...
    xeon07$ echo $?
    0

A return code of zero indicates that the output is bit by bit identical
to the one installed. Some packages include non-deterministic
information in the build process (such as the current timestamp) and
will produce an error. Those packages must be patched to ensure the
output is deterministic.

Notice that if you "cd" into the "result/" directory you will be in the
/nix directory (as you have followed the symlink) where you don't have
write permission. Therefore if your program attempts to write to the
current directory it will fail. It is recommended to instead run your
program from the top directory:

    xeon07$ result/bin/dummy
    Hello world!

Or you can install it in the environment:

    xeon07$ nix-env -i ./result

And "cd" into any directory where you want to output some files and
just run it by name:

    xeon07$ cd /tmp
    xeon07$ dummy
    Hello world!

Finally, you can remove it from the environment if you don't need it:

    xeon07$ nix-env -e dummy

If you want to know more details use "nix-build --help" to see the
manual.

2.3 The build process

Each package is built following a programmable configuration
description in the nix language. Builds in nix are performed under very
strict conditions. No access to any file in the file system is allowed,
unless stated in the dependencies, which are in the /nix store only.
There is no network access in the build process and other restrictions
are enforced so that the build environment is reproducible. See more
details here:

    https://nixos.wiki/wiki/Nix#Sandboxing

The top level "default.nix" file of the bscpkgs serves as an index of
all BSC packages. You can see the definition for each package, for
example the nbody app:

    nbody = callPackage ./bsc/apps/nbody/default.nix {
      stdenv = pkgs.gcc9Stdenv;
      mpi = intel-mpi;
      icc = icc;
      tampi = tampi;
      nanos6 = nanos6-git;
    };

The compilation details are specified in the
"bsc/apps/nbody/default.nix" file. You can configure the package by
changing the inputs, for example, what specific implementation of
nanos6 or MPI you want to use. To change the MPI implementation to the
official MPICH package use:

    nbody = callPackage ./bsc/apps/nbody/default.nix {
      stdenv = pkgs.gcc9Stdenv;
      mpi = pkgs.mpich; # Notice pkgs prefix for official packages
      icc = icc;
      tampi = tampi;
      nanos6 = nanos6-git;
    };

Then you can rebuild the nbody package:

    xeon07$ nix-build -A bsc.nbody
    ...

And verify that the binary is indeed linked to MPICH now:

    xeon07$ ldd result/bin/nbody_mpi.N2.2048.exe | grep mpi
    libmpi.so.12 => /nix/store/dwkkcv78a5bs8smflpx9ppp3klhz3i98-mpich-3.3.2/lib/libmpi.so.12 (0x00007f6be0f07000)

If you modify a package which another package requires as a dependency,
nix will rebuild all required packages to propagate your changes on
demand. However, if you go back to the original configuration, the
package will still be in the /nix store (unless the garbage collector
was manually run and removed your old build), so you don't need to
rebuild it again.
For example if nbody is configured back to use Intel MPI:

    nbody = callPackage ./bsc/apps/nbody/default.nix {
      stdenv = pkgs.gcc9Stdenv;
      mpi = intel-mpi;
      icc = icc;
      tampi = tampi;
      nanos6 = nanos6-git;
    };

The build process now is not required:

    xeon07$ nix-build -A bsc.nbody
    /nix/store/rbq7wrjcmg6fzd6yhrlnkfvzcavdbdpc-nbody
    xeon07$ ldd result/bin/nbody_mpi.N2.2048.exe | grep mpi
    libmpifort.so.12 => /nix/store/jvsjvxj2a08340fpdrqbqix9z3mpp3bd-intel-mpi-2019.7.217/lib/libmpifort.so.12 (0x00007f3a00402000)
    libmpi.so.12 => /nix/store/jvsjvxj2a08340fpdrqbqix9z3mpp3bd-intel-mpi-2019.7.217/lib/libmpi.so.12 (0x00007f39fed34000)

Take a look at the different package description files in the bscpkgs
repository if you want to understand more details. Also the nix pills
are a very good reference:

    https://nixos.org/nixos/nix-pills/

2.4 Debugging the build process

It may happen that the build process fails in an unexpected way. Most
problems are related to missing dependencies and can be easily found by
looking at the error messages. Other build problems are more subtle and
require more debugging time.

One way of inspecting a build problem is by adding the breakpointHook
hook to the nativeBuildInputs array in a nix derivation (see
https://nixos.org/nixpkgs/manual/#ssec-setup-hooks for more info),
which will stop the build process and allow a shell to be attached to
the sandbox.

    xeon07$ nix-build -A bsc.nbody
    ...
    /nix/store/gvqm2yc9xx4vh3nglgckz8siya66jnkx-stdenv-linux/setup: line 83: fake-missing-command: command not found
    build failed in buildPhase with exit code 127
    To attach install cntr and run the following command as root:

       cntr attach -t command \
          cntr-/nix/store/sk2nsj7xfr62cjk6m3725ydfyswqz7n1-nbody

The command must run as the root user, so you can use `sudo -i` to run
it (the -i option is required to load the shell profile which provides
the nix path containing the cntr tool):

    xeon07$ sudo -i cntr attach -t command \
        cntr-/nix/store/sk2nsj7xfr62cjk6m3725ydfyswqz7n1-nbody
    nixbld@localhost:/var/lib/cntr> ls
    bin  build  dev  etc  nix  proc  tmp  var

Then you can inspect the build environment to see why the build failed.
Source the build/env-vars file to get the same environment variables
(which include the $PATH) of the build process.
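
Sourcing the env-vars file replaces variables such as $PATH in the
current shell, so it can be convenient to do it in a subshell. The
following is only a sketch under assumptions: it runs against a mock
directory instead of a real sandbox, the contents of the mock env-vars
file and its store path are hypothetical, and the real file is assumed
to hold one bash `declare -x` line per exported variable.

```shell
#!/usr/bin/env bash
# Sketch only: the env-vars contents below are a hypothetical stand-in
# for the real build/env-vars file found inside the attached sandbox.
workdir=$(mktemp -d)
mkdir "$workdir/build"
cat > "$workdir/build/env-vars" <<'EOF'
declare -x name="nbody"
declare -x PATH="/hypothetical/store/path/bin"
EOF

# Source the file inside a command substitution (a subshell), so the
# saved PATH does not replace the PATH of the calling shell.
build_path=$(cd "$workdir" && . build/env-vars && printf '%s' "$PATH")

echo "PATH used by the build: $build_path"
rm -rf "$workdir"
```

Inside the real sandbox the same idea applies from the directory that
contains "build": start a subshell, source build/env-vars there, and
run the failing command by hand to reproduce the error.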