Cached and shared filesystem #73

Open
opened 2024-07-23 12:48:19 +02:00 by rarias · 1 comment
Owner

The current shared filesystem in jungle is Ceph, which provides good redundancy (3 copies) but doesn't do any caching. That is reasonable when data must be visible to a `read()` on any other node as soon as a `write()` returns.

However, this is not a common use case. Usually we write to files from one node, then close them and expect the changes to be visible when another node later opens them.

Ideally we should be able to set up a cached filesystem with a coherence model similar to NFS (close-to-open) to speed up writes and reads.

One simple test we can do to see how well a filesystem performs is to switch between branches or commits of LLVM, which causes a large number of files to be written and changed.
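
The "write, close, then open elsewhere" pattern described above can be checked with a small script. This is only a sketch: `check_close_to_open` is a hypothetical helper name, and it assumes that the two directories passed to it are two views of the same shared directory (two mounts of the same filesystem on one node, for example). It says nothing about performance, only whether a fresh open sees the data written before the close.

```sh
#!/usr/bin/env bash
# Sketch: write and close a file through one path, then open it through
# another path that is assumed to refer to the same directory.

# check_close_to_open WRITER_DIR READER_DIR
# Returns 0 if the reader sees exactly what the writer wrote.
check_close_to_open() {
  local writer_dir=$1 reader_dir=$2
  local name="c2o-test.$$"
  local token="token-$RANDOM-$(date +%s)"

  # The redirection opens the file, writes the token and closes it.
  printf '%s\n' "$token" > "$writer_dir/$name"

  # A fresh open through the reader path should return the same token.
  local seen
  seen=$(cat "$reader_dir/$name")

  # Clean up the test file before reporting the result.
  rm -f "$writer_dir/$name"
  [ "$seen" = "$token" ]
}
```

For example, `check_close_to_open /ceph/home/rarias /ceph-slow/home/rarias && echo coherent` would write through one Ceph mount and read through the other, assuming both mounts expose the same directory. A test across two nodes would need the read step to run on the second node instead.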

rarias added the io label 2024-07-23 12:48:19 +02:00
Author
Owner

With the following mount points:

```
hut% mount | grep -E '(/tmp|/ceph|/nvme|/home)'
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,size=65946828k)
10.0.40.30:/home on /home type nfs (rw,relatime,vers=3,rsize=1024,wsize=1024,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.40.30,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=10.0.40.30)
/dev/nvme0n1 on /nvme type ext4 (rw,relatime)
user@9c8d06e0-485f-4aaf-b16b-06d6daf1232b.cephfs=/ on /ceph-slow type ceph (rw,relatime,name=user,secret=<hidden>,acl,mon_addr=10.0.40.40)
user@9c8d06e0-485f-4aaf-b16b-06d6daf1232b.cephfs=/ on /ceph type ceph (rw,relatime,name=user,secret=<hidden>,fsc,acl,mon_addr=10.0.40.40)
```

And this benchmark script:

```sh
#!/usr/bin/env bash

set -ex

dirlist=(
  /tmp/llvm-mono # Tmpfs
  /nvme/scratch/llvm-mono/ # Fast NVME
  /ceph/home/rarias/llvm-mono # Cached ceph (mounted with fsc)
  /ceph-slow/home/rarias/llvm-mono # Uncached ceph
  /home/Computational/rarias/llvm-mono # NFS
)

for dir in "${dirlist[@]}"; do
  echo "$dir"
  cd "$dir"

  for run in $(seq 3); do
    # Start every run from the same tag
    git checkout llvmorg-15.0.0

    # Empty cache: flush dirty pages, then drop page cache, dentries and inodes
    sudo sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches

    # Timed step: switch to the other release tag
    time git checkout llvmorg-14.0.0
  done
done
```

I get these times:

| FS | Time run 1 | Time run 2 | Time run 3 |
| :----: | :----: | :----: | :----: |
| /tmp | 0m12,790s | 0m12,972s | 0m12,278s |
| /nvme | 0m14,745s | 0m15,128s | 0m14,775s |
| /ceph | 1m56,763s | 1m53,844s | 1m55,603s |
| /ceph-slow | 1m59,403s | 2m1,979s | 1m59,536s |
| /home | 2m39,460s | 2m40,810s | 2m32,403s |
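
To compare the filesystems it may help to average the three runs. The following is a minimal sketch, assuming the `XmY,ZZZs` format printed by `time` in the table above. `to_seconds` and `mean_seconds` are hypothetical helper names and are not part of the benchmark script.

```sh
#!/usr/bin/env bash
# Sketch: convert the output format of `time` to seconds and average runs.

# to_seconds 1m56,763s  prints the duration in seconds with 3 decimals.
to_seconds() {
  # Accept a decimal comma (as in the table) or a decimal point.
  # LC_ALL=C keeps awk parsing and printing with a decimal point.
  printf '%s\n' "$1" | tr ',' '.' |
    LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# mean_seconds T1 T2 ...  prints the arithmetic mean of the durations.
mean_seconds() {
  local t
  for t in "$@"; do
    to_seconds "$t"
  done | LC_ALL=C awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}
```

For instance, `mean_seconds 1m56,763s 1m53,844s 1m55,603s` averages the /ceph row. Applied to the table, the means are roughly 12.7 s for /tmp, 14.9 s for /nvme, 115.4 s for /ceph, 120.3 s for /ceph-slow and 157.6 s for /home. In these runs the `fsc` mount is only about 4% faster than the uncached one, and about 9 times slower than tmpfs.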
Reference: rarias/jungle#73