Move big blobs outside repository #171

Closed
opened 2025-09-17 10:49:28 +02:00 by rarias · 3 comments
Owner

Some PDFs account for most of the repo size. I would like to move them elsewhere so they don’t need to be copied around.

Maybe Git LFS is a good candidate: https://git-lfs.com/

Author
Owner

Git LFS seems to work: https://jungle.bsc.es/git/rarias/jungle/compare/master...gitea-lfs

However, I'm not very happy with the additional complexity and extra steps needed to handle blobs.

A simpler alternative is to share a directory on the web server where we can copy PDFs and other large files, and then link to them from the website.

rarias added a new dependency 2025-10-01 16:04:45 +02:00
Author
Owner

There are two issues that I would like to solve:

  • Using Nix flakes causes the current state of the Git directory to be copied to the store. That copy does not include old files that are no longer in the index. Its size can be measured with:

    apex% du -sh $(nix flake metadata | awk '/Path/{print $2}')
    30M	/nix/store/r5aihfkzij1vjspm2xmpvph4q39n38hz-source

  • Lower the transfer size of the repository to speed up the evaluation of flakes that use jungle as an input, since these need to fetch the repo:

    apex% time git clone https://jungle.bsc.es/git/rarias/jungle.git /tmp/1
    Cloning into '/tmp/1'...
    remote: Enumerating objects: 5175, done.
    remote: Counting objects: 100% (5175/5175), done.
    remote: Compressing objects: 100% (2456/2456), done.
    remote: Total 5175 (delta 3404), reused 4193 (delta 2559), pack-reused 0 (from 0)
    Receiving objects: 100% (5175/5175), 26.39 MiB | 33.16 MiB/s, done.
    Resolving deltas: 100% (3404/3404), done.
    git clone https://jungle.bsc.es/git/rarias/jungle.git /tmp/1  1,10s user 0,37s system 105% cpu 1,386 total

    apex% du -sh /tmp/1
    56M	/tmp/1
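
To see which files dominate the transfer size, including old files that are no longer in the index, the blobs stored in the history can be ranked by size. The following is only a sketch using standard Git plumbing; the function names `rank_blobs` and `largest_blobs` are made up for illustration and are not part of the repository.

```shell
# Rank blobs by size. Reads lines of "<type> <sha> <size> <path>" on stdin
# and prints "<size> <path>" for the N largest blobs (default 10).
rank_blobs() {
    awk '$1 == "blob" { sub(/^[^ ]+ [^ ]+ /, ""); print }' |
        sort -rn |
        head -n "${1:-10}"
}

# List the largest blobs reachable from any ref, including files that were
# deleted from the current tree but are still stored in the history.
# Sizes are uncompressed object sizes in bytes.
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        rank_blobs "$@"
}
```

Running `largest_blobs 5` inside a clone would print the five largest blobs with their paths, which should show whether the PDFs are responsible for most of the size.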
Author
Owner

Rewrote history using https://jungle.bsc.es/git/rarias/jungle/src/commit/f3bfe89f275384a5def14c9fd229129416218ba2/doc/trim.sh and the resulting repository is now in the master branch.

A backup of the previous repository is at: https://jungle.bsc.es/git/rarias/jungle-backup

The old master branch is also kept for now at: https://jungle.bsc.es/git/rarias/jungle/src/branch/old-master
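
One possible way to check the effect of the rewrite is to compare the packed size of fresh clones of the rewritten repository and of the backup. This is a rough sketch, not what trim.sh does; the helper name `pack_kib` and the clone directories in the usage note are assumptions.

```shell
# Extract the packed size (in KiB) from the output of `git count-objects -v`,
# read on stdin.
pack_kib() {
    awk -F': ' '$1 == "size-pack" { print $2 }'
}
```

For example, after cloning both repositories into the hypothetical directories /tmp/new and /tmp/old, `git -C /tmp/new count-objects -v | pack_kib` and `git -C /tmp/old count-objects -v | pack_kib` can be compared directly.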
Reference: rarias/jungle#171