Large Git repository size #105

Open
jamesmbaazam opened this issue Jun 14, 2024 · 12 comments
@jamesmbaazam
Contributor

I just tried to clone this repository and noticed it is extremely large (1.78 GiB). Could you consider reducing the size?

Enumerating objects: 36546, done.
Counting objects: 100% (36546/36546), done.
Delta compression using up to 8 threads
Compressing objects: 100% (14063/14063), done.
Writing objects: 100% (36546/36546), done.
Total 36546 (delta 6517), reused 36546 (delta 6517), pack-reused 0
count: 0
size: 0 bytes
in-pack: 36546
packs: 1
+size-pack: 1.78 GiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
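The second half of the output above has the shape of `git count-objects -vH`, so the packed size can be checked directly after a clone. A minimal sketch, assuming it is run from the repository root; `extract_pack_size` and `repo_pack_size` are hypothetical helper names, not Git commands:

```shell
# Read `git count-objects -vH` output on stdin and print the size-pack value.
# extract_pack_size is a hypothetical helper name used only for illustration.
extract_pack_size() {
  awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Report the packed size of the repository in the current directory.
# -v prints the detailed breakdown, -H uses human-readable units.
repo_pack_size() {
  git count-objects -vH | extract_pack_size
}
```

Calling `repo_pack_size` in a fresh clone should print a single value such as the 1.78 GiB reported here.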
@athowes
Collaborator

athowes commented Jun 14, 2024

Thanks @jamesmbaazam! I'll try to have a look now to see which are the large objects in the package. Let me know if you already know this.

@sbfnk

sbfnk commented Jun 14, 2024

This thread may be useful: epiforecasts/EpiNow2#538

@athowes
Collaborator

athowes commented Jun 14, 2024

Right, yes. I was getting confused because I was just looking for large files and wasn't sure what it would be (most of the largest ones below are in my local version and aren't committed):

(base) adamhowes@Adams-MBP-3 epidist % find . -type f -exec du -h {} + | sort -rh | head -n 30
1.8G	./.git/objects/pack/pack-c0eadcec05734dbd7fbeea35b55b28e4d80163cf.pack
106M	./vignettes/approx-inference_cache/html/unnamed-chunk-3_2985e7a1c33b14dbee6c6ec2e582dd29.rdb
 39M	./vignettes/epidist_cache/html/unnamed-chunk-10_5bc037bca68bb093cd52c5af19833258.rdb
 39M	./vignettes/approx-inference_cache/html/unnamed-chunk-2_d897e4863777c9f2f93bedcef6242c9f.rdb
 26M	./.Rproj.user/9EE3E129/ctx/ctx-11541/environment
4.1M	./.Rproj.user/9EE3E129/ctx/ctx-27423/environment
3.2M	./.Rproj.user/9EE3E129/ctx/ctx-27423/options
2.9M	./.git/objects/pack/pack-b817aae2005187439bb6ca20d19112f1e65bb811.pack
2.2M	./.git/objects/pack/pack-4612fe591aa5b4272815be1d44d4de620084dd3c.pack
1.8M	./.Rproj.user/9EE3E129/ctx/ctx-11541/options
1.3M	./doc/epidist.html
1.1M	./inst/gadm41_SLE_shp/gadm41_SLE_3.shp
964K	./.git/objects/pack/pack-c0eadcec05734dbd7fbeea35b55b28e4d80163cf.idx
780K	./inst/gadm41_SLE_shp/gadm41_SLE_2.shp
708K	./inst/gadm41_SLE_shp/gadm41_SLE_1.shp
684K	./inst/gadm41_SLE_shp/gadm41_SLE_0.shp
612K	./inst/README.html
496K	./data-raw/pnas.1518587113.sd02.xlsx
464K	./.git/objects/0b/d00b70191ce3449fb6e1cc9ace088f1dfc50c3
400K	./vignettes/approx-inference_cache/html/unnamed-chunk-5_f19631317140b7c91d2fad3fbc266f2b.rdb
396K	./vignettes/approx-inference_cache/html/unnamed-chunk-4_5ef2e049d3d4c08bf99ec1df29b55daa.rdb
384K	./docs/articles/epidist_files/bootstrap-3.3.5/css/bootstrap.css.map
328K	./docs/deps/bootstrap-5.3.1/bootstrap.bundle.min.js.map
316K	./.git/objects/53/6f70af7a33634640a582f839e43b6309e02d40
288K	./docs/deps/bootstrap-5.3.1/bootstrap.min.css
284K	./docs/deps/jquery-3.6.0/jquery-3.6.0.js
284K	./docs/articles/epidist_files/jquery-3.6.0/jquery-3.6.0.js
272K	./.Rproj.user/shared/notebooks/BFE9F241-epidist/1/s/cda40rp2ydjfw/000002.snapshot
260K	./vignettes/epidist_cache/html/unnamed-chunk-5_8603d489aac9e0331f40afed22f03136.rdb
208K	./vignettes/epidist_cache/html/unnamed-chunk-3_93116863c5d8a54d61bf388abb31230e.rdb

So the point is that the Git repository also stores the full history of every file, and that history is what has ended up being large. Thanks for the link @sbfnk, I will read it and try to implement it.
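One way to see which files in the history account for the size, as a rough sketch rather than what was actually run in this thread, is to list every object reachable from any ref and rank the blobs by size. The helper names `rank_blobs` and `largest_blobs` are hypothetical; the Git commands and format fields are standard:

```shell
# Read "type size path" lines on stdin, keep only blobs, and print the
# N largest as "size path" (default N = 20). Paths may contain spaces.
rank_blobs() {
  awk '$1 == "blob" { sub(/^blob /, ""); print }' |
    sort -k1,1nr |
    head -n "${1:-20}"
}

# List the largest blobs anywhere in the history of the current repository.
# rev-list prints "sha path"; cat-file keeps the path via %(rest).
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    rank_blobs "${1:-20}"
}
```

Running `largest_blobs 30` from the repository root should show uncompressed blob sizes in bytes next to the path each blob was stored under, including files that have since been deleted from the working tree.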

Edit: after a skim, it seems that doing this is relatively intricate. As not many people are using the package currently, I wonder if this would be best timed to coincide with a 0.1.0 release. Otherwise, I think it would need to be done again at that point anyway (likely development will continue to add file histories). Let me know if people disagree and think it's a priority to do sooner.

@jamesmbaazam
Contributor Author

> This thread may be useful epiforecasts/EpiNow2#538

Thanks, Seb. I was coming here to post this.

@seabbs
Contributor

seabbs commented Jun 14, 2024

> (likely development will continue to add file histories).

It shouldn't add large files. We do want to close out hanging PRs before we do this (and in general).

Thanks for the input, all. This is a legacy of pulling the package out of the analysis repo, I think.

@athowes athowes changed the title from "Repository size" to "Large Git repository size" Jun 24, 2024
@athowes athowes added the high Required for next release label Aug 8, 2024
@athowes athowes mentioned this issue Aug 8, 2024
11 tasks
@seabbs seabbs self-assigned this Oct 4, 2024
@athowes
Collaborator

athowes commented Nov 13, 2024

Requires no PRs.

@athowes athowes added public release and removed high Required for next release labels Nov 20, 2024
@seabbs
Contributor

seabbs commented Jan 20, 2025

Work on this has started, and main is sorted out, but I still see a large repo size on a fresh clone. I need to do another check on the branches, in particular gh-pages, which may not have been properly configured.
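A possible way to check which branches or tags still carry the weight, sketched here under the assumption that it is run inside a clone, is to total the on-disk size of the objects reachable from each ref. `sum_bytes` and `ref_sizes` are hypothetical helper names:

```shell
# Sum a column of byte counts read from stdin (one number per line).
sum_bytes() {
  awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# For every branch, remote-tracking branch, and tag, print the on-disk bytes
# of all objects reachable from it, largest first. Objects shared between
# refs are counted once per ref, so the rows do not add up to the repo size.
ref_sizes() {
  git for-each-ref --format='%(refname)' refs/heads refs/remotes refs/tags |
    while read -r ref; do
      # cut drops the path column so cat-file sees only object names
      bytes="$(git rev-list --objects "$ref" |
        cut -d' ' -f1 |
        git cat-file --batch-check='%(objectsize:disk)' | sum_bytes)"
      printf '%s\t%s\n' "$bytes" "$ref"
    done | sort -n -r
}
```

In a fresh clone, a ref such as the remote-tracking gh-pages branch showing a much larger total than main would suggest that its history still needs cleaning.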

@seabbs
Contributor

seabbs commented Jan 20, 2025

More discussion of this here specifically: epiforecasts/EpiNow2#538 (comment)

@seabbs
Contributor

seabbs commented Jan 21, 2025

Revisiting this and catching the tags etc., I now have this at 186.02 MiB, which is much smaller but still bigger than it really should be. I've hit most of the obvious things (large files and file types that are known to be bad), so I will start going through the histories quickly to spot other issues. I think the target should probably be <20 MiB, but if the current size works for people I'm happy to stop here.

@seabbs
Contributor

seabbs commented Jan 21, 2025

Down to 50 now after catching some rogue HTML.

@seabbs
Contributor

seabbs commented Jan 21, 2025

Now down to 14 MB. I think this is as far as I can go without compromising the package history. Note that this is for the full repo with tags and website, not the size of the actual built package (which should be smaller).

@seabbs
Contributor

seabbs commented Jan 21, 2025

I'm happy to close this as complete if confirmed by interested parties that this is resolved for them (a sensible test is to clone the repo and note how long it takes and the file size it says it has installed).
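The clone test described above could be scripted roughly as follows. This is only a sketch: `measure_clone` is a hypothetical helper, and the URL passed to it is whatever remote the reader normally clones from (it is not given in this thread):

```shell
# Clone a repository into a throwaway directory, then report the elapsed
# time and the size of the resulting .git directory.
# The URL is supplied by the caller; nothing here is specific to one repo.
measure_clone() {
  url="$1"
  dest="$(mktemp -d)"
  start="$(date +%s)"
  git clone --quiet "$url" "$dest/repo" || return 1
  end="$(date +%s)"
  echo "clone time: $((end - start)) s"
  # -s summarises the whole directory, -h prints human-readable units
  echo "size of .git: $(du -sh "$dest/repo/.git" | cut -f1)"
  rm -rf "$dest"
}
```

Comparing the reported size with the earlier 1.78 GiB figure gives a quick confirmation of whether the cleanup has reached a given clone.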


4 participants