Inflated Repository Size Due to Committed .sccache Files #1194

Open
jrhemstad opened this issue Dec 8, 2023 · 0 comments

Thanks to an investigation by @ajschmidt8, we've identified a significant increase in the size of our repository, caused primarily by .sccache files that were inadvertently committed to the repo's history. A fresh clone of the cccl repository is currently about 367MB, which is unusually large.

Investigation and Findings

  1. Clone and Size Assessment:
git clone git@github.com:NVIDIA/cccl.git
cd cccl
du -h --max-depth 0 .
  2. Identification of Large Files:
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 |
  cut -c 1-12,41- |
  $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
...
2f0daf262060  2.9MiB .sccache/9/e/9e37d362c1007202477b18f7855e55e71b05a5b46c55dd2c3bfbcc355994b163
a3d002f8cef0  3.3MiB .sccache/2/e/2ee8f6281e79e45202aae7b7cb89976ad27a69a7e49d905af549238f2dca0d33
9d720f55849d  3.3MiB .sccache/7/a/7a2c59e04a8f03e5675101cddcc59125d26b162b4100add1a6fe0c33a9a2fe96
27998b24b787  3.3MiB .sccache/0/2/02eabccdff6b33019c37f63bb59b1569ba8cf031e9fa1a8b247c4ab8a67b8903
cdf0c45e6c6a  3.3MiB .sccache/d/6/d662a4523ace3db5f5a08691f9b602b41aa268d5bdd4a6449f252d05acfd8432
288355260fea  3.4MiB .sccache/9/a/9ad06af3a8009bed920039eda66c3d837b280bfd1ae1727bb5627c3b1c8e7dee
663ca17550e3  3.8MiB .sccache/a/8/a86b008835722a40b1c00f11476e9c4481c69d049f11235b5c4eb26ac217d103
fd84d3bca1a1  5.1MiB libcudacxx/libcxxabi/test/test_demangle.pass.cpp
0f745de29822  6.9MiB .sccache/d/2/d263ca87a8a1b6deca13a2e0cdfd364634f59c1b01480ff69c92cb6ccddae1ad
08ed86aa1a9f  8.8MiB .sccache/d/8/d8797629aff35aa8b0d17d5b55673967eb7bb35b420a1d32ba331fed69a296a0
  • Additionally, we isolated the size contribution of the .sccache files:
    git rev-list --objects --all -- .sccache | git cat-file --batch-check="%(objectsize) %(rest)" | cut -d" " -f1 | paste -s -d + - | bc | numfmt --to=si
    
  • Finding: .sccache files contribute approximately 224MB to the repo size.
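
The command above sums %(objectsize), which is each object's uncompressed size, and it also counts the commits and trees that touch the path, so it may overstate what .sccache adds to a clone. A minimal sketch of the same measurement restricted to blobs and using the on-disk size via %(objectsize:disk) is shown here. The helper name path_disk_bytes is made up for illustration, and because of delta compression the result is only an approximation.

```shell
# Hypothetical helper: sum the on-disk (compressed) size, in bytes,
# of every blob that has ever lived under the given path.
path_disk_bytes() {
  local path="$1"   # e.g. .sccache
  git rev-list --objects --all -- "$path" |
    git cat-file --batch-check='%(objecttype) %(objectsize:disk) %(rest)' |
    awk '$1 == "blob" { total += $2 } END { print total + 0 }'
}
```

For a human-readable figure, the output can be piped through numfmt as in the commands above, for example: path_disk_bytes .sccache | numfmt --to=iec-i --suffix=B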

Options for Resolution

  1. Do Nothing:

    • Pros: Simplest option, no action required.
    • Cons: The large clone size remains, along with potentially slower Git operations.
  2. Rewrite History to Remove .sccache Files:

    • Pros: Significantly reduces repo size; faster clones and other Git operations.
    • Cons: Complex process; requires rewriting history, which can disrupt all current branches and open pull requests.
    • Implications:
      • All contributors will need to re-clone the repository or carefully rebase their work.
      • Potential loss of historical context for some changes.
      • Requires meticulous coordination and communication with all contributors.
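
For contributors who would rather rebase than re-clone, the rebase step could look roughly like the sketch below, which relies on git rebase --onto. The helper name and its three arguments are illustrative only, and it assumes the contributor still knows the pre-rewrite commit their branch forked from.

```shell
# Hypothetical helper: replay a topic branch onto the rewritten mainline.
#   $1 = old (pre-rewrite) commit the branch forked from
#   $2 = rewritten ref to land on, e.g. origin/main
#   $3 = local topic branch to move
replay_on_rewritten() {
  local old_base="$1" new_base="$2" branch="$3"
  # Only the commits in old_base..branch are replayed, so none of the
  # pre-rewrite history is carried over to the new base.
  git rebase --onto "$new_base" "$old_base" "$branch"
}
```

A typical call after fetching the rewritten history might be: replay_on_rewritten <old-fork-point> origin/main my-feature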

Suggested Course of Action

Given the substantial size contribution of the .sccache files, it seems prudent to consider rewriting the repository's history to remove these files, despite the complexities involved. This action would provide long-term benefits in terms of repository manageability and performance.
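
One possible way to carry out such a rewrite is sketched below using Git's built-in filter-branch, chosen here only so the example runs without extra tooling. This is an assumption about tooling, not a decision recorded in this issue; the Git documentation itself recommends git filter-repo for this kind of job. The function name purge_path_from_history is hypothetical, and the sketch should only be tried in a scratch clone.

```shell
# Hypothetical helper: strip a path from every commit on every ref.
# This rewrites all affected commit hashes; use a throwaway clone.
purge_path_from_history() {
  local path="$1"   # e.g. .sccache
  # The environment variable silences filter-branch's deprecation
  # notice and the startup delay that comes with it.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
    --index-filter "git rm -r --cached --quiet --ignore-unmatch -- '$path'" \
    --prune-empty --tag-name-filter cat -- --all
  # filter-branch keeps backups under refs/original/; delete them and
  # expire the reflog so nothing still points at the old objects.
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do git update-ref -d "$ref"; done
  git reflog expire --expire=now --all
  # Repack and drop the now-unreachable blobs.
  git gc --prune=now --quiet
}
```

Running purge_path_from_history .sccache only changes the local clone. Publishing the result would still require force-pushing every branch and tag, which is where the coordination cost listed under option 2 comes in.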
