[2.51.0 Bug] Missing singleton objects in 'git repack -adf --path-walk' #1956
@@ -838,4 +838,67 @@ test_expect_success '-n overrides repack.updateServerInfo=true' '
	test_server_info_missing
On the Git mailing list, Patrick Steinhardt wrote: On Wed, Aug 20, 2025 at 06:39:54PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> index 611755cc139b..1998d9bf291c 100755
> --- a/t/t7700-repack.sh
> +++ b/t/t7700-repack.sh
> @@ -838,4 +838,47 @@ test_expect_success '-n overrides repack.updateServerInfo=true' '
Tiny nit: I would've probably squashed this patch into the second patch,
as we usually don't use the add-failing-test-and-then-fix-it-later
dance. On the other hand though it gives some nice context, so I
ultimately don't mind it all that much. So please feel free to ignore
this nit.
> test_server_info_missing
> '
>
> +test_expect_failure 'pending objects are repacked appropriately' '
> + git init pending &&
We probably also want `test_when_finished "rm -rf pending"` before
calling git-init(1).
> +
> + (
> + cd pending &&
> +
> + mkdir -p a/b &&
> + echo singleton >file &&
> + echo stuff >a/b/c &&
> + echo more >a/d &&
> + git add file a &&
> + git commit -m "single blobs" &&
> +
> + echo d >a/d &&
> + echo e >a/e &&
> + git add a &&
> + git commit -m "more blobs" &&
> +
> + # This use of a sparse index helps to force
> + # test that the cache-tree is walked, too.
> + git sparse-checkout set --sparse-index a x &&
> +
> + # Just _stage_ the changes.
> + echo f >a/d &&
> + echo h >a/e &&
> + echo i >a/i &&
> + mkdir x &&
> + echo y >x/y &&
> + git add a x &&
Nit: I think I would've moved the explanations you have in the commit
message into these hunks so that the test becomes a bit more
self-explanatory.
> + # Bring the loose objects into a packfile to avoid
> + # leftovers in next test. Without this, the loose
> + # objects persist and the test succeeds for other
> + # reasons.
> + git repack -adf &&
> + git fsck &&
> +
> + # Test path walk version with pack.useSparse.
> + git -c pack.useSparse=true repack -adf --path-walk &&
> + git fsck
> + )
> +'
Patrick
On the Git mailing list, Derrick Stolee wrote: On 8/21/2025 4:00 AM, Patrick Steinhardt wrote:
> On Wed, Aug 20, 2025 at 06:39:54PM +0000, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
>> index 611755cc139b..1998d9bf291c 100755
>> --- a/t/t7700-repack.sh
>> +++ b/t/t7700-repack.sh
>> @@ -838,4 +838,47 @@ test_expect_success '-n overrides repack.updateServerInfo=true' '
>
> Tiny nit: I would've probably squashed this patch into the second patch,
> as we usually don't use the add-failing-test-and-then-fix-it-later
> dance. On the other hand though it gives some nice context, so I
> ultimately don't mind it all that much. So please feel free to ignore
> this nit.
I'm probably the person who is always asking folks to create a test
that either fails or demonstrates the "before" behavior before making
the actual change that updates the case. This allows the ability to
"test the test" by checking it in place to confirm that it is indeed
failing.
Using test_expect_failure allows us to avoid breaking bisect.
>> test_server_info_missing
>> '
>>
>> +test_expect_failure 'pending objects are repacked appropriately' '
>> + git init pending &&
>
> We probably also want `test_when_finished "rm -rf pending"` before
> calling git-init(1).
Good idea.
>> +
>> + (
>> + cd pending &&
>> +
>> + mkdir -p a/b &&
>> + echo singleton >file &&
>> + echo stuff >a/b/c &&
>> + echo more >a/d &&
>> + git add file a &&
>> + git commit -m "single blobs" &&
>> +
>> + echo d >a/d &&
>> + echo e >a/e &&
>> + git add a &&
>> + git commit -m "more blobs" &&
>> +
>> + # This use of a sparse index helps to force
>> + # test that the cache-tree is walked, too.
>> + git sparse-checkout set --sparse-index a x &&
>> +
>> + # Just _stage_ the changes.
>> + echo f >a/d &&
>> + echo h >a/e &&
>> + echo i >a/i &&
>> + mkdir x &&
>> + echo y >x/y &&
>> + git add a x &&
>
> Nit: I think I would've moved the explanations you have in the commit
> message into these hunks so that the test becomes a bit more
> self-explanatory.
Hm. That seems to go against our typical pattern of leaving comments
sparse and having the longer explanation available in commit messages
but maybe I'm out of date or tests are a different beast. I'll see
what I can do to make the test more self-documented.
Thanks,
-Stolee
On the Git mailing list, Junio C Hamano wrote: Derrick Stolee <[email protected]> writes:
> On 8/21/2025 4:00 AM, Patrick Steinhardt wrote:
>> On Wed, Aug 20, 2025 at 06:39:54PM +0000, Derrick Stolee via GitGitGadget wrote:
>>> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
>>> index 611755cc139b..1998d9bf291c 100755
>>> --- a/t/t7700-repack.sh
>>> +++ b/t/t7700-repack.sh
>>> @@ -838,4 +838,47 @@ test_expect_success '-n overrides repack.updateServerInfo=true' '
>>
>> Tiny nit: I would've probably squashed this patch into the second patch,
>> as we usually don't use the add-failing-test-and-then-fix-it-later
>> dance. On the other hand though it gives some nice context, so I
>> ultimately don't mind it all that much. So please feel free to ignore
>> this nit.
>
> I'm probably the person who is always asking folks to create a test
> that either fails or demonstrates the "before" behavior before making
> the actual change that updates the case. This allows the ability to
> "test the test" by checking it in place to confirm that it is indeed
> failing.
> Using test_expect_failure allows us to avoid breaking bisect.
Yes, you can develop that way, but on the reviewing and receiving
end, the second patch that shows only the change from expect_failure
to expect_success pushes the more important "what behaviour was this
thing testing?" out of the hunk context, if you split them into two.
If we really wanted to verify the claim that without the fix this
was broken and we have a test to demonstrate a failure on the
receiving end, we can "checkout" paths touched by that commit
outside t/ from HEAD^ to build to see how the system behaved without
the code change just fine, so such a split does not buy us much.
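A minimal sketch of the check Junio describes, assuming it runs at the top of a git.git worktree whose HEAD is the single commit that adds both the fix and its test. The function name `revert_fix_keep_tests` is hypothetical, and the build and test commands in the comments are an assumed workflow, not a prescribed one.

```sh
#!/bin/sh
# Hypothetical helper: undo the code change made by the commit at HEAD
# while keeping its new tests. Every path outside t/ that exists in
# HEAD^ is restored from HEAD^ (index and worktree); t/ keeps the HEAD
# version. In the default overlay mode, paths added by the commit are
# left alone.
revert_fix_keep_tests () {
	git checkout HEAD^ -- . ':(exclude)t'
}

# Possible use (assumed workflow):
#   revert_fix_keep_tests
#   make && (cd t && sh ./t7700-repack.sh)  # the new test should fail
#   git checkout HEAD -- .                  # put the fix back
```

Restricting the checkout to paths outside `t/` is what lets the new test run against the pre-fix code without splitting the test into its own commit.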
Unless there is a strong reason not to, please always present such
test in the same patch as the code change to fix that breakage.
Thanks.
On the Git mailing list, Elijah Newren wrote: On Wed, Aug 20, 2025 at 11:39 AM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> From: Derrick Stolee <[email protected]>
>
> Users reported an issue where objects were missing from their local
> enlistments after a full repack using 'git repack -adf --path-walk'.
What is an enlistment?
> This was alarming, but took a while to create a reproducer.
but => and ?
> The root cause is that certain objects existed in the index and had no
> second versions. These objects are usually blobs, though trees can be
> included if a cache-tree exists. The issue is that the revision walk
> adds these objects to the "pending" list and the path-walk API forgets
> to mark the lists it creates at this point as "maybe_interesting". If
> these paths only ever have a single version in the history of the repo
> (including the current staged version) then the parent directory never
> tries to add a new object to the list and mark the list as
> "maybe_interesting". Thus, when walking the list later, the group is
> skipped as it is expected that no objects are interesting. This happens
> even when there are actually no UNINTERESTING objects at all! This is
> based on the optimization enabled by the pack.useSparse=true config
> option, which is the default.
>
> Thus, we create a test case that demonstrates the many cases of this
> issue for reproducibility:
>
> 1. File a/b/c has only one committed version.
> 2. Files a/i and x/y only exists as staged changes.
exists => exist
I didn't have any questions or spot any issues on the rest of the patch.
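The skip described in the quoted commit message can be illustrated with a toy model. This is not Git's path-walk code: `emitted_paths`, the `path=pending=trees` spec format, and the object names are all made up for illustration. Each path list starts from the object found in the index (the "pending" object); the tree walk marks the list maybe_interesting only when it adds an object the list has not already seen, and with pack.useSparse=true an unmarked list is skipped. Passing 1 as the first argument models marking pending lists up front.

```sh
#!/bin/sh
# Toy model (hypothetical, not Git's implementation) of the
# pack.useSparse skip. Usage: emitted_paths FIX SPEC...
#   FIX=0 models the reported bug, FIX=1 marks pending lists.
#   SPEC is "path=pending_oid=tree_oid,tree_oid,..." (tree list may be empty).
emitted_paths () {
	fix=$1
	shift
	for spec in "$@"
	do
		path=${spec%%=*}
		rest=${spec#*=}
		pending=${rest%%=*}
		trees=${rest#*=}

		# The list is created from the pending (index) object.
		seen=" $pending "
		interesting=$fix

		# The tree walk marks the list only when it adds an
		# object the list has not seen before.
		old_ifs=$IFS
		IFS=,
		for oid in $trees
		do
			case "$seen" in
			*" $oid "*) ;;
			*)
				seen="$seen$oid "
				interesting=1
				;;
			esac
		done
		IFS=$old_ifs

		# With pack.useSparse=true, unmarked lists are skipped.
		if test "$interesting" = 1
		then
			echo "$path"
		fi
	done
}

# Data shaped like the test below: file and a/b/c committed once,
# a/d and a/e with several versions, a/i only staged.
#   emitted_paths 0 'file=s1=s1' 'a/b/c=c1=c1' 'a/d=d3=d1,d2' 'a/e=e2=e1' 'a/i=i1='
#   prints a/d and a/e only
#   emitted_paths 1 'file=s1=s1' 'a/b/c=c1=c1' 'a/d=d3=d1,d2' 'a/e=e2=e1' 'a/i=i1='
#   prints all five paths
```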
'

test_expect_success 'pending objects are repacked appropriately' '
	test_when_finished rm -rf pending &&
	git init pending &&

	(
		cd pending &&

		# Commit file, a/b/c and never change them.
		mkdir -p a/b &&
		echo singleton >file &&
		echo stuff >a/b/c &&
		echo more >a/d &&
		git add file a &&
		git commit -m "single blobs" &&

		# Files a/d and a/e will not be singletons.
		echo d >a/d &&
		echo e >a/e &&
		git add a &&
		git commit -m "more blobs" &&

		# Using a sparse index makes the test also
		# exercise walking the cache-tree.
		git sparse-checkout set --sparse-index a x &&

		# Create staged changes:
		# * a/e now has multiple versions.
		# * a/i now has only one version.
		echo f >a/d &&
		echo h >a/e &&
		echo i >a/i &&
		git add a &&

		# Stage and unstage a change to exercise the
		# resolve-undo cache and how it impacts fsck.
		mkdir x &&
		echo y >x/y &&
		git add x &&
		xy=$(git rev-parse :x/y) &&
		git rm --cached x/y &&

		# The blob for x/y must persist through repacks,
		# but fsck currently ignores the REUC extension
		# for finding links to the blob.
		cat >expect <<-EOF &&
		dangling blob $xy
		EOF

		# Bring the loose objects into a packfile to avoid
		# leftovers in the next test. Without this, the loose
		# objects persist and the test succeeds for other
		# reasons.
		git repack -adf &&
		git fsck >out &&
		test_cmp expect out &&

		# Test path walk version with pack.useSparse.
		git -c pack.useSparse=true repack -adf --path-walk &&
		git fsck >out &&
		test_cmp expect out
	)
'

test_done
On the Git mailing list, Elijah Newren wrote: