forked from git/git
-
Notifications
You must be signed in to change notification settings - Fork 140
Integrate the sparse index with 'git apply' and 'git add -p/-i' #1914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
derrickstolee
wants to merge
3
commits into
gitgitgadget:master
Choose a base branch
from
derrickstolee:apply-sparse-index
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+113
−4
Open
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,7 +12,7 @@ static const char * const apply_usage[] = { | |
int cmd_apply(int argc, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Elijah Newren wrote (reply to this): On Tue, May 6, 2025 at 5:55 PM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> From: Derrick Stolee <[email protected]>
>
> The sparse index allows storing directory entries in the index, marked
> with the skip-wortkree bit and pointing to a tree object. This may be an
> unexpected data shape for some implementation areas, so we are rolling
> it out incrementally on a builtin-per-builtin basis.
>
> This change enables the sparse index for 'git apply'. The main
> motivation for this change is that 'git apply' is used as a child
> process of 'git add -p' and expanding the sparse index for each of those
> child processes can lead to significant performance issues.
>
> The good news is that the actual index manipulation code used by 'git
> apply' is already integrated with the sparse index, so the only product
> change is to mark the builtin as allowing the sparse index so it isn't
> inflated on read.
>
> The more involved part of this change is around adding tests that verify
> how 'git apply' behaves in a sparse-checkout environment and whether or
> not the index expands in certain operations.
>
> Signed-off-by: Derrick Stolee <[email protected]>
> ---
> builtin/apply.c | 7 +++-
> t/t1092-sparse-checkout-compatibility.sh | 46 ++++++++++++++++++++++++
> 2 files changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/apply.c b/builtin/apply.c
> index 84f1863d3ac3..a1e20c593d09 100644
> --- a/builtin/apply.c
> +++ b/builtin/apply.c
> @@ -12,7 +12,7 @@ static const char * const apply_usage[] = {
> int cmd_apply(int argc,
> const char **argv,
> const char *prefix,
> - struct repository *repo UNUSED)
> + struct repository *repo)
> {
> int force_apply = 0;
> int options = 0;
> @@ -35,6 +35,11 @@ int cmd_apply(int argc,
> &state, &force_apply, &options,
> apply_usage);
>
> + if (repo) {
> + prepare_repo_settings(repo);
> + repo->settings.command_requires_full_index = 0;
> + }
> +
> if (check_apply_state(&state, force_apply))
> exit(128);
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index f9b448792cb4..ab8bd371eff3 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1340,6 +1340,30 @@ test_expect_success 'submodule handling' '
> grep "160000 $(git -C initial-repo rev-parse HEAD) 0 modules/sub" cache
> '
>
> +test_expect_success 'git apply functionality' '
> + init_repos &&
> +
> + test_all_match git checkout base &&
> +
> + git -C full-checkout diff base..merge-right -- deep >patch-in-sparse &&
> + git -C full-checkout diff base..merge-right -- folder2 >patch-outside &&
> +
> + # Apply a patch to a file inside the sparse definition
> + test_all_match git apply --index --stat ../patch-in-sparse &&
> + test_all_match git status --porcelain=v2 &&
> +
> + # Apply a patch to a file outside the sparse definition
> + test_sparse_match test_must_fail git apply ../patch-outside &&
> + grep "No such file or directory" sparse-checkout-err &&
I was slightly confused by this at first, because I was thinking of
the case where folder2/a exists in the working directory despite not
matching the sparsity patterns, but you were testing a case where the
working directory matched the sparsity patterns, so folder2/a doesn't
exist.
So, the check here looks good.
And, when folder2/a does exist, then we're in the case handled by
82386b44963f (Merge branch 'en/present-despite-skipped', 2022-03-09),
which forces the directory to not be considered sparse, and so it's
just like the ../patch-in-sparse case...meaning it's not really a
different case to test.
So, it all makes sense.
> +
> + # But it works with --index and --cached
> + test_all_match git apply --index --stat ../patch-outside &&
> + test_all_match git status --porcelain=v2 &&
> + test_all_match git reset --hard &&
> + test_all_match git apply --cached --stat ../patch-outside &&
> + test_all_match git status --porcelain=v2
> +'
> +
> # When working with a sparse index, some commands will need to expand the
> # index to operate properly. If those commands also write the index back
> # to disk, they need to convert the index to sparse before writing.
> @@ -2345,6 +2369,28 @@ test_expect_success 'sparse-index is not expanded: check-attr' '
> ensure_not_expanded check-attr -a --cached -- folder1/a
> '
>
> +test_expect_success 'sparse-index is not expanded: git apply' '
> + init_repos &&
> +
> + git -C sparse-index checkout base &&
> + git -C full-checkout diff base..merge-right -- deep >patch-in-sparse &&
> + git -C full-checkout diff base..merge-right -- folder2 >patch-outside &&
> +
> + # Apply a patch to a file inside the sparse definition
> + ensure_not_expanded apply --index --stat ../patch-in-sparse &&
> +
> + # Apply a patch to a file outside the sparse definition
> + # Fails when caring about the worktree.
> + ensure_not_expanded ! apply ../patch-outside &&
> +
> + # Expands when using --index.
> + ensure_expanded apply --index ../patch-outside &&
> + git -C sparse-index reset --hard &&
All makes sense up to here.
> +
> + # Does not expand when using --cached.
> + ensure_not_expanded apply --cached ../patch-outside
Wait, what? That makes no sense.
After some digging, I see why the test passed, but it's very
misleading. Just before this command, if you ran the following
commands from the sparse-index directory, you'd see the following:
$ rm testme
$ GIT_TRACE2_EVENT=$(pwd)/testme git apply --cached ../patch-outside
$ grep ensure_full_index testme
$
Which matches what you were testing and shows why it passed for you.
But I'd argue the test is not correct and confusing for anyone that
reads it, because:
$ git ls-files -s --sparse | grep folder2
040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0 folder2/0/
100644 78981922613b2afb6025042ff6bd878ac1994e85 0 folder2/a
In other words, the index was *already* (partially) expanded by the
`git apply --index`, and the `git reset --hard` did not fix that
contrary to expectations. Continuing from here we see:
$ git reset --hard
HEAD is now at 703fd3e initial commit
$ git ls-files -s --sparse | grep folder2
040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0 folder2/0/
100644 78981922613b2afb6025042ff6bd878ac1994e85 0 folder2/a
$ git sparse-checkout reapply
$ git ls-files -s --sparse | grep folder2
040000 123706f6fc38949628eaf0483edbf97ba21123ae 0 folder2/
So, we need to do a `git sparse-checkout reapply` to make sure we were
actually in the expected fully sparse state. From here...
$ rm testme
$ GIT_TRACE2_EVENT=$(pwd)/testme git apply --cached ../patch-outside
$ grep ensure_full_index testme
{"event":"region_enter","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.856008Z","file":"sparse-index.c","line":372,"repo":1,"nesting":1,"category":"index","label":"ensure_full_index"}
{"event":"region_leave","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.856454Z","file":"sparse-index.c","line":455,"repo":1,"t_rel":0.000446,"nesting":1,"category":"index","label":"ensure_full_index"}
{"event":"region_enter","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.857016Z","file":"sparse-index.c","line":372,"repo":1,"nesting":1,"category":"index","label":"ensure_full_index"}
{"event":"region_leave","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.857135Z","file":"sparse-index.c","line":455,"repo":1,"t_rel":0.000119,"nesting":1,"category":"index","label":"ensure_full_index"}
So, indeed, `git apply --cached ../patch-outside` DOES expand the
index, as I expected. It has to when folder2/ is a directory in the
index, so that we can get a folder2/a entry that we can modify. And
that's just what we see:
$ git ls-files -s --sparse | grep folder2
040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0 folder2/0/
100644 50f4ca4e6265a0497ec2ee6782648138914ad398 0 folder2/a
Can you add a `git sparse-checkout reapply` right after your `git
reset --hard`, and then switch the ensure_not_expanded to
ensure_expanded for the apply --cached call? |
||
const char **argv, | ||
const char *prefix, | ||
struct repository *repo UNUSED) | ||
struct repository *repo) | ||
{ | ||
int force_apply = 0; | ||
int options = 0; | ||
|
@@ -35,6 +35,11 @@ int cmd_apply(int argc, | |
&state, &force_apply, &options, | ||
apply_usage); | ||
|
||
if (repo) { | ||
prepare_repo_settings(repo); | ||
repo->settings.command_requires_full_index = 0; | ||
} | ||
|
||
if (check_apply_state(&state, force_apply)) | ||
exit(128); | ||
|
||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -135,5 +135,6 @@ test_perf_on_all git diff-tree HEAD | |
test_perf_on_all git diff-tree HEAD -- $SPARSE_CONE/a | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Elijah Newren wrote (reply to this): On Tue, May 6, 2025 at 5:55 PM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> From: Derrick Stolee <[email protected]>
>
> The previous two changes contributed performance improvements to 'git
> apply' and 'git add -p' when using a sparse index. Add a performance
> test to demonstrate this (and to help validate that performance remains
> good in the future).
>
> In the truncated test output below, we see that the full checkout
> performance changes within noise expectations, but the sparse index
> cases improve 33% and then 96%.
>
> HEAD~3 HEAD~2 HEAD~1
> ---------------------------------------------------------
> 2000.118: (full-v3) 0.80 0.84 +5.0% 0.84 +5.0%
> 2000.119: (full-v4) 0.76 0.79 +3.9% 0.80 +5.3%
> 2000.120: (sparse-v3) 2.09 1.39 -33.5% 0.07 -96.7%
> 2000.121: (sparse-v4) 2.09 1.39 -33.5% 0.07 -96.7%
>
> It is worth noting that if our test was more involved and had multiple
> hunks to evaluate, then the time spent in 'git apply' would dominate due
> to multiple index loads and writes. As it stands, we need the sparse
> index improvement in 'git add -p' itself to confirm this performance
> improvement.
>
> Since the change for 'git add -i' is identical, we avoid a second test
> case for that similar operation.
>
> Signed-off-by: Derrick Stolee <[email protected]>
> ---
> t/perf/p2000-sparse-operations.sh | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index 39e92b084143..3da6ae59c000 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -135,5 +135,6 @@ test_perf_on_all git diff-tree HEAD
> test_perf_on_all git diff-tree HEAD -- $SPARSE_CONE/a
> test_perf_on_all "git worktree add ../temp && git worktree remove ../temp"
> test_perf_on_all git check-attr -a -- $SPARSE_CONE/a
> +test_perf_on_all 'echo >>a && test_write_lines y | git add -p'
>
> test_done
> --
> gitgitgadget
Very nice! |
||
test_perf_on_all "git worktree add ../temp && git worktree remove ../temp" | ||
test_perf_on_all git check-attr -a -- $SPARSE_CONE/a | ||
test_perf_on_all 'echo >>a && test_write_lines y | git add -p' | ||
|
||
test_done |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Elijah Newren wrote (reply to this):