Skip to content

Integrate the sparse index with 'git apply' and 'git add -p/-i' #1914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions builtin/add.c
Original file line number Diff line number Diff line change
Expand Up @@ -390,6 +390,10 @@ int cmd_add(int argc,

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, May 6, 2025 at 5:55 PM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> From: Derrick Stolee <[email protected]>
>
> It is slow to expand a sparse index in-memory due to parsing of trees.
> We aim to minimize that performance cost when possible. 'git add -p'
> uses 'git apply' child processes to modify the index, but still there
> are some expansions that occur.

still there are some expansions that occur...outside of those child
processes?  Is that what you're trying to say, or was it something
else?

> It turns out that control flows out of cmd_add() in the interactive
> cases before the lines that confirm that the builtin is integrated with
> the sparse index. We need to move that earlier to ensure it prevents a
> full index expansion on read.
>
> Add more test cases that confirm that these interactive add options work
> with the sparse index. One interesting aspect here is that the '-i'
> option avoids expanding the sparse index when a sparse directory exists
> on disk while the '-p' option does hit the ensure_full_index() method.
> This leaves some room for improvement, but this case should be atypical
> as users should remain within their sparse-checkout.

It's not clear whether this paragraph is talking about existing state
(before the patch) or desired state (after the patch).  I think based
on the context it's the former, but the last sentence sounds more like
a future work direction that makes it very unclear, to me at least.

> Signed-off-by: Derrick Stolee <[email protected]>
> ---
>  builtin/add.c                            |  7 +--
>  t/t1092-sparse-checkout-compatibility.sh | 56 ++++++++++++++++++++++++
>  2 files changed, 60 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index 747511b68bc3..7c292ffdc6c2 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -390,6 +390,10 @@ int cmd_add(int argc,
>
>         argc = parse_options(argc, argv, prefix, builtin_add_options,
>                           builtin_add_usage, PARSE_OPT_KEEP_ARGV0);
> +
> +       prepare_repo_settings(repo);
> +       repo->settings.command_requires_full_index = 0;
> +
>         if (patch_interactive)
>                 add_interactive = 1;
>         if (add_interactive) {
> @@ -426,9 +430,6 @@ int cmd_add(int argc,
>         add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
>         require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));
>
> -       prepare_repo_settings(repo);
> -       repo->settings.command_requires_full_index = 0;
> -
>         repo_hold_locked_index(repo, &lock_file, LOCK_DIE_ON_ERROR);
>
>         /*
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index ab8bd371eff3..0dc5dd27184d 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -384,6 +384,38 @@ test_expect_success 'add, commit, checkout' '
>         test_all_match git checkout -
>  '
>
> +test_expect_success 'git add -p' '
> +       init_repos &&
> +
> +       write_script edit-contents <<-\EOF &&
> +       echo text >>$1
> +       EOF
> +
> +       # Does not expand when edits are within sparse checkout.
> +       run_on_all ../edit-contents deep/a &&
> +       run_on_all ../edit-contents deep/deeper1/a &&
> +
> +       test_write_lines y n >in &&
> +       run_on_all git add -p <in &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git reset &&
> +
> +       test_write_lines u 1 "" q >in &&
> +       run_on_all git add -i <in &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git reset --hard &&
> +
> +       run_on_sparse mkdir -p folder1 &&
> +       run_on_all ../edit-contents folder1/a &&
> +       test_write_lines y n y >in &&
> +       run_on_all git add -p <in &&
> +       test_sparse_match git status --porcelain=v2 &&
> +       test_sparse_match git reset &&
> +       test_write_lines u 2 3 "" q >in &&
> +       run_on_all git add -i <in &&
> +       test_sparse_match git status --porcelain=v2
> +'
> +
>  test_expect_success 'deep changes during checkout' '
>         init_repos &&
>
> @@ -2391,6 +2423,30 @@ test_expect_success 'sparse-index is not expanded: git apply' '
>         ensure_not_expanded apply --cached ../patch-outside
>  '
>
> +test_expect_success 'sparse-index is not expanded: git add -p' '
> +       init_repos &&
> +
> +       # Does not expand when edits are within sparse checkout.
> +       echo "new content" >sparse-index/deep/a &&
> +       echo "new content" >sparse-index/deep/deeper1/a &&
> +       test_write_lines y n >in &&
> +       ensure_not_expanded add -p <in &&
> +       git -C sparse-index reset &&
> +       ensure_not_expanded add -i <in &&
> +
> +       mkdir -p sparse-index/folder1 &&
> +       echo "new content" >sparse-index/folder1/a &&
> +
> +       # -p does expand when edits are outside sparse checkout.
> +       test_write_lines y n y >in &&
> +       ensure_expanded add -p <in &&
> +
> +       # but -i does not expand.
> +       git -C sparse-index reset &&
> +       test_write_lines u 2 3 "" q >in &&
> +       ensure_not_expanded add -i <in

This has the same error as patch 1, in that you assume your reset
above (which wasn't even a reset --hard) will re-sparsify the index.
Since it doesn't, your test is misleading and only shows that when
already expanded to include the files of interest it doesn't expand
any further.  To re-sparsify your index before the `add -i` call,
you'll need to do a `git reset --hard && git sparse-checkout reapply`
and then recreate folder1/a with "new content" again...and then run
your 'add -i' command.

Anyway, now I think I understand the expected meaning of the final
paragraph of your commit message, but you were tripped up by assuming
the index was re-sparsified between your steps when it wasn't.

argc = parse_options(argc, argv, prefix, builtin_add_options,
builtin_add_usage, PARSE_OPT_KEEP_ARGV0);

prepare_repo_settings(repo);
repo->settings.command_requires_full_index = 0;

if (patch_interactive)
add_interactive = 1;
if (add_interactive) {
Expand Down Expand Up @@ -426,9 +430,6 @@ int cmd_add(int argc,
add_new_files = !take_worktree_changes && !refresh_only && !add_renormalize;
require_pathspec = !(take_worktree_changes || (0 < addremove_explicit));

prepare_repo_settings(repo);
repo->settings.command_requires_full_index = 0;

repo_hold_locked_index(repo, &lock_file, LOCK_DIE_ON_ERROR);

/*
Expand Down
7 changes: 6 additions & 1 deletion builtin/apply.c
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ static const char * const apply_usage[] = {
int cmd_apply(int argc,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, May 6, 2025 at 5:55 PM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> From: Derrick Stolee <[email protected]>
>
> The sparse index allows storing directory entries in the index, marked
> with the skip-wortkree bit and pointing to a tree object. This may be an
> unexpected data shape for some implementation areas, so we are rolling
> it out incrementally on a builtin-per-builtin basis.
>
> This change enables the sparse index for 'git apply'. The main
> motivation for this change is that 'git apply' is used as a child
> process of 'git add -p' and expanding the sparse index for each of those
> child processes can lead to significant performance issues.
>
> The good news is that the actual index manipulation code used by 'git
> apply' is already integrated with the sparse index, so the only product
> change is to mark the builtin as allowing the sparse index so it isn't
> inflated on read.
>
> The more involved part of this change is around adding tests that verify
> how 'git apply' behaves in a sparse-checkout environment and whether or
> not the index expands in certain operations.
>
> Signed-off-by: Derrick Stolee <[email protected]>
> ---
>  builtin/apply.c                          |  7 +++-
>  t/t1092-sparse-checkout-compatibility.sh | 46 ++++++++++++++++++++++++
>  2 files changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/apply.c b/builtin/apply.c
> index 84f1863d3ac3..a1e20c593d09 100644
> --- a/builtin/apply.c
> +++ b/builtin/apply.c
> @@ -12,7 +12,7 @@ static const char * const apply_usage[] = {
>  int cmd_apply(int argc,
>               const char **argv,
>               const char *prefix,
> -             struct repository *repo UNUSED)
> +             struct repository *repo)
>  {
>         int force_apply = 0;
>         int options = 0;
> @@ -35,6 +35,11 @@ int cmd_apply(int argc,
>                                    &state, &force_apply, &options,
>                                    apply_usage);
>
> +       if (repo) {
> +               prepare_repo_settings(repo);
> +               repo->settings.command_requires_full_index = 0;
> +       }
> +
>         if (check_apply_state(&state, force_apply))
>                 exit(128);
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index f9b448792cb4..ab8bd371eff3 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1340,6 +1340,30 @@ test_expect_success 'submodule handling' '
>         grep "160000 $(git -C initial-repo rev-parse HEAD) 0    modules/sub" cache
>  '
>
> +test_expect_success 'git apply functionality' '
> +       init_repos &&
> +
> +       test_all_match git checkout base &&
> +
> +       git -C full-checkout diff base..merge-right -- deep >patch-in-sparse &&
> +       git -C full-checkout diff base..merge-right -- folder2 >patch-outside &&
> +
> +       # Apply a patch to a file inside the sparse definition
> +       test_all_match git apply --index --stat ../patch-in-sparse &&
> +       test_all_match git status --porcelain=v2 &&
> +
> +       # Apply a patch to a file outside the sparse definition
> +       test_sparse_match test_must_fail git apply ../patch-outside &&
> +       grep "No such file or directory" sparse-checkout-err &&

I was slightly confused by this at first, because I was thinking of
the case where folder2/a exists in the working directory despite not
matching the sparsity patterns, but you were testing a case where the
working directory matched the sparsity patterns, so folder2/a doesn't
exist.

So, the check here looks good.

And, when folder2/a does exist, then we're in the case handled by
82386b44963f (Merge branch 'en/present-despite-skipped', 2022-03-09),
which forces the directory to not be considered sparse, and so it's
just like the ../patch-in-sparse case...meaning it's not really a
different case to test.

So, it all makes sense.

> +
> +       # But it works with --index and --cached
> +       test_all_match git apply --index --stat ../patch-outside &&
> +       test_all_match git status --porcelain=v2 &&
> +       test_all_match git reset --hard &&
> +       test_all_match git apply --cached --stat ../patch-outside &&
> +       test_all_match git status --porcelain=v2
> +'
> +
>  # When working with a sparse index, some commands will need to expand the
>  # index to operate properly. If those commands also write the index back
>  # to disk, they need to convert the index to sparse before writing.
> @@ -2345,6 +2369,28 @@ test_expect_success 'sparse-index is not expanded: check-attr' '
>         ensure_not_expanded check-attr -a --cached -- folder1/a
>  '
>
> +test_expect_success 'sparse-index is not expanded: git apply' '
> +       init_repos &&
> +
> +       git -C sparse-index checkout base &&
> +       git -C full-checkout diff base..merge-right -- deep >patch-in-sparse &&
> +       git -C full-checkout diff base..merge-right -- folder2 >patch-outside &&
> +
> +       # Apply a patch to a file inside the sparse definition
> +       ensure_not_expanded apply --index --stat ../patch-in-sparse &&
> +
> +       # Apply a patch to a file outside the sparse definition
> +       # Fails when caring about the worktree.
> +       ensure_not_expanded ! apply ../patch-outside &&
> +
> +       # Expands when using --index.
> +       ensure_expanded apply --index ../patch-outside &&
> +       git -C sparse-index reset --hard &&

All makes sense up to here.

> +
> +       # Does not expand when using --cached.
> +       ensure_not_expanded apply --cached ../patch-outside

Wait, what?  That makes no sense.

After some digging, I see why the test passed, but it's very
misleading.  Just before this command, if you ran the following
commands from the sparse-index directory, you'd see the following:

$ rm testme
$ GIT_TRACE2_EVENT=$(pwd)/testme git apply --cached ../patch-outside
$ grep ensure_full_index testme
$

Which matches what you were testing and shows why it passed for you.
But I'd argue the test is not correct and confusing for anyone that
reads it, because:

$ git ls-files -s --sparse | grep folder2
040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0    folder2/0/
100644 78981922613b2afb6025042ff6bd878ac1994e85 0    folder2/a

In other words, the index was *already* (partially) expanded by the
`git apply --index`, and the `git reset --hard` did not fix that
contrary to expectations.  Continuing from here we see:

$ git reset --hard
HEAD is now at 703fd3e initial commit
$ git ls-files -s --sparse | grep folder2
040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0    folder2/0/
100644 78981922613b2afb6025042ff6bd878ac1994e85 0    folder2/a
$ git sparse-checkout reapply
$ git ls-files -s --sparse | grep folder2
040000 123706f6fc38949628eaf0483edbf97ba21123ae 0    folder2/

So, we need to do a `git sparse-checkout reapply` to make sure we were
actually in the expected fully sparse state.  From here...

$ rm testme
$ GIT_TRACE2_EVENT=$(pwd)/testme git apply --cached ../patch-outside
$ grep ensure_full_index testme
{"event":"region_enter","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.856008Z","file":"sparse-index.c","line":372,"repo":1,"nesting":1,"category":"index","label":"ensure_full_index"}
{"event":"region_leave","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.856454Z","file":"sparse-index.c","line":455,"repo":1,"t_rel":0.000446,"nesting":1,"category":"index","label":"ensure_full_index"}
{"event":"region_enter","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.857016Z","file":"sparse-index.c","line":372,"repo":1,"nesting":1,"category":"index","label":"ensure_full_index"}
{"event":"region_leave","sid":"20250510T030856.854763Z-H8ec63c79-P0000371c","thread":"main","time":"2025-05-10T03:08:56.857135Z","file":"sparse-index.c","line":455,"repo":1,"t_rel":0.000119,"nesting":1,"category":"index","label":"ensure_full_index"}

So, indeed, `git apply --cached ../patch-outside` DOES expand the
index, as I expected.  It has to when folder2/ is a directory in the
index, so that we can get a folder2/a entry that we can modify.  And
that's just what we see:

$ git ls-files -s --sparse | grep folder2
040000 cb4007891397aa2a451037d1c69e57f0cf498c24 0    folder2/0/
100644 50f4ca4e6265a0497ec2ee6782648138914ad398 0    folder2/a


Can you add a `git sparse-checkout reapply` right after your `git
reset --hard`, and then switch the ensure_not_expanded to
ensure_expanded for the apply --cached call?

const char **argv,
const char *prefix,
struct repository *repo UNUSED)
struct repository *repo)
{
int force_apply = 0;
int options = 0;
Expand All @@ -35,6 +35,11 @@ int cmd_apply(int argc,
&state, &force_apply, &options,
apply_usage);

if (repo) {
prepare_repo_settings(repo);
repo->settings.command_requires_full_index = 0;
}

if (check_apply_state(&state, force_apply))
exit(128);

Expand Down
1 change: 1 addition & 0 deletions t/perf/p2000-sparse-operations.sh
Original file line number Diff line number Diff line change
Expand Up @@ -135,5 +135,6 @@ test_perf_on_all git diff-tree HEAD
test_perf_on_all git diff-tree HEAD -- $SPARSE_CONE/a
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, May 6, 2025 at 5:55 PM Derrick Stolee via GitGitGadget
<[email protected]> wrote:
>
> From: Derrick Stolee <[email protected]>
>
> The previous two changes contributed performance improvements to 'git
> apply' and 'git add -p' when using a sparse index. Add a performance
> test to demonstrate this (and to help validate that performance remains
> good in the future).
>
> In the truncated test output below, we see that the full checkout
> performance changes within noise expectations, but the sparse index
> cases improve 33% and then 96%.
>
>                       HEAD~3     HEAD~2         HEAD~1
> ---------------------------------------------------------
> 2000.118: (full-v3)     0.80   0.84 +5.0%     0.84  +5.0%
> 2000.119: (full-v4)     0.76   0.79 +3.9%     0.80  +5.3%
> 2000.120: (sparse-v3)   2.09   1.39 -33.5%    0.07 -96.7%
> 2000.121: (sparse-v4)   2.09   1.39 -33.5%    0.07 -96.7%
>
> It is worth noting that if our test was more involved and had multiple
> hunks to evaluate, then the time spent in 'git apply' would dominate due
> to multiple index loads and writes. As it stands, we need the sparse
> index improvement in 'git add -p' itself to confirm this performance
> improvement.
>
> Since the change for 'git add -i' is identical, we avoid a second test
> case for that similar operation.
>
> Signed-off-by: Derrick Stolee <[email protected]>
> ---
>  t/perf/p2000-sparse-operations.sh | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index 39e92b084143..3da6ae59c000 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -135,5 +135,6 @@ test_perf_on_all git diff-tree HEAD
>  test_perf_on_all git diff-tree HEAD -- $SPARSE_CONE/a
>  test_perf_on_all "git worktree add ../temp && git worktree remove ../temp"
>  test_perf_on_all git check-attr -a -- $SPARSE_CONE/a
> +test_perf_on_all 'echo >>a && test_write_lines y | git add -p'
>
>  test_done
> --
> gitgitgadget

Very nice!

test_perf_on_all "git worktree add ../temp && git worktree remove ../temp"
test_perf_on_all git check-attr -a -- $SPARSE_CONE/a
test_perf_on_all 'echo >>a && test_write_lines y | git add -p'

test_done
102 changes: 102 additions & 0 deletions t/t1092-sparse-checkout-compatibility.sh
Original file line number Diff line number Diff line change
Expand Up @@ -384,6 +384,38 @@ test_expect_success 'add, commit, checkout' '
test_all_match git checkout -
'

test_expect_success 'git add -p' '
init_repos &&

write_script edit-contents <<-\EOF &&
echo text >>$1
EOF

# Does not expand when edits are within sparse checkout.
run_on_all ../edit-contents deep/a &&
run_on_all ../edit-contents deep/deeper1/a &&

test_write_lines y n >in &&
run_on_all git add -p <in &&
test_all_match git status --porcelain=v2 &&
test_all_match git reset &&

test_write_lines u 1 "" q >in &&
run_on_all git add -i <in &&
test_all_match git status --porcelain=v2 &&
test_all_match git reset --hard &&

run_on_sparse mkdir -p folder1 &&
run_on_all ../edit-contents folder1/a &&
test_write_lines y n y >in &&
run_on_all git add -p <in &&
test_sparse_match git status --porcelain=v2 &&
test_sparse_match git reset &&
test_write_lines u 2 3 "" q >in &&
run_on_all git add -i <in &&
test_sparse_match git status --porcelain=v2
'

test_expect_success 'deep changes during checkout' '
init_repos &&

Expand Down Expand Up @@ -1340,6 +1372,30 @@ test_expect_success 'submodule handling' '
grep "160000 $(git -C initial-repo rev-parse HEAD) 0 modules/sub" cache
'

test_expect_success 'git apply functionality' '
init_repos &&

test_all_match git checkout base &&

git -C full-checkout diff base..merge-right -- deep >patch-in-sparse &&
git -C full-checkout diff base..merge-right -- folder2 >patch-outside &&

# Apply a patch to a file inside the sparse definition
test_all_match git apply --index --stat ../patch-in-sparse &&
test_all_match git status --porcelain=v2 &&

# Apply a patch to a file outside the sparse definition
test_sparse_match test_must_fail git apply ../patch-outside &&
grep "No such file or directory" sparse-checkout-err &&

# But it works with --index and --cached
test_all_match git apply --index --stat ../patch-outside &&
test_all_match git status --porcelain=v2 &&
test_all_match git reset --hard &&
test_all_match git apply --cached --stat ../patch-outside &&
test_all_match git status --porcelain=v2
'

# When working with a sparse index, some commands will need to expand the
# index to operate properly. If those commands also write the index back
# to disk, they need to convert the index to sparse before writing.
Expand Down Expand Up @@ -2345,6 +2401,52 @@ test_expect_success 'sparse-index is not expanded: check-attr' '
ensure_not_expanded check-attr -a --cached -- folder1/a
'

test_expect_success 'sparse-index is not expanded: git apply' '
init_repos &&

git -C sparse-index checkout base &&
git -C full-checkout diff base..merge-right -- deep >patch-in-sparse &&
git -C full-checkout diff base..merge-right -- folder2 >patch-outside &&

# Apply a patch to a file inside the sparse definition
ensure_not_expanded apply --index --stat ../patch-in-sparse &&

# Apply a patch to a file outside the sparse definition
# Fails when caring about the worktree.
ensure_not_expanded ! apply ../patch-outside &&

# Expands when using --index.
ensure_expanded apply --index ../patch-outside &&
git -C sparse-index reset --hard &&

# Does not expand when using --cached.
ensure_not_expanded apply --cached ../patch-outside
'

test_expect_success 'sparse-index is not expanded: git add -p' '
init_repos &&

# Does not expand when edits are within sparse checkout.
echo "new content" >sparse-index/deep/a &&
echo "new content" >sparse-index/deep/deeper1/a &&
test_write_lines y n >in &&
ensure_not_expanded add -p <in &&
git -C sparse-index reset &&
ensure_not_expanded add -i <in &&

mkdir -p sparse-index/folder1 &&
echo "new content" >sparse-index/folder1/a &&

# -p does expand when edits are outside sparse checkout.
test_write_lines y n y >in &&
ensure_expanded add -p <in &&

# but -i does not expand.
git -C sparse-index reset &&
test_write_lines u 2 3 "" q >in &&
ensure_not_expanded add -i <in
'

test_expect_success 'advice.sparseIndexExpanded' '
init_repos &&

Expand Down
Loading