Skip to content
This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Perf: optimize fetching last file commit #58055

Merged
merged 2 commits into from
Nov 1, 2023
Merged

Conversation

camdencheek
Copy link
Member

@camdencheek camdencheek commented Nov 1, 2023

When fetching the information about the commit that modified a file, we use an ancestors query that is filtered to the file name. This is a paginated interface, so we always request n+1 items so we know whether there is a next page.

However, the extra requested commit can actually be quite expensive to fetch, particularly in the relatively common case that a file has only been modified in one commit: the one that created it. In that case, we have to iterate over the entire history of the repository to look for a commit that doesn't exist to produce information about a next page that will never be fetched.

The ideal solution here would be to only fetch n+1 if the hasNextPage field of the resolver is requested. Unfortunately, our GraphQL package still doesn't support querying for which fields are requested, which is a huge bummer.

Instead, this takes a slightly hackier approach and assumes that if the number requested is 1, it's better to serve that quickly and claim we have a next page without knowing whether that page will contain anything.

Test plan

Manually tested the the trace of the FetchOwnersAndHistory GraphQL request. For a file with only one commit, the span representing the git command shrunk from 300ms to 70ms, and there no longer a visible loading indicator.

@cla-bot cla-bot bot added the cla-signed label Nov 1, 2023
@camdencheek camdencheek requested a review from a team November 1, 2023 19:11
Copy link
Member

@eseliger eseliger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive-by: wouldn't this affect the correctness of the pageInfo?

@camdencheek
Copy link
Member Author

wouldn't this affect the correctness of the pageInfo?

Yes, which is why I added the bonus logic in PageInfo(). Basically, if n == 1, we just always claim there is a next page. The next page might be empty, but I don't think that counts a correctness issue, just an unnecessary followup request.

In practice, if you actually care about paginating, I don't think you'd ever have a page size of 1 anyways, so I don't think this will ever be problematic. However, the real solution is to fix our graphql library so that we query the requested fields, making this whole dance unnecessary.

@eseliger
Copy link
Member

eseliger commented Nov 1, 2023

Yeah, I didn't mean to block this PR, but returning hasNextPage: true when there's no next page actually feels like incorrect API data to me :) Maybe we can add a comment about that somewhere?

@camdencheek
Copy link
Member Author

Yeah you're right. I added some more comments. I hope we can solve the "we don't know what fields were requested" issue for real...but until then I think this is still worth the hack and slightly incorrect behavior.

@camdencheek camdencheek merged commit d069362 into main Nov 1, 2023
@camdencheek camdencheek deleted the cc/optimize-ancestors branch November 1, 2023 23:00
vovakulikov pushed a commit that referenced this pull request Dec 12, 2023
When fetching the information about the commit that modified a file, we use an ancestors query that is filtered to the file name. This is a paginated interface, so we always request n+1 items so we know whether there is a next page.

However, the extra requested commit can actually be quite expensive to fetch, particularly in the relatively common case that a file has only been modified in one commit: the one that created it. In that case, we have to iterate over the entire history of the repository to look for a commit that doesn't exist to produce information about a next page that will never be fetched.

The ideal solution here would be to only fetch n+1 if the hasNextPage field of the resolver is requested. Unfortunately, our GraphQL package still doesn't support querying for which fields are requested, which is a huge bummer.

Instead, this takes a slightly hackier approach and assumes that if the number requested is 1, it's better to serve that quickly and claim we have a next page without knowing whether that page will contain anything.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants