fix: Improve Retry Logic to Only Retry on Server-Side HTTP Errors #1390

Merged
merged 15 commits into from
Dec 23, 2024

Conversation

VishalGawade1
Contributor

Changes Implemented

Fixes #861

Selective Retrying in osv.go:

Before: The retry logic did not differentiate between server-side and client-side HTTP errors, potentially leading to unnecessary retries on HTTP 4xx responses.
After: Updated the retry mechanism to only retry when the response status code is in the 500 range (HTTP 5xx). This prevents the system from retrying requests that are likely to fail due to client-side issues, thereby optimizing performance and reducing redundant network calls.

osv_test.go:
Verified that the updated retry logic correctly differentiates between HTTP 5xx and HTTP 4xx responses.
Ensured that retries are only attempted for HTTP 5xx errors by running and passing the TestRetryOn5xx test case.
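
For readers skimming the thread, the behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the code merged in this pull request: `retryOn5xx`, `fakeAction`, and `maxAttempts` are hypothetical names, back-off is left out, and retrying on transport errors is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// maxAttempts is a hypothetical limit chosen for this sketch.
const maxAttempts = 4

// retryOn5xx (hypothetical name) calls action until it gets a 200, hits a
// non-retryable status, or runs out of attempts. Back-off is omitted.
func retryOn5xx(action func() (*http.Response, error)) (*http.Response, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		resp, err := action()
		if err != nil {
			// Transport-level failure: assumed retryable in this sketch.
			lastErr = fmt.Errorf("attempt %d: %w", attempt, err)
			continue
		}
		if resp.StatusCode == http.StatusOK {
			return resp, nil
		}
		resp.Body.Close()
		lastErr = fmt.Errorf("attempt %d: unexpected status %d", attempt, resp.StatusCode)
		if resp.StatusCode < 500 {
			// 4xx (and any other non-5xx status) will not improve on retry.
			return nil, lastErr
		}
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

// fakeAction returns an action that replays the given status codes in order
// (repeating the last one) plus a counter of how often it was called.
func fakeAction(codes ...int) (func() (*http.Response, error), *int) {
	calls := 0
	return func() (*http.Response, error) {
		idx := calls
		if idx >= len(codes) {
			idx = len(codes) - 1
		}
		calls++
		return &http.Response{StatusCode: codes[idx], Body: http.NoBody}, nil
	}, &calls
}

func main() {
	for _, codes := range [][]int{{200}, {404}, {503}, {503, 200}} {
		action, calls := fakeAction(codes...)
		_, err := retryOn5xx(action)
		fmt.Printf("codes=%v calls=%d err=%v\n", codes, *calls, err)
	}
}
```

The point of the sketch is the early return for non-5xx statuses: a 404 costs one call, while a persistent 503 uses every attempt.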


google-cla bot commented Nov 9, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Collaborator

@G-Rath G-Rath left a comment

Thanks, I've not reviewed it in full as I'm on mobile, but first we should refactor the tests to not use a third-party assertion library.

"testing"
"time"

"github.com/stretchr/testify/assert"
Collaborator

Let's use the std libs for asserting rather than pull in a new package

Contributor Author

@VishalGawade1 VishalGawade1 Nov 10, 2024

I've switched to standard library assertions for simplicity. Let me know if there’s anything more to address. Thanks!

G-Rath
G-Rath previously requested changes Nov 10, 2024
Collaborator

@G-Rath G-Rath left a comment

looks like a good start - can you also make sure you've run scripts/run_lints.sh and address anything it brings up?

go.mod Outdated
Collaborator

issue: this needs to be reverted

Contributor Author

I ran the scripts/run_lints.sh script, and the only issues flagged were spacing warnings outside of the code I wrote. Let me know if you'd like me to address those too.

Collaborator

@another-rex another-rex Nov 14, 2024

Can you run go mod tidy? That should remove these extra additions.

Collaborator

friendly ping to run go mod tidy again here :)

func makeRetryRequest(action func() (*http.Response, error)) (*http.Response, error) {
	var resp *http.Response
	var err error

	for i := range maxRetryAttempts {
Collaborator

issue: I think you've changed this loop more than you need to

  • for i := range maxRetryAttempts should be equivalent to what you've got currently
  • you've removed the random jitter, which is intentionally present to smooth the back-off

Contributor Author

Yeah, you were right. Sorry for removing Jitter in the function. Added back!

pkg/osv/osv.go Outdated
@@ -158,7 +158,7 @@ func chunkBy[T any](items []T, chunkSize int) [][]T {

 // checkResponseError checks if the response has an error.
 func checkResponseError(resp *http.Response) error {
-	if resp.StatusCode == http.StatusOK {
+	if resp.StatusCode >= 200 && resp.StatusCode < 300 {
Collaborator

suggestion: I think we shouldn't change this line, since it's not needed for addressing #861

even though 2xx are success codes, we're only expecting the API currently to return a 200 and since this change isn't strictly needed for the rest of the change to work, I would lean towards not changing it for now

Contributor

I've been cargo-culting the HTTP retry logic from the examples in https://github.com/sethvargo/go-retry/blob/main/retry_test.go, for what it's worth, which treats 5xx errors as retryable, only. That said, I also recently saw https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429 in the wild, which could be retried with an exponential backoff approach.
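
As a rough sketch of that classification: the helper below treats 5xx and 429 as retryable and everything else as final. `isRetryable` is a hypothetical name, and including 429 reflects the suggestion in this comment rather than anything confirmed about the merged change.

```go
package main

import (
	"fmt"
	"net/http"
)

// isRetryable (hypothetical helper) reports whether a request that received
// this status code is worth sending again after a back-off.
func isRetryable(status int) bool {
	switch {
	case status == http.StatusTooManyRequests:
		// 429: the server asked the client to slow down.
		return true
	case status >= 500 && status <= 599:
		// 5xx: server-side failure that may be transient.
		return true
	default:
		// 2xx/3xx need no retry; other 4xx will fail the same way again.
		return false
	}
}

func main() {
	for _, s := range []int{200, 400, 404, 429, 500, 503} {
		fmt.Printf("%d retryable: %t\n", s, isRetryable(s))
	}
}
```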

Comment on lines 19 to 22
// Override the QueryEndpoint for testing
originalQueryEndpoint := QueryEndpoint
QueryEndpoint = server.URL
defer func() { QueryEndpoint = originalQueryEndpoint }()
Collaborator

issue: there's no reason to do this, because QueryEndpoint isn't used directly in the function you're testing

(that'll then mean you can make this test a parallel one)

"time"
)

func TestRetryOn5xx(t *testing.T) {
Collaborator

issue: we should be testing for a couple of different status codes and orders

I'd recommend refactoring this into a table-based test, and we should have cases for at least:

  • 200
  • 4xx
  • 5xx
  • 5xx, then a 2xx
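
A table covering those four cases might look roughly like the sketch below. It is written as a plain program so it can be run on its own; `getWithRetry`, `retryCase`, `runCase`, and `maxAttempts` are hypothetical stand-ins, not the names used in osv_test.go.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

const maxAttempts = 3 // hypothetical limit for this sketch

// getWithRetry is a stand-in for the function under test: it re-sends the
// request only while the server answers with a 5xx status.
func getWithRetry(url string) (int, error) {
	status := 0
	for i := 0; i < maxAttempts; i++ {
		resp, err := http.Get(url)
		if err != nil {
			return 0, err
		}
		resp.Body.Close()
		status = resp.StatusCode
		if status < 500 {
			break
		}
	}
	return status, nil
}

// retryCase is one row of the table.
type retryCase struct {
	name         string
	statuses     []int // what the fake server returns, in order (last one repeats)
	wantStatus   int
	wantRequests int
}

var retryCases = []retryCase{
	{name: "200", statuses: []int{200}, wantStatus: 200, wantRequests: 1},
	{name: "4xx", statuses: []int{404}, wantStatus: 404, wantRequests: 1},
	{name: "5xx", statuses: []int{500}, wantStatus: 500, wantRequests: maxAttempts},
	{name: "5xx, then a 2xx", statuses: []int{503, 200}, wantStatus: 200, wantRequests: 2},
}

// runCase starts a throwaway server for one row and reports any mismatch.
func runCase(tc retryCase) error {
	var requests atomic.Int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		n := int(requests.Add(1)) - 1
		if n >= len(tc.statuses) {
			n = len(tc.statuses) - 1
		}
		w.WriteHeader(tc.statuses[n])
	}))
	defer srv.Close()

	got, err := getWithRetry(srv.URL)
	if err != nil {
		return fmt.Errorf("%s: %w", tc.name, err)
	}
	if got != tc.wantStatus || int(requests.Load()) != tc.wantRequests {
		return fmt.Errorf("%s: got status %d after %d requests, want %d after %d",
			tc.name, got, requests.Load(), tc.wantStatus, tc.wantRequests)
	}
	return nil
}

func main() {
	for _, tc := range retryCases {
		if err := runCase(tc); err != nil {
			fmt.Println("FAIL", err)
			continue
		}
		fmt.Println("ok  ", tc.name)
	}
}
```

In a real `_test.go` file the loop body would sit inside `t.Run(tc.name, ...)`; because each row owns its server and no package-level endpoint is overridden, the subtests could also call `t.Parallel()`.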


log.Printf("TestRetryOn5xx: resp = %v, err = %v", resp, err)

// Assertion: resp should be nil
Collaborator

nit: these comments feel pretty redundant

@VishalGawade1
Contributor Author

Hi @G-Rath,
While running run_lints.sh, I found a couple of issues, so I updated the makeRetryRequest function and adjusted the osv_test.go file based on your feedback. The script now runs without any issues on my end, aside from a few spacing warnings in other functions. Let me know if you'd like me to fix those too.

@G-Rath
Collaborator

G-Rath commented Nov 10, 2024

aside from a few spacing warnings in other functions

would you mind posting a couple of those here? as there shouldn't be any output, and my local doesn't give any

@VishalGawade1
Contributor Author

I'm not sure if this is what you're doing, but if you're trying to run the script in PowerShell, it likely won't work and will just return blank output. Try using Git Bash instead.
Here is the command from the script:

go install github.com/golangci/golangci-lint/cmd/[email protected]
golangci-lint run ./... --max-same-issues 0

and here are the warnings:
pkg\osv\osv.go:160:6: func checkResponseError is unused (unused)
pkg\osv\osv.go:322:28: G404: Use of weak random number generator (math/rand or math/rand/v2 instead of crypto/rand)
pkg\osv\osv_test.go:53:3: The copy of the 'for' variable "tc" can be deleted (Go 1.22+) (copyloopvar)
pkg\spdx\verify.go:1: File is not gofmt-ed with -s (gofmt)

I'm fairly certain that running go fmt should resolve these warnings. I'll go ahead and fix the warnings related to osv_test.go. Let me know if you'd like me to address the other warnings as well.

@G-Rath
Collaborator

G-Rath commented Nov 11, 2024

@VishalGawade1 only the last one is about spacing, the others are legitimate issues to be addressed in your pull request, along with other changes from my review that you're yet to make such as restoring the for i := range loop.

The pkg\spdx\verify.go warning is interesting - I'm happy if you want to just apply that and push it up so we can take a look at what actually gets changed, even though it won't belong in this PR

If you're trying to run the script in PowerShell

I'm not since the script is a shell script not a powershell script 😅

edit: right I misread - yeah I develop in WSL so it's all going through linux; by "blank output" I mean it will give you this:

osv-scanner on  main [$?] via 🐹 v1.22.7 via  v20.11.0 via 🐍 v3.10.12 took 2s
❯ scripts/run_lints.sh
+ go run github.com/golangci/golangci-lint/cmd/[email protected] run ./... --max-same-issues 0

osv-scanner on  main [$?] via 🐹 v1.22.7 via  v20.11.0 via 🐍 v3.10.12 took 10s
❯

@VishalGawade1
Contributor Author

@VishalGawade1 only the last one is about spacing, the others are legitimate issues to be addressed in your pull request, along with other changes from my review that you're yet to make such as restoring the for i := range loop.

The pkg\spdx\verify.go warning is interesting - I'm happy if you want to just apply that and push it up so we can take a look at what actually gets changed, even though it won't belong in this PR

If you're trying to run the script in PowerShell

I'm not since the script is a shell script not a powershell script 😅

edit: right I misread - yeah I develop in WSL so it's all going through linux; by "blank output" I mean it will give you this:

osv-scanner on  main [$?] via 🐹 v1.22.7 via  v20.11.0 via 🐍 v3.10.12 took 2s
❯ scripts/run_lints.sh
+ go run github.com/golangci/golangci-lint/cmd/[email protected] run ./... --max-same-issues 0

osv-scanner on  main [$?] via 🐹 v1.22.7 via  v20.11.0 via 🐍 v3.10.12 took 10s
❯

Yeah, I assumed just to be safe 😄
Sure, I will take a look at the other warnings and update you.

@VishalGawade1
Contributor Author

@G-Rath
I need your suggestion: what should I do with the checkResponse function? Should I just remove it? Is that a good idea?

@G-Rath
Collaborator

G-Rath commented Nov 11, 2024

I need your suggestion, what should I do with the checkResponse function? Should I just remove it? Is that a good Idea?

Generally speaking I think your current implementation is a bit ... let's say nil-y (I don't want to say "messy" as I wouldn't go that far, but there are a lot of != nil checks in there), and I would recommend you review that to see how you could clean it up.

I believe that's why the checkResponse function existed in the first place, and it's why you've added getStatusCode - ultimately, I think the end solution will probably benefit from one utility function that is likely a combo of both of those functions, and that's why I recommend you review the whole implementation of the loop.

(that is to say, technically "yes it should go because it's not being used anymore", but I might as well frontfoot the introduction of a new similar function while we're at it 🙂)

@VishalGawade1
Contributor Author

I need your suggestion, what should I do with the checkResponse function? Should I just remove it? Is that a good Idea?

Generally speaking I think your current implementation is a bit ... let's say nil-y (I don't want to say "messy" as I wouldn't go that far, but there are a lot of != nil checks in there), and I would recommend you review that to see how you could clean it up.

I believe that's why the checkResponse function existed in the first place, and it's why you've added getStatusCode - ultimately, I think the end solution will probably benefit from one utility function that is likely a combo of both of those functions, and that's why I recommend you review the whole implementation of the loop.

(that is to say, technically "yes it should go because it's not being used anymore", but I might as well frontfoot the introduction of a new similar function while we're at it 🙂)

I totally understand, sorry about that, and I really appreciate your feedback. I'll work on making the code cleaner and more readable. I'll remove getStatusCode and checkResponse and incorporate error handling directly into makeRetryRequest. I’ll also make the test cases a bit more comprehensive and address the warnings. I'll update you soon!

@VishalGawade1
Contributor Author

Hi @G-Rath ,
I’ve removed both checkResponse and getStatusCode functions and consolidated error handling directly within makeRetryRequest. Additionally, there are no warnings when running run_lints. Let me know if this approach works for you. 😄

…ormat, and restored jitter implementation to avoid code duplication in makeRetryRequest function
@VishalGawade1
Contributor Author

VishalGawade1 commented Nov 11, 2024

Please review the changes. Due to the merge conflict, some modifications were added back during the restore in the previous commit.

Collaborator

@another-rex another-rex left a comment

Thanks for the changes! Looking pretty good now, just some minor points.

@VishalGawade1
Contributor Author

Added, please have a look!

@cuixq cuixq self-requested a review November 18, 2024 03:58
@cuixq
Contributor

cuixq commented Nov 20, 2024

@VishalGawade1 could you modify the commit messages following the guide?

@VishalGawade1
Contributor Author

@VishalGawade1 could you modify the commit messages following the guide?

Hi @cuixq,
Do you want me to create a new commit according to contributing guidelines or modify previous commit messages?
Just want to confirm before proceeding

@G-Rath
Collaborator

G-Rath commented Nov 24, 2024

@cuixq since pull requests are squashed when merged, all that really matters is that the resulting commit follows the convention. I believe that by default it comes from the pull request title, but you always have a chance to change it before the merge actually happens (after hitting the "squash/merge" button, you get a form with the commit title and message).

It's probably worth updating the contributing docs to reflect that.

@VishalGawade1 I assume you've given maintainers permission to edit and push to your pull request which should mean @cuixq can edit the title themselves, but either way if you want to save them some work, that's what you can do

(my permissions are not enough for me to be able to actually check or do the edit)

@cuixq
Contributor

cuixq commented Nov 24, 2024

yes I mean the PR title, and I personally prefer the author to modify the title.

@another-rex another-rex changed the title from "Improve Retry Logic to Only Retry on Server-Side HTTP Errors" to "fix: Improve Retry Logic to Only Retry on Server-Side HTTP Errors" Dec 23, 2024
Collaborator

@another-rex another-rex left a comment

Pushed the fixes directly to fix the issues in PR comments

@another-rex another-rex requested a review from G-Rath December 23, 2024 01:33
@another-rex another-rex dismissed G-Rath’s stale review December 23, 2024 01:33

Fixed the issues

@codecov-commenter

codecov-commenter commented Dec 23, 2024

Codecov Report

Attention: Patch coverage is 23.80952% with 16 lines in your changes missing coverage. Please review.

Project coverage is 67.25%. Comparing base (9d28c7f) to head (cbc9a5c).

Files with missing lines Patch % Lines
pkg/osv/osv.go 23.80% 15 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1390      +/-   ##
==========================================
- Coverage   67.32%   67.25%   -0.07%     
==========================================
  Files         192      192              
  Lines       18161    18162       +1     
==========================================
- Hits        12226    12215      -11     
- Misses       5283     5293      +10     
- Partials      652      654       +2     


@another-rex another-rex merged commit 98f4319 into google:main Dec 23, 2024
13 checks passed
another-rex added a commit that referenced this pull request Dec 23, 2024
Implementation of the OSV.dev client, which we'll want to move to
osv.dev repository at some point.

Part of step 3 of the osv-scanner V2 refactor.

Thanks to VishalGawade1 for the makeRetryRequests testing code in #1390
Successfully merging this pull request may close these issues.

Improve retry logic to only retry for appropriate HTTP error codes
6 participants