Skip to content
This repository has been archived by the owner on Jun 28, 2024. It is now read-only.

Add support for scheduled retry with retryAfter #94

Closed
wants to merge 17 commits into from
Closed

Add support for scheduled retry with retryAfter #94

wants to merge 17 commits into from

Conversation

jeremycolin
Copy link

@jeremycolin jeremycolin commented Oct 15, 2019

Fixes #93

Changes

  1. retry operation now does not pass currentAttempt as argument. currentAttempt is now actually returned to retry() so that we can use it outside of the current retry operation (for multiple retries in parallel).

  2. retry operation now passes scheduledRetry as boolean argument. scheduledRetry helps determine that we need to propagate error when it fails.

  3. support for dropAllRequestsAfter which wraps all requests (initial and potential retries) sent and will drop all of them after that time

  4. support for retryAfter which enables retrying a request without having to wait for first one to fail and still have it in flight while we send second one

lib/client.ts Outdated Show resolved Hide resolved
@jeremycolin
Copy link
Author

jeremycolin commented Oct 29, 2019

Discussion with @shuhei

  • performance.now() or process.hrtime() or Date.now()

  • When using the new params, monitor number of open connections.

  • This will create more connections, can we run into some limit? limit on File descriptor?

  • When aborting request, server (depending on how its implemented) will likely receive and log errors of connections closed prematurely.

  • aborting a request vs ignoring: when we simply ignore the request, how is the data handled if not read, does it generate garbage? @NorthFury

  • timerLeft concepts can be later enhanced by passing down this information to the server serving the request. Probably we also do not want to send the request if timerLeft is quite low (threshold to define)

@jeremycolin jeremycolin changed the title Add support for dropAllRequestsAfter Add support for scheduled retry with retryAfter Oct 31, 2019
this.options.dropAllRequestsAfter -
(performance.now() - timerInitial);
if (params.dropRequestAfter) {
params.dropRequestAfter = Math.min(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This params object is reused across retries. Is it safe to mutate it? I prefer to clone it for each retry unless its reference equality is used somewhere.

README.md Show resolved Hide resolved
if (!hasRequestEnded) {
requestObject.abort();
const err = new UserTimeoutError(options, timings);
logEvent(EventSource.HTTP_REQUEST, EventName.ERROR, err.message);
Copy link
Contributor

@shuhei shuhei Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to mark aborted requests as timeouts when aborting because of a successful request?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging is one thing, but we also need to make sure that this doesn't open the circuit breaker when aborting because of another successful request.

Copy link
Author

@jeremycolin jeremycolin Nov 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once one on the request has succeed, retryOperation will be marked as resolved which will bypass any reject "bubble", see:

catch((error: ServiceClientError) => {
  if (retryOperation.isResolved()) {
    return;
   }
   retryErrors.push(error);
   failure();
   ...

So actually clients are not even aware of the result of this promise. There is a vicious side effect though.: this request is actually never resolved or rejected: is it a problem?

Copy link
Contributor

@shuhei shuhei Nov 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, so the circuit breaker command's success() or failure() are not called. Then, it won't make false failure() calls at least!

But isn't the circuit breaker designed with an assumption that one of success() or failure() is called when an operation ends? It seems that the circuit breaker thinks that it is a timeout if success() or failure() are not called. And timeouts contribute to error rate calculation in the circuit breaker, which means it can open the circuit breaker without actual errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we should stop the timeout timer in the circuit breaker when aborting requests because of another successful request. Because calling success() will lower the actual error rate, how about creating another function (cancel or something) in Command?

We may want to mark failures in some cases though. For example, if the first request doesn't respond and the second request succeeds, it makes more sense to mark the first request as a failure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow very good catch! Will evaluate adding this new function

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this approach seems to work. I have to write more tests now to cover all the different cases:

  • both requests work
  • first fails
  • second fails
  • first timeouts
  • second timeouts
  • both fail
  • both timeouts

success();
result.retryErrors = retryErrors;
resolve(result);
if (retryAfter) {
requestsAbortCallbacks.forEach(cb => cb());
requestsAbortCallbacks.length = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is a cleanup for the happy case. Is this array of callbacks cleaned up in the sad case (all requests fail)? I'm not sure if it causes a memory leak only from the code, but I think it's worth checking.

Copy link
Author

@jeremycolin jeremycolin Nov 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good catch! Will check that

@jeremycolin jeremycolin closed this by deleting the head repository Sep 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow retry without aborting previous request + race retries
3 participants