Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LG-15655: Offload reCAPTCHA annotations to worker job (Part 1 of 2) #11883

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

aduth
Copy link
Contributor

@aduth aduth commented Feb 14, 2025

🎫 Ticket

LG-15655

🛠 Summary of changes

Implements a new worker job to send annotations for Google reCAPTCHA Enterprise.

The intent is to avoid these external API requests slowing user response times in the main application. Since these annotations are already treated as "fire and forget", they're a prime candidate to worker-ize, particularly since we don't rely on any result of the job.

The primary challenge with this is ensuring that annotations for a particular assessment are handled in order in a "last wins" approach. We don't necessarily need all annotations to be sent, as long as the last annotation enqueued is the one to be sent. This is achieved using GoodJob's concurrency helpers with a maximum enqueue_limit of 1 (see docs).

The changes here only include the implementation of the job, to avoid deployment issues. A follow-up pull request will update the RecaptchaAnnotator class to enqueue jobs.

📜 Testing Plan

Verify that build passes.

changelog: Internal, Performance, Offload reCAPTCHA annotations to worker job
@aduth aduth requested review from mitchellhenke and a team February 14, 2025 15:18
@aduth aduth changed the title LG-15655: Offload reCAPTCHA annotations to worker job (Part 1/2) LG-15655: Offload reCAPTCHA annotations to worker job (Part 1 of 2) Feb 14, 2025
@vrajmohan
Copy link
Contributor

It is considered a good practice to keep the job code light - https://github.com/toptal/active-job-style-guide?tab=readme-ov-file#business-logic-in-jobs. Doesn't it make sense to do so and call the RecaptchaAnnotator from the job?

key: -> { "#{self.class.name}-#{queue_name}-#{arguments.last[:assessment_id]}" },
)

def perform(assessment_id:, reason: nil, annotation: nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed this in RecapatchaAnnotation as well - why do we allow this to be nil if we are always supplying a reason?

Copy link
Contributor Author

@aduth aduth Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thought process here was that both are optional arguments in code because both are optional parameters of the annotation API.

I don't have a strong feeling to not make it required based on current usage, though I could imagine that it might cause confusion down the line if someone didn't realize they didn't have to provide a reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you, but I feel we should take a stand here - I can't imagine sending an annotation without a reason. I remember struggling with this when implementing the previous story.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, one of the values in https://cloud.google.com/recaptcha/docs/reference/rest/v1/projects.assessments/annotate#reason is REASON_UNSPECIFIED Unspecified reason. Do not use. (sic)

@aduth
Copy link
Contributor Author

aduth commented Feb 14, 2025

It is considered a good practice to keep the job code light - https://github.com/toptal/active-job-style-guide?tab=readme-ov-file#business-logic-in-jobs. Doesn't it make sense to do so and call the RecaptchaAnnotator from the job?

That's a good thought. I think we'd have to find another way to generate the logged hash value, but that should be reasonable to inline at the analytics call site I think.

@vrajmohan
Copy link
Contributor

I am actually not seeing anything in https://github.com/bensheldon/good_job?tab=readme-ov-file#concurrency-controls about the last one winning. Can you point me to it?

@aduth
Copy link
Contributor Author

aduth commented Feb 14, 2025

I am actually not seeing anything in https://github.com/bensheldon/good_job?tab=readme-ov-file#concurrency-controls about the last one winning. Can you point me to it?

The "last wins" isn't overtly documented, but it's the behavior I'm expecting out of the enqueue_limit option, with tests intending to validate the intended behavior. This might need a little more validation (manual testing, code inspection) to confirm.

@aduth
Copy link
Contributor Author

aduth commented Feb 18, 2025

On closer testing, it actually appears to be the opposite behavior of what we want: Namely, enqueue_limit will always favor the first perform_later call and discard everything else. I'll revert this back to draft and see if I can find a way to reverse that behavior.

I tested by pausing the worker job execution while I accumulated multiple annotations (initiating and passing 2FA), checking the results by:

  • Running yarn build && rails s independently from the worker Foreman process, so that jobs weren't processed, allowing me to inspect the contents of SELECT * FROM good_jobs WHERE job_class = 'RecaptchaAnnotateJob'; via rails db --db worker_jobs, particularly serialized_params.
  • Adding a binding.pry breakpoint within the RecaptchaAnnotateJob#perform and then running bundle exec good_job start in a separate terminal process to inspect which arguments are passed when the jobs processing resumes.

@aduth aduth marked this pull request as draft February 18, 2025 14:03
@aduth
Copy link
Contributor Author

aduth commented Feb 18, 2025

After some closer examination of the documentation and code internals of good_job, I think perform_limit should achieve the behavior we want, specifically preventing a situation where the default behavior could allow two enqueued jobs to be picked up in the correct order but finished in an unexpected order.

Allowed default behavior:

  1. Job A1 is enqueued
  2. Job A2 is enqueued
  3. Job A1 begins processing
  4. Job A2 begins processing
  5. Job A2 completes
  6. Job A1 completes

Behavior with perform_limit: 1:

  1. Job A1 is enqueued
  2. Job A2 is enqueued
  3. Job A1 begins processing
  4. Job A2 doesn’t start processing because perform_limit prevents it
  5. Job A1 completes
  6. Job A2 begins processing
  7. Job A2 completes

I also tested this locally, verifying that it's processed in order and that subsequent enqueued jobs don't start until after the first job is enqueued. I tested this with a bit of hacky debugging code, combined with the approach in the previous comment of "queueing up" jobs before starting bundle exec good_job start:

diff --git a/app/jobs/recaptcha_annotate_job.rb b/app/jobs/recaptcha_annotate_job.rb
index 9ac0cc218b..09b476e905 100644
--- a/app/jobs/recaptcha_annotate_job.rb
+++ b/app/jobs/recaptcha_annotate_job.rb
@@ -13,2 +13,5 @@ class RecaptchaAnnotateJob < ApplicationJob
   def perform(assessment_id:, reason:, annotation: nil)
+    puts 'run - %s - %s' % [reason, assessment_id]
+    sleep 1
+    puts 'after run - %s - %s' % [reason, assessment_id]
     RecaptchaAnnotator.annotate(assessment_id:, reason:, annotation:)

With this and a breakpoint at good_job's internal ConcurrencyExceededError handling, I verified that the error is raised, but then the job is retried after the first is complete.

@aduth
Copy link
Contributor Author

aduth commented Feb 18, 2025

Another scenario I'll want to test, prompted by good_job's caveat about perform_limit:

First-in-first-out job execution order is not preserved when a job is retried with incremental back-off.

Scenario:

  1. Job A1 is enqueued
  2. Job A2 is enqueued
  3. Job A3 is enqueued
  4. Job A1 begins processing
  5. Jobs A2 and A3 don't start processing because perform_limit prevents it
  6. Job A1 completes
  7. ??? Unclear whether ordering is guaranteed between jobs A2 and A3 at this point

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants