Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use redis worker for run heartbeats and alerts #1669

Merged
merged 6 commits into from
Feb 6, 2025
Merged

Conversation

ericallam
Copy link
Member

@ericallam ericallam commented Feb 5, 2025

Summary by CodeRabbit

  • New Features

    • Enhanced background processing with new worker initialization routines and refined alert delivery for improved reliability.
    • Introduced new tasks for error simulation and updated existing tasks for better error handling.
    • Added new configuration parameters for legacy and common worker setups to allow for more granular control.
  • Refactor

    • Streamlined task and alert enqueuing by simplifying method interfaces and clarifying service responsibilities.
    • Improved worker concurrency management and error handling mechanisms.
    • Updated visibility timeout strategy to enhance task management.
  • Chores

    • Updated configuration options for worker concurrency and timing.
    • Incorporated new dependencies and adjusted test setups to boost performance and stability.

Copy link
Contributor

coderabbitai bot commented Feb 5, 2025

Warning

Rate limit exceeded

@ericallam has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 2 minutes and 22 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between b902ab1 and a2ca3ee.

📒 Files selected for processing (4)
  • .github/workflows/pr_checks.yml (0 hunks)
  • apps/webapp/app/env.server.ts (1 hunks)
  • apps/webapp/app/v3/commonWorker.server.ts (1 hunks)
  • apps/webapp/app/v3/legacyRunEngineWorker.server.ts (1 hunks)

Walkthrough

This pull request introduces extensive configuration and service enhancements. New configuration parameters for both legacy run engine workers and common workers are added, including Redis connectivity and concurrency options. Several worker services and alerting systems have been refactored to remove deprecated task handlers and streamline enqueuing flows by eliminating redundant transaction parameters. Additionally, internal Redis worker packages have been upgraded with enhanced type safety, telemetry, and concurrency management. Changes also include modified TypeScript configurations and dependency additions to support the new Redis worker functionality, alongside updates in task triggers for error simulation.

Changes

File(s) Change Summary
apps/webapp/app/env.server.ts Added new configuration parameters for legacy run engine and common workers, including Redis connection details, concurrency settings, and polling intervals.
apps/webapp/app/services/worker.server.ts Removed the performTaskAttemptAlerts task and updated the requeueTaskRun task handler to an empty function.
apps/webapp/app/v3/commonWorker.server.ts
apps/webapp/app/v3/legacyRunEngineWorker.server.ts
Introduced new initializeWorker functions that configure and export singleton instances for common and legacy workers using Redis settings.
apps/webapp/app/v3/marqs/index.server.ts
apps/webapp/app/v3/marqs/v3VisibilityTimeout.server.ts
Updated visibility timeout strategy: Renamed V3VisibilityTimeout to V3GraphileVisibilityTimeout and introduced V3LegacyRunEngineWorkerVisibilityTimeout; corresponding imports and usage were updated.
apps/webapp/app/v3/services/alerts/*
apps/webapp/app/v3/services/createDeployedBackgroundWorker.server.ts
apps/webapp/app/v3/services/deploymentIndexFailed.server.ts
apps/webapp/app/v3/services/failDeployment.server.ts
apps/webapp/app/v3/services/finalizeDeployment.server.ts
apps/webapp/app/v3/services/finalizeTaskRun.server.ts
apps/webapp/app/v3/services/timeoutDeployment.server.ts
Adjusted alert and deployment services to remove Prisma transaction parameters and to invoke alert enqueuing via commonWorker.enqueue with streamlined signatures.
apps/webapp/app/v3/taskRunHeartbeatFailed.server.ts Renamed the RequeueTaskRunService class to TaskRunHeartbeatFailedService.
apps/webapp/package.json
apps/webapp/remix.config.js
apps/webapp/tsconfig.json
Added dependency on @internal/redis-worker, updated bundled server dependencies (added "p-limit", "yocto-queue"), and inserted new TS path mappings for the Redis worker package.
Internal Packages (internal-packages/redis-worker/):
- package.json
- src/index.ts
- src/queue.ts & queue.test.ts
- src/telemetry.ts
- src/worker.ts & worker.test.ts
- tsconfig.json
Enhanced the Redis worker package with new type definitions (e.g., QueueItem with timestamp), updated dequeue signatures, added a telemetry helper (startSpan), refined the Worker class for concurrency and error handling (including a new stop method), and expanded TS lib support.
references/v3-catalog/src/trigger/simple.ts Introduced a new task (simulateErrorTester) and modified existing tasks (simulateError, immediateReturn, createJsonHeroDoc) to adjust delays, retry configurations, and error simulation behavior.

Possibly related PRs

  • More v1 run engine fixes #1644 – The changes in the main PR, which involve adding new configuration parameters to the EnvironmentSchema object, are related to the modifications in the retrieved PR that also adds new properties to the same EnvironmentSchema object in the same file.
  • Add max queue depth limits #1376 – The changes in the main PR, which add new configuration parameters to the EnvironmentSchema object, are related to the retrieved PR as both involve modifications to the apps/webapp/app/env.server.ts file, specifically enhancing the schema with new properties for managing queue sizes.
  • Feat: configurable deploy timeouts #1519 – The changes in the main PR, which add multiple new parameters to the EnvironmentSchema object in env.server.ts, are related to the changes in the retrieved PR that also modifies the EnvironmentSchema by adding a new property for deployment timeout, indicating both PRs are enhancing the configuration options within the same schema.

Suggested reviewers

  • nicktrn
  • matt-aitken

Poem

Oh, what a hop in the code we see,
New configs and workers as sprightly as can be,
Redis queues and timeouts in a clever array,
Alerts and tasks now dance in a neat ballet,
I nibble on bugs with a joyful heart,
Celebrating smooth changes as I hop and dart!
🥕🐇 Happy coding!


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (9)
internal-packages/redis-worker/src/queue.test.ts (2)

33-42: Use toMatchObject for concise and readable assertions

To simplify the test assertions, consider using expect(...).toMatchObject({...}) instead of expect(...).toEqual(expect.objectContaining({...})). This makes the code more concise and improves readability.

Apply this diff to update the assertion:

-          expect(first).toEqual(
-            expect.objectContaining({
-              id: "1",
-              job: "test",
-              item: { value: 1 },
-              visibilityTimeoutMs: 2000,
-              attempt: 0,
-              timestamp: expect.any(Date),
-            })
-          );
+          expect(first).toMatchObject({
+            id: "1",
+            job: "test",
+            item: { value: 1 },
+            visibilityTimeoutMs: 2000,
+            attempt: 0,
+            timestamp: expect.any(Date),
+          });

232-251: Refactor repeated assertions into a helper function

Multiple test cases use similar assertions to verify dequeued items. Consider refactoring these into a helper function or custom matcher to reduce code duplication and enhance maintainability.

For example, you could create a helper function:

function expectDequeuedItem(item: QueueItem<any>, expected: Partial<QueueItem<any>>) {
  expect(item).toMatchObject({
    visibilityTimeoutMs: expected.visibilityTimeoutMs,
    attempt: expected.attempt,
    timestamp: expect.any(Date),
    ...expected,
  });
}

And use it in your tests:

expectDequeuedItem(first, {
  id: "1",
  job: "test",
  item: { value: 1 },
  visibilityTimeoutMs: 2000,
  attempt: 0,
});

Also applies to: 263-271, 318-327

apps/webapp/app/v3/commonWorker.server.ts (1)

22-23: Avoid logging sensitive connection details

Logging the Redis host in debug statements can expose sensitive infrastructure details. Consider removing or obfuscating such information in logs, especially in production environments.

Apply this diff to modify the log statement:

-  logger.debug(`👨‍🏭 Initializing common worker at host ${env.COMMON_WORKER_REDIS_HOST}`);
+  logger.debug(`👨‍🏭 Initializing common worker`);
internal-packages/redis-worker/src/worker.ts (2)

11-11: Add type annotations for pLimit import

To maintain type safety, specify the type for the pLimit import from the p-limit library. This helps prevent any TypeScript compilation issues and improves code clarity.

Apply this diff to add type annotations:

-import pLimit from "p-limit";
+import pLimit, { Limit } from "p-limit";

75-99: Review concurrency limiter configuration for performance

The implementation introduces a global concurrency limiter using p-limit. Ensure that the limit value is appropriately set to balance performance and resource utilization. An excessively low limit might hinder throughput, while a high limit could strain system resources.

apps/webapp/app/v3/legacyRunEngineWorker.server.ts (1)

28-36: Consider adjusting visibility timeout and retry settings.

The current settings might need adjustment:

  • 60 seconds visibility timeout might be too short for long-running tasks
  • 3 retry attempts might not be sufficient for transient failures

Consider:

-        visibilityTimeoutMs: 60_000,
-        retry: {
-          maxAttempts: 3,
-        },
+        visibilityTimeoutMs: 300_000, // 5 minutes
+        retry: {
+          maxAttempts: 5,
+          backoff: {
+            type: "exponential",
+            minMs: 1000,
+            maxMs: 60000,
+          },
+        },
apps/webapp/app/v3/taskRunHeartbeatFailed.server.ts (1)

37-39: Update log messages to use new class name.

The log messages still reference the old class name RequeueTaskRunService. Update them to maintain consistency.

-      logger.error("[RequeueTaskRunService] Task run not found", {
+      logger.error("[TaskRunHeartbeatFailedService] Task run not found", {
         runId,
       });

-            "[RequeueTaskRunService] Failing task run because the heartbeat failed and it's PENDING but locked",
+            "[TaskRunHeartbeatFailedService] Failing task run because the heartbeat failed and it's PENDING but locked",
             { taskRun }

-          logger.debug("[RequeueTaskRunService] Nacking task run", { taskRun });
+          logger.debug("[TaskRunHeartbeatFailedService] Nacking task run", { taskRun });

-        logger.debug("[RequeueTaskRunService] Failing task run", { taskRun });
+        logger.debug("[TaskRunHeartbeatFailedService] Failing task run", { taskRun });

-        logger.debug("[RequeueTaskRunService] Removing task run from queue", { taskRun });
+        logger.debug("[TaskRunHeartbeatFailedService] Removing task run from queue", { taskRun });

-        logger.debug("[RequeueTaskRunService] Requeueing task run", { taskRun });
+        logger.debug("[TaskRunHeartbeatFailedService] Requeueing task run", { taskRun });

-          logger.error("[RequeueTaskRunService] Error signaling run cancellation", {
+          logger.error("[TaskRunHeartbeatFailedService] Error signaling run cancellation", {

Also applies to: 48-50, 65-65, 74-74, 93-93, 104-104, 138-141

references/v3-catalog/src/trigger/simple.ts (1)

109-111: Rename task to reflect delayed return.

The task name immediateReturn is misleading as it now includes a 20-second delay.

-export const immediateReturn = task({
-  id: "immediateReturn",
+export const delayedReturn = task({
+  id: "delayedReturn",
internal-packages/redis-worker/package.json (1)

24-24: Consider documenting the reason for disabling file parallelism.

While disabling file parallelism can help with test stability, it would be helpful to document the specific reason in a comment.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b946b9f and ac71200.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (29)
  • apps/webapp/app/env.server.ts (1 hunks)
  • apps/webapp/app/services/worker.server.ts (1 hunks)
  • apps/webapp/app/v3/commonWorker.server.ts (1 hunks)
  • apps/webapp/app/v3/legacyRunEngineWorker.server.ts (1 hunks)
  • apps/webapp/app/v3/marqs/index.server.ts (2 hunks)
  • apps/webapp/app/v3/marqs/v3VisibilityTimeout.server.ts (1 hunks)
  • apps/webapp/app/v3/services/alerts/deliverAlert.server.ts (2 hunks)
  • apps/webapp/app/v3/services/alerts/performDeploymentAlerts.server.ts (2 hunks)
  • apps/webapp/app/v3/services/alerts/performTaskAttemptAlerts.server.ts (0 hunks)
  • apps/webapp/app/v3/services/alerts/performTaskRunAlerts.server.ts (2 hunks)
  • apps/webapp/app/v3/services/createDeployedBackgroundWorker.server.ts (1 hunks)
  • apps/webapp/app/v3/services/deploymentIndexFailed.server.ts (1 hunks)
  • apps/webapp/app/v3/services/failDeployment.server.ts (1 hunks)
  • apps/webapp/app/v3/services/finalizeDeployment.server.ts (1 hunks)
  • apps/webapp/app/v3/services/finalizeTaskRun.server.ts (1 hunks)
  • apps/webapp/app/v3/services/timeoutDeployment.server.ts (1 hunks)
  • apps/webapp/app/v3/taskRunHeartbeatFailed.server.ts (1 hunks)
  • apps/webapp/package.json (1 hunks)
  • apps/webapp/remix.config.js (1 hunks)
  • apps/webapp/tsconfig.json (1 hunks)
  • internal-packages/redis-worker/package.json (2 hunks)
  • internal-packages/redis-worker/src/index.ts (1 hunks)
  • internal-packages/redis-worker/src/queue.test.ts (7 hunks)
  • internal-packages/redis-worker/src/queue.ts (11 hunks)
  • internal-packages/redis-worker/src/telemetry.ts (1 hunks)
  • internal-packages/redis-worker/src/worker.test.ts (5 hunks)
  • internal-packages/redis-worker/src/worker.ts (4 hunks)
  • internal-packages/redis-worker/tsconfig.json (1 hunks)
  • references/v3-catalog/src/trigger/simple.ts (1 hunks)
💤 Files with no reviewable changes (1)
  • apps/webapp/app/v3/services/alerts/performTaskAttemptAlerts.server.ts
✅ Files skipped from review due to trivial changes (1)
  • internal-packages/redis-worker/src/index.ts
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: units / 🧪 Unit Tests
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (48)
internal-packages/redis-worker/src/queue.ts (9)

16-24: New type definitions enhance type safety and clarity

The addition of AnyMessageCatalog and QueueItem types improves type safety and readability. This allows for better abstraction and more robust type checking throughout the queue implementation.


26-34: Addition of AnyQueueItem type for flexibility

Introducing the AnyQueueItem type provides a non-generic version of QueueItem, which increases flexibility when dealing with queues that may handle various message types without specified catalogs.


55-55: Ensure consistent key hashing in Redis clusters

Using curly braces in the keyPrefix ({queue:${name}:}) ensures that all Redis keys for this queue are hashed to the same slot in a clustered environment. This is crucial for data consistency and efficient key retrieval.


129-129: Enhanced type safety in dequeue method

Updating the return type of the dequeue method to Promise<Array<QueueItem<TMessageCatalog>>> enhances type safety and clarifies the expected structure of dequeued items.


141-149: Include timestamp in dequeued items for better tracking

By extracting the score from Redis and converting it to a Date, the dequeued items now include a timestamp. This enhancement allows for improved tracking of when items were enqueued or became available.


355-355: Modify Lua script to return score for timestamp inclusion

Updating the dequeueItems Lua script to include score in the returned results ensures that the timestamp can be accurately set for each dequeued item. This change aligns the script with the updated TypeScript logic.


396-397: Accurate timestamp retrieval using Redis server time

Using redis.call('TIME') in the moveToDeadLetterQueue Lua script ensures that the current server time is accurately captured, which is important for setting timestamps consistently across different nodes.


425-427: Consistent timestamp calculation in redriveFromDeadLetterQueue

Similar to the previous change, using redis.call('TIME') here ensures that items redriven from the dead letter queue have accurate timestamps based on the server time.


460-461: Update TypeScript definitions to match updated Lua script

Modifying the dequeueItems method's TypeScript definitions to include the score in the returned array aligns the type definitions with the changes in the Lua script. This ensures type consistency and prevents potential runtime errors.

internal-packages/redis-worker/src/queue.test.ts (5)

50-59: Use toMatchObject for concise and readable assertions

As previously mentioned, replacing toEqual(expect.objectContaining({...})) with toMatchObject({...}) simplifies the assertion.


90-99: Use toMatchObject for concise and readable assertions

As previously mentioned, consider updating the assertion to use toMatchObject for better readability.


141-149: Use toMatchObject for concise and readable assertions

As previously mentioned, updating the assertion to toMatchObject enhances code clarity.


176-184: Use toMatchObject for concise and readable assertions

As earlier suggested, consider using toMatchObject to simplify the test assertion.


193-201: Use toMatchObject for concise and readable assertions

Repeating the suggestion to utilize toMatchObject for a more concise assertion.

apps/webapp/app/v3/services/alerts/performTaskRunAlerts.server.ts (2)

7-7: Update import to reflect new worker implementation

The import statement now brings in commonWorker from ~/v3/commonWorker.server, replacing the previous workerQueue. Ensure that this change is consistently applied throughout the codebase to prevent any discrepancies.


64-70: Confirm correct job registration in commonWorker

Ensure that the job "v3.performTaskRunAlerts" is correctly registered in the commonWorker catalog and that it has the appropriate handler defined. This is crucial for the enqueued job to be processed successfully.

Run the following script to verify job registration:

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Check if 'v3.performTaskRunAlerts' is registered in 'commonWorker'

ast-grep --pattern $'commonWorker = new RedisWorker({
  $$$
  catalog: {
    $$$
    "v3.performTaskRunAlerts": $_
    $$$
  },
  $$$
})' apps/webapp/app/v3/commonWorker.server.ts

Length of output: 180


I'll generate a new shell script using rg to search for "v3.performTaskRunAlerts" in the file to gather more definitive information about its registration. This will help verify whether the job is indeed registered in the commonWorker catalog.


🏁 Script executed:

#!/bin/bash
# Searching for the registration of 'v3.performTaskRunAlerts' in commonWorker.server.ts using ripgrep
rg -A 5 -B 5 '"v3\.performTaskRunAlerts"' apps/webapp/app/v3/commonWorker.server.ts

Length of output: 720


Job Registration Verified for v3.performTaskRunAlerts

The script output shows that the job is properly registered in the commonWorker catalog with the expected schema and handler. Specifically, the registration includes:

  • A schema validating that runId is a string.
  • A handler that invokes PerformTaskRunAlertsService with the provided runId.
apps/webapp/app/v3/commonWorker.server.ts (2)

11-91: Ensure secure handling of Redis credentials and TLS settings

Sensitive information such as Redis passwords and TLS configurations should be handled securely. Verify that environment variables like COMMON_WORKER_REDIS_PASSWORD are encrypted or stored securely, and consider enforcing TLS connections by default to enhance security.

[security]


82-88: Validate worker startup configuration

The worker starts only if env.WORKER_ENABLED === "true". Ensure that this environment variable is correctly set in all deployment environments to prevent unintended behavior where the worker is not processing jobs.

internal-packages/redis-worker/src/worker.ts (1)

262-329: Ensure accurate calculation in retry logic

In the processItem method, verify that the retry delay calculation correctly respects both the default and catalog-specific retry settings. Edge cases where calculateNextRetryDelay might return undefined or unexpected values should be handled to prevent infinite retries or immediate movement to the dead-letter queue.

Consider adding unit tests to validate calculateNextRetryDelay behavior under various scenarios.

internal-packages/redis-worker/src/telemetry.ts (1)

3-31: LGTM! Well-structured telemetry implementation.

The startSpan function is well-implemented with:

  • Comprehensive error handling for different error types
  • Proper span cleanup in finally block
  • Type-safe implementation using generics
apps/webapp/remix.config.js (1)

14-15: LGTM! Required dependencies added for Redis worker implementation.

The additions of p-limit and yocto-queue are appropriate for implementing concurrency control in the Redis worker.

apps/webapp/app/v3/marqs/v3VisibilityTimeout.server.ts (2)

5-13: LGTM! Clean implementation of Graphile visibility timeout.

The renamed class maintains the same interface while using the new TaskRunHeartbeatFailedService.


15-28: LGTM! Well-structured implementation for legacy run engine worker.

The new class properly:

  • Formats heartbeat job IDs with a prefix
  • Sets correct availability time for heartbeats
  • Handles cleanup through worker acknowledgment
apps/webapp/app/v3/services/failDeployment.server.ts (1)

52-52: LGTM! Simplified service interface.

Removed unnecessary this._prisma parameter as part of the Redis worker refactor.

apps/webapp/app/v3/services/timeoutDeployment.server.ts (1)

42-42: LGTM!

The simplified call to PerformDeploymentAlertsService.enqueue aligns with the broader refactoring of worker services.

apps/webapp/app/v3/services/deploymentIndexFailed.server.ts (1)

69-69: LGTM!

The simplified call to PerformDeploymentAlertsService.enqueue maintains consistency with other services.

apps/webapp/app/v3/services/alerts/performDeploymentAlerts.server.ts (2)

49-59: LGTM!

The simplified alert creation and delivery process maintains data integrity while reducing complexity.

Also applies to: 61-61


64-70: Verify Redis worker configuration.

The implementation using commonWorker looks good, but ensure Redis connection settings are properly configured in the environment.

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Check for Redis configuration in environment files
rg -A 1 'REDIS_(HOST|PORT|USERNAME|PASSWORD|TLS)'

Length of output: 21950


Action: Redis Worker Configuration is Properly Set Up

The Redis connection settings are correctly defined across the codebase. In particular:

  • The environment configuration in apps/webapp/app/env.server.ts defines all necessary Redis variables (e.g., REDIS_HOST, REDIS_PORT) and maps them to worker-specific settings (e.g., COMMON_WORKER_REDIS_HOST).
  • The configurations in apps/webapp/app/v3/commonWorker.server.ts and related files use these environment variables, including appropriate fallbacks and TLS options.

Please ensure that the production environment provides the proper values, but from the code and configuration standpoint, everything is in order.

apps/webapp/app/v3/legacyRunEngineWorker.server.ts (1)

10-18: Ensure secure Redis connection configuration.

The Redis configuration looks good with TLS support, but consider the following security aspects:

  1. TLS is enabled by default (disabled only when explicitly set)
  2. Connection details are properly sourced from environment variables
✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Check for secure Redis configuration
rg 'REDIS_TLS_DISABLED.*true' 

Length of output: 2025


Secure Redis configuration verified in legacyRunEngineWorker.server.ts.

The TLS setup is handled properly by enabling TLS by default unless the worker-specific environment variable explicitly disables it. This configuration matches our security expectations.

apps/webapp/app/v3/services/finalizeDeployment.server.ts (1)

127-127: LGTM! Simplified alert enqueuing.

The removal of the Prisma transaction parameter aligns with the migration to Redis worker for alerts.

apps/webapp/app/v3/taskRunHeartbeatFailed.server.ts (1)

12-12: LGTM! Class name better reflects its purpose.

The rename from RequeueTaskRunService to TaskRunHeartbeatFailedService improves code clarity.

apps/webapp/app/v3/services/createDeployedBackgroundWorker.server.ts (1)

151-151: LGTM! Consistent with alert processing changes.

The removal of the Prisma transaction parameter aligns with the migration to Redis worker for alerts.

references/v3-catalog/src/trigger/simple.ts (1)

132-134: LGTM! Explicit retry configuration.

Adding explicit retry configuration with maxAttempts: 1 improves error handling clarity.

internal-packages/redis-worker/src/worker.test.ts (2)

37-37: LGTM! Streamlined worker initialization.

The change improves code readability by chaining the .start() call directly to worker initialization.


51-51: Verify if the increased timeout is sufficient.

The timeout has been increased from 600ms to 2000ms. While this should reduce flaky tests, please ensure this value is sufficient across different environments and load conditions.

✅ Verification successful

🏁 Script executed:

#!/bin/bash
# Description: Check for any test retries or timeouts in recent test runs
git log -n 10 --pretty=format:"%H" | while read -r commit; do
  echo "Checking test results for commit: $commit"
  git show $commit:internal-packages/redis-worker/src/worker.test.ts 2>/dev/null | grep -A 5 -B 5 "timeout"
done

Length of output: 1546


The 2‑second delay appears sufficient given the current test configuration.

  • The tests in internal-packages/redis-worker/src/worker.test.ts use a generous per‑test timeout (30,000 ms), so a 2‑second wait is well within the safe range.
  • No evidence was found (or reported manually) of test flakiness or failures due to the new delay value.
apps/webapp/app/v3/services/finalizeTaskRun.server.ts (1)

132-132: LGTM! Simplified alert enqueuing.

The removal of the Prisma instance aligns with the migration to Redis-based workers for alerts processing.

apps/webapp/app/env.server.ts (4)

378-408: LGTM! Consistent Redis configuration pattern.

The Redis configuration for the legacy run engine worker follows the established pattern of falling back to global Redis settings.


410-414: Verify the default concurrency settings for common worker.

The default values are:

  • 2 workers
  • 10 tasks per worker (higher than legacy worker)
  • 1000ms poll interval
  • 50ms immediate poll interval
  • 100 concurrent tasks limit

The higher tasks per worker setting (10 vs 1) suggests different performance characteristics. Please validate these settings under load.

Consider documenting the reasoning behind the different tasks-per-worker settings between legacy and common workers.


416-444: LGTM! Consistent Redis configuration pattern.

The Redis configuration for the common worker follows the same pattern as other workers, ensuring consistency across the codebase.


372-376: Verify the default concurrency settings for legacy run engine worker.

The default values are:

  • 2 workers
  • 1 task per worker
  • 1000ms poll interval
  • 50ms immediate poll interval
  • 100 concurrent tasks limit

Please ensure these values are optimal for your production workload.

apps/webapp/app/services/worker.server.ts (1)

647-647: LGTM! Task handler moved to Redis worker.

The empty handler with the comment clearly indicates that this functionality is now handled by the Redis worker, which aligns with the PR's objective of using Redis workers for run management.

apps/webapp/app/v3/services/alerts/deliverAlert.server.ts (2)

31-31: LGTM: Import of commonWorker.

The import of commonWorker aligns with the broader shift towards using Redis workers for job processing.


1096-1103: LGTM: Simplified enqueue method.

The refactored enqueue method:

  • Removes the transaction parameter, simplifying the interface
  • Uses a consistent ID format with alert:${alertId}
  • Properly handles optional runAt parameter
apps/webapp/app/v3/marqs/index.server.ts (1)

33-33: LGTM: Updated visibility timeout strategy.

The change to use V3LegacyRunEngineWorkerVisibilityTimeout aligns with the broader shift towards Redis workers for job processing and heartbeat management.

Also applies to: 1614-1614

internal-packages/redis-worker/package.json (1)

14-14: LGTM: Added p-limit dependency.

The addition of p-limit is appropriate for managing concurrency in Redis worker job processing.

internal-packages/redis-worker/tsconfig.json (1)

4-4: LGTM: Added DOM.AsyncIterable support.

The addition of DOM.AsyncIterable to the TypeScript compiler options enables proper typing for async iteration in DOM contexts.

apps/webapp/tsconfig.json (1)

36-39: Validate New Path Mappings for Worker Modules.
The new entries for "@internal/redis-worker" and its submodules have been added alongside the existing "@internal/zod-worker" mappings. Please verify that the relative paths ("../../internal-packages/redis-worker/src/index" and "../../internal-packages/redis-worker/src/*") correctly reflect the project’s folder structure and that TypeScript can resolve the modules as expected.

apps/webapp/package.json (1)

53-55: Confirm Consistency of Redis Worker Dependency.
The dependency "@internal/redis-worker": "workspace:*" has been added, which aligns with the updated path mappings in tsconfig.json. Ensure that the workspace version being used is up to date and that its API surface is fully compatible with the consumers in the web application.

internal-packages/redis-worker/src/worker.ts Show resolved Hide resolved
Comment on lines +47 to +51
runHeartbeat: async ({ payload }) => {
const service = new TaskRunHeartbeatFailedService();

await service.call(payload.runId);
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for task failures.

The job handler should include proper error handling and logging.

       runHeartbeat: async ({ payload }) => {
+        try {
           const service = new TaskRunHeartbeatFailedService();
           await service.call(payload.runId);
+        } catch (error) {
+          logger.error("Failed to process run heartbeat", {
+            error,
+            runId: payload.runId,
+          });
+          throw error; // Re-throw to trigger retry
+        }
       },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
runHeartbeat: async ({ payload }) => {
const service = new TaskRunHeartbeatFailedService();
await service.call(payload.runId);
},
runHeartbeat: async ({ payload }) => {
try {
const service = new TaskRunHeartbeatFailedService();
await service.call(payload.runId);
} catch (error) {
logger.error("Failed to process run heartbeat", {
error,
runId: payload.runId,
});
throw error; // Re-throw to trigger retry
}
},

references/v3-catalog/src/trigger/simple.ts Show resolved Hide resolved
Copy link

changeset-bot bot commented Feb 6, 2025

⚠️ No Changeset found

Latest commit: a2ca3ee

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
internal-packages/redis-worker/vitest.config.ts (1)

7-12: LGTM! Sequential test execution is appropriate for Redis operations.

The changes to disable parallel execution and enforce single-threaded testing are well-suited for Redis-based functionality:

  • Prevents potential race conditions in Redis operations
  • Ensures consistent test execution order
  • Helps maintain data consistency during tests

Consider documenting these configuration choices in the README to help other developers understand why parallel execution is disabled for Redis worker tests.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f3cd6cb and b902ab1.

📒 Files selected for processing (1)
  • internal-packages/redis-worker/vitest.config.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: units / 🧪 Unit Tests
  • GitHub Check: Analyze (javascript-typescript)

@ericallam ericallam merged commit a2c2d92 into main Feb 6, 2025
2 checks passed
@ericallam ericallam deleted the ea-branch-13 branch February 6, 2025 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant