WIP: feat: send data by chunk in websocket #3988

Open
wants to merge 4 commits into
base: main

Conversation

bytemain
Member

@bytemain bytemain commented Sep 3, 2024

Types

  • 🎉 New Features

Background or solution

Sending a file of several tens of megabytes in one go stalls the thread, so large files are now sent in chunks.

Changelog

Summary by CodeRabbit

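  • New features

    • Introduces a LengthFieldBasedFrameDecoder instance to improve WebSocket message handling.
    • Adds a chunkSize constant, set to 8MB, for data processing and transmission.
  • Enhancements

    • Optimizes the send and receive logic of WebSocket connections and supports sending large messages in chunks.
    • Simplifies the WebSocket connection handling flow and improves code readability.
  • Fixes

    • Improves resource management so that resources are released correctly after data has been processed.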

Contributor

coderabbitai bot commented Sep 3, 2024


Commits

Files that changed from the base of the PR and between d3c2cfb and 9d104a8.

Walkthrough

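These changes span several files and center on the LengthFieldBasedFrameDecoder class: data listener management is changed and the event handling mechanism is simplified. A new chunkSize constant is added for splitting data into chunks during transmission. The message handling logic of several classes (such as ReconnectingWebSocketConnection and WSWebSocketConnection) is also refactored to improve the reliability and efficiency of data transmission.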
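As a rough illustration of the idea described in the walkthrough (frame a message, send it in chunks, reassemble it on the other side), here is a minimal sketch. It is not the PR's implementation: the 4-byte big-endian length header and the names `frame`, `split` and `SimpleFrameReader` are assumptions made for this example, and the actual wire format of LengthFieldBasedFrameDecoder is not shown in this conversation.

```typescript
// Assumed framing for this sketch only: a 4-byte big-endian length header
// followed by the payload. The real decoder's format may differ.
const HEADER_BYTES = 4;

// Prefix the payload with its length.
function frame(payload: Uint8Array): Uint8Array {
  const packet = new Uint8Array(HEADER_BYTES + payload.byteLength);
  new DataView(packet.buffer).setUint32(0, payload.byteLength, false);
  packet.set(payload, HEADER_BYTES);
  return packet;
}

// Split a packet into views of at most `size` bytes. subarray() creates
// views on the same buffer, so no bytes are copied here.
function split(packet: Uint8Array, size: number): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  for (let i = 0; i < packet.byteLength; i += size) {
    chunks.push(packet.subarray(i, i + size));
  }
  return chunks;
}

// Buffers incoming chunks and calls the single listener once per complete frame.
class SimpleFrameReader {
  private pending = new Uint8Array(0);
  private readonly onFrame: (payload: Uint8Array) => void;

  constructor(onFrame: (payload: Uint8Array) => void) {
    this.onFrame = onFrame;
  }

  push(chunk: Uint8Array): void {
    // Append the new chunk to whatever is still unconsumed.
    const merged = new Uint8Array(this.pending.byteLength + chunk.byteLength);
    merged.set(this.pending, 0);
    merged.set(chunk, this.pending.byteLength);
    this.pending = merged;

    // Emit every frame that is now fully available.
    while (this.pending.byteLength >= HEADER_BYTES) {
      const view = new DataView(this.pending.buffer, this.pending.byteOffset, this.pending.byteLength);
      const end = HEADER_BYTES + view.getUint32(0, false);
      if (this.pending.byteLength < end) {
        break; // wait for more chunks
      }
      this.onFrame(this.pending.slice(HEADER_BYTES, end));
      this.pending = this.pending.slice(end);
    }
  }
}
```

Because the receiver only relies on the length header, the chunk boundaries chosen by the sender do not need to line up with message boundaries.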

Changes

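File path Change summary
packages/connection/__test__/common/frame-decoder.test.ts Changes how packets are constructed and used, adds calls to .dump(), and updates the test cases to reflect the new logic.
packages/connection/src/common/connection/drivers/frame-decoder.ts Simplifies data listener management from multiple listeners to a single listener, and updates the onData and dispose implementations.
packages/connection/src/common/connection/drivers/reconnecting-websocket.ts Introduces a LengthFieldBasedFrameDecoder instance and refactors the send and onMessage methods to improve message handling.
packages/connection/src/common/connection/drivers/stream.ts Updates the send method to use dumpAndOwn() for the data, improving resource management.
packages/connection/src/common/connection/drivers/ws-websocket.ts Enhances WebSocket message handling and refactors the send and onMessage methods to use the decoder.
packages/connection/src/common/constants.ts Adds the chunkSize constant, set to 8MB, for chunking during data processing.
packages/connection/src/node/common-channel-handler.ts Simplifies WebSocket connection handling by passing a newly instantiated WSWebSocketConnection object directly.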



railway-app bot commented Sep 3, 2024

🚅 Previously deployed to Railway in the core project. Environment has been deleted.

@opensumi opensumi bot added the 🎨 feature feature required label Sep 3, 2024
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Outside diff range, codebase verification and nitpick comments (7)
packages/connection/src/common/constants.ts (1)

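3-6: The code change looks good!

The new chunkSize constant matches the goal of this PR, which is chunked data transfer over WebSocket connections. 8MB is a reasonable size.

Consider adding an English translation to the comment to make international collaboration easier. For example: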

 /**
  * 分片大小, 8MB
+ * Chunk size, 8MB
  */
 export const chunkSize = 8 * 1024 * 1024;
packages/connection/src/common/connection/drivers/ws-websocket.ts (1)

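11-11: The new decoder member looks good, but consider adding a comment.

The new decoder member handles incoming messages, which is a good improvement.

Consider adding a short comment to the decoder member that explains its purpose and importance. For example:

// Decodes and processes incoming WebSocket messages
protected decoder = new LengthFieldBasedFrameDecoder();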
packages/connection/src/common/connection/drivers/reconnecting-websocket.ts (2)

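20-30: Updated constructor and send method

The event listener added in the constructor and the rewritten send method implement chunked data transfer well. These changes match the goal of the PR and should improve performance when transferring large files.

However, I suggest adding a comment to the send method that explains why chunked transfer is used and what the value of chunkSize is. This helps other developers understand the purpose of the implementation.

Consider adding the following comment at the top of the send method:

/**
 * Sends data in chunks to avoid blocking the thread when transferring large files.
 * Each chunk is at most chunkSize bytes (8MB).
 */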

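87-103: New dataHandler method and updated dispose method

The new dataHandler method handles the different types of incoming data well, including Blob, ArrayBuffer and Buffer, which makes the code more robust. The updated dispose method ensures that resources are cleaned up correctly.

However, I suggest adding error handling to dataHandler in case an exception occurs while the data is being processed.

Consider adding error handling to the dataHandler method:

private dataHandler = (e: MessageEvent) => {
  // ... existing code ...
  buffer.then((v) => this.decoder.push(new Uint8Array(v, 0, v.byteLength)))
    .catch((error) => {
      console.error('Error while handling incoming message:', error);
      // Additional error handling logic can be added here
    });
};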
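To make the suggestion above more concrete, here is a hedged, self-contained sketch of normalising the three payload types to a Uint8Array and routing failures to an error callback. The names `BlobLike`, `toUint8Array` and `handleMessage` are hypothetical and do not come from the PR; `BlobLike` is a structural stand-in for Blob so the example does not depend on browser typings.

```typescript
// Hypothetical structural stand-in for Blob: anything with arrayBuffer().
type BlobLike = { arrayBuffer(): Promise<ArrayBuffer> };

// Normalise an incoming payload to a Uint8Array.
async function toUint8Array(data: BlobLike | ArrayBuffer | Uint8Array): Promise<Uint8Array> {
  if (data instanceof Uint8Array) {
    return data; // Node's Buffer is a Uint8Array subclass, so it ends up here
  }
  if (data instanceof ArrayBuffer) {
    return new Uint8Array(data);
  }
  return new Uint8Array(await data.arrayBuffer());
}

// Convert the payload, hand it to `push` (for example a decoder's push method),
// and report any failure from either step through `onError`.
function handleMessage(
  data: BlobLike | ArrayBuffer | Uint8Array,
  push: (chunk: Uint8Array) => void,
  onError: (error: unknown) => void = (error) => {
    console.error('Failed to handle incoming message:', error);
  },
): Promise<void> {
  return toUint8Array(data).then(push).catch(onError);
}
```

Placing `.catch` after `.then(push)` means an exception thrown by the consumer is reported as well, not only a failed conversion.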
packages/connection/src/common/connection/drivers/frame-decoder.ts (2)

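67-69: Updated data dispatch logic

This change is consistent with the new listener management approach: it calls _onDataListener directly instead of emitting an event.

Consider adding a null check to make the code more robust:

if (this._onDataListener && typeof this._onDataListener === 'function') {
  this._onDataListener(binary);
}

This avoids a possible runtime error when _onDataListener is not a function.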


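166-166: Updated dispose method

This change is consistent with the new listener management approach, but it uses undefined rather than null, which is inconsistent with the onData method.

For consistency, consider changing this line to:

this._onDataListener = null;

This ensures the whole class represents a removed listener in the same way.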

packages/connection/__test__/common/frame-decoder.test.ts (1)

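Line range hint 38-121: Summary: systematic update to packet construction

Throughout the file, every place that constructs a packet now consistently adds a call to .dump(). This systematic change probably reflects a change in the implementation of the LengthFieldBasedFrameDecoder class.

Suggestions:

  1. Make sure these changes stay in sync with the latest implementation of the LengthFieldBasedFrameDecoder class.
  2. Consider adding a comment at the top of the test file that explains why .dump() is needed and how it affects the test results.
  3. If .dump() changes the structure or content of the packet, other parts that depend on these tests may need to be updated.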

Comment on lines +21 to +27
const handle = LengthFieldBasedFrameDecoder.construct(data).dumpAndOwn();
const packet = handle.get();
for (let i = 0; i < packet.byteLength; i += chunkSize) {
  this.socket.send(packet.subarray(i, i + chunkSize));
}

handle.dispose();

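The improvement to the send method is good, but it could be optimized further.

The new send method uses LengthFieldBasedFrameDecoder to construct the packet and sends it in chunks. This is a good improvement that handles large messages better.

To optimize performance further, consider the following:

  1. Use ArrayBuffer or SharedArrayBuffer to reduce memory copying.
  2. Consider using WebSocket.bufferedAmount to control the send rate and avoid overflowing the buffer.

Example implementation: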

send(data: Uint8Array): void {
  const handle = LengthFieldBasedFrameDecoder.construct(data).dumpAndOwn();
  const packet = handle.get();
  
  const sendChunk = (start: number) => {
    while (start < packet.byteLength && this.socket.bufferedAmount < 1024 * 1024) { // 1MB buffer threshold
      const end = Math.min(start + chunkSize, packet.byteLength);
      this.socket.send(packet.subarray(start, end));
      start = end;
    }
    if (start < packet.byteLength) {
      setTimeout(() => sendChunk(start), 0);
    } else {
      handle.dispose();
    }
  };

  sendChunk(0);
}

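This implementation controls the send rate by calling itself again through setTimeout, so that the data is not pushed into the buffer all at once.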
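The same pacing idea can be written so that it is testable without a real socket. The sketch below is an illustration only, under stated assumptions: `SocketLike`, `sendPaced`, `highWaterMark` and the injectable `schedule` parameter are hypothetical names introduced here, and the only WebSocket members relied on are `send()` and `bufferedAmount`.

```typescript
// Minimal structural type covering the WebSocket members this sketch uses.
interface SocketLike {
  readonly bufferedAmount: number;
  send(data: Uint8Array): void;
}

// Send `packet` in chunks of at most `chunkSize` bytes, pausing whenever the
// socket already has `highWaterMark` or more bytes queued.
function sendPaced(
  socket: SocketLike,
  packet: Uint8Array,
  chunkSize: number,
  highWaterMark: number,
  onDone: () => void,
  schedule: (task: () => void) => void = (task) => {
    setTimeout(task, 0);
  },
): void {
  let offset = 0;
  const pump = (): void => {
    while (offset < packet.byteLength && socket.bufferedAmount < highWaterMark) {
      // subarray() clamps the end index, so the last chunk may be shorter.
      socket.send(packet.subarray(offset, offset + chunkSize));
      offset = Math.min(offset + chunkSize, packet.byteLength);
    }
    if (offset < packet.byteLength) {
      schedule(pump); // queue is full: yield and try again later
    } else {
      onDone(); // every chunk has been handed to the socket
    }
  };
  pump();
}
```

In the review's example, `onDone` is where `handle.dispose()` would be called. Whether the packet can be released at that point depends on whether the socket implementation copies the data inside `send()`, which this sketch does not verify.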

Comment on lines +17 to +25
private _onDataListener: MaybeNull<(data: Uint8Array) => void>;
onData(listener: (data: Uint8Array) => void) {
  this._onDataListener = listener;
  return {
    dispose: () => {
      this._onDataListener = null;
    },
  };
}

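Significant change to listener management

This change replaces the previous Emitter-based multi-listener approach with a single-listener approach. It simplifies the event handling mechanism, but it also means the class can only serve one listener at a time.

Points to consider:

  1. This change may affect existing code that relies on multiple listeners.
  2. The single-listener model may be limiting in some usage scenarios.

Suggested improvements:

  1. If multiple listeners really are needed, consider storing the listener functions in an array.
  2. Add a clear documentation comment stating that the class now supports only a single listener, to prevent misuse.
  3. Consider adding a removeListener method to make the API more complete and intuitive.
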
private _onDataListeners: Array<(data: Uint8Array) => void> = [];

onData(listener: (data: Uint8Array) => void) {
  this._onDataListeners.push(listener);
  return {
    dispose: () => {
      const index = this._onDataListeners.indexOf(listener);
      if (index > -1) {
        this._onDataListeners.splice(index, 1);
      }
    },
  };
}

removeListener(listener: (data: Uint8Array) => void) {
  const index = this._onDataListeners.indexOf(listener);
  if (index > -1) {
    this._onDataListeners.splice(index, 1);
  }
}

This implementation stays simple while offering more flexibility.

@bytemain bytemain changed the title feat: send data by chunk in websocket WIP: feat: send data by chunk in websocket Sep 3, 2024
@bytemain bytemain closed this Sep 13, 2024
@Aaaaash Aaaaash reopened this Dec 3, 2024
Labels
🎨 feature feature required
2 participants