
💄 style: show token generation performance #6959

Draft · wants to merge 2 commits into main from style/output-speed

Conversation

cy948 (Contributor) commented Mar 14, 2025

💻 Change Type

  • 💄 style

🔀 Description of Change

  • src/features/Conversation/Extras/Usage/UsageDetail/index.tsx: add TPS and TTFT display
  • src/features/Conversation/Extras/Usage/index.tsx: pass the extra info down to UsageDetail
  • src/libs/agent-runtime/utils/openaiCompatibleFactory/index.ts: pass the request start timestamp into the stream-handling function
  • src/libs/agent-runtime/utils/streams/openai.ts: route the stream through the performance-measuring middleware
  • src/libs/agent-runtime/utils/streams/protocol.ts: implement the performance-measuring middleware (see the sketch after this list)
  • src/utils/fetch/fetchSSE.ts: receive and handle the performance metrics computed by agentRuntime
  • src/store/chat/slices/aiChat/actions/generateAIChat.ts: merge the performance metrics with the usage info into metadata and return it to the frontend
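
For illustration, here is a minimal sketch of what such a performance-measuring middleware could look like as a Web Streams TransformStream. All names here (createPerfTransformer, PerfMetrics) and the rough 4-characters-per-token estimate are assumptions made for this sketch, not the actual code in src/libs/agent-runtime/utils/streams/protocol.ts:

```ts
// Minimal sketch (not the actual protocol.ts implementation): a TransformStream
// that passes chunks through unchanged while measuring TTFT and TPS.
interface PerfMetrics {
  ttft: number; // time to first token, in ms, measured from `startAt`
  tps: number; // tokens per second over the generation phase
}

export const createPerfTransformer = (
  startAt: number, // timestamp captured when the request was issued
  onFinish: (metrics: PerfMetrics) => void,
): TransformStream<string, string> => {
  let firstTokenAt: number | undefined;
  let tokenCount = 0;

  return new TransformStream<string, string>({
    transform(chunk, controller) {
      if (firstTokenAt === undefined) firstTokenAt = Date.now();
      // Crude ~4-chars-per-token estimate; a real implementation would use
      // the usage data reported by the provider instead.
      tokenCount += Math.ceil(chunk.length / 4);
      controller.enqueue(chunk); // pass the chunk through untouched
    },
    flush() {
      const end = Date.now();
      const first = firstTokenAt ?? end;
      const seconds = Math.max((end - first) / 1000, 1e-3); // avoid divide-by-zero
      onFinish({ ttft: first - startAt, tps: tokenCount / seconds });
    },
  });
};
```

A runtime would then opt in with stream.pipeThrough(createPerfTransformer(startAt, report)), which matches the shape of the openai.ts change described above.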

📝 Additional Information

(screenshot: TPS and TTFT shown in the message usage details panel)


vercel bot commented Mar 14, 2025

@cy948 is attempting to deploy a commit to the LobeChat Desktop Team on Vercel.

A member of the Team first needs to authorize it.

@lobehubbot (Member) commented:

👍 @cy948

Thank you for raising your pull request and contributing to our community.
Please make sure you have followed our contributing guidelines; we will review it as soon as possible.
If you encounter any problems, please feel free to connect with us.

@cy948 force-pushed the style/output-speed branch from f269e48 to 432c724 on March 15, 2025 at 02:00
@cy948 force-pushed the style/output-speed branch from 432c724 to bfd9a10 on March 15, 2025 at 02:39

codecov bot commented Mar 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.20%. Comparing base (30a23ec) to head (ef2f0e0).

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #6959    +/-   ##
========================================
  Coverage   97.20%   97.20%            
========================================
  Files          13       13            
  Lines        2359     2359            
  Branches      215      415   +200     
========================================
  Hits         2293     2293            
  Misses         66       66            
Flag     Coverage Δ
server   97.20% <ø> (ø)

Flags with carried forward coverage won't be shown.


Contributor commented:

I feel this implementation may not be ideal? What I had in mind was that it would be best to capture this around .pipeThrough(createFirstErrorHandleTransformer(bizErrorTypeTransformer, provider)), without touching the openai implementation at all. That way, every runtime that uses the standard protocol would automatically get tps and ttft, instead of each one having to be adapted individually.
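
As a rough illustration of that idea (not code from this PR): if a shared perf transformer sat in the protocol-level pipeline, every OpenAI-compatible runtime would inherit the metrics. The sketch below stubs out createFirstErrorHandleTransformer so it stands alone and reuses the hypothetical createPerfTransformer from the sketch in the PR description; note the follow-up comments below on why measuring TTFT this late in the pipeline is problematic.

```ts
// Hypothetical wiring, not the actual lobe-chat pipeline: an identity
// TransformStream stands in for
// createFirstErrorHandleTransformer(bizErrorTypeTransformer, provider).
declare const rawStream: ReadableStream<string>;
declare const startAt: number;

const errorHandleStub = new TransformStream<string, string>();

const processed = rawStream
  .pipeThrough(errorHandleStub)
  // shared perf middleware: every runtime using this pipeline gets tps/ttft
  .pipeThrough(createPerfTransformer(startAt, (metrics) => console.log(metrics)));
```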

cy948 (Author) commented:

I've found that even the TTFT computed here is not fully accurate: the measured latency includes both the preflight request and the actual request. Case by case:

  1. Preflight + long actual request: when the user goes through an API relay, or the model provider doesn't respond immediately, the measured time is close to accurate, because the actual request accounts for most of it.
  2. Preflight ≈ actual request: the computed latency is then about 2× the actual request. For example, if the OPTIONS preflight takes 300 ms and the first byte of the POST also arrives after 300 ms, the measured TTFT is 600 ms, twice the real value.

🫠 Accurate timing would require excluding the OPTIONS request's latency and counting only the POST request's. What should we do next?

cy948 (Author) commented:

If the computation happens around .pipeThrough(createFirstErrorHandleTransformer(bizErrorTypeTransformer, provider)), it is completely inaccurate, because those transformers are only created after the first chunk has been received.

cy948 (Author) commented:

That said, the current timing does have merit: the TTFT a user actually perceives is the sum of the OPTIONS and POST network latencies plus client-side processing time, so this way of computing TTFT reflects what the user genuinely experiences.

Contributor commented:

I don't think we need to exclude the preflight request; let's go by the end user's perceived experience.

cy948 (Author) commented Mar 21, 2025:

Also, the first-token timestamp can't be generated in src/libs/agent-runtime/utils/streams/openai.ts, because that code is only invoked once a chunk has been received.
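
To make the ordering constraint concrete, here is a simplified sketch (hypothetical names, not the factory's actual signature, reusing createPerfTransformer from the earlier sketch): the timestamp must be captured in openaiCompatibleFactory before the request is issued, because nothing in streams/openai.ts runs until the first chunk arrives.

```ts
// Hypothetical factory-side sketch: capture startAt before the request goes
// out, then thread it into the stream handling.
async function chatWithPerf(
  doRequest: () => Promise<ReadableStream<string>>,
): Promise<ReadableStream<string>> {
  const startAt = Date.now(); // taken here: no chunk exists yet
  const stream = await doRequest();
  // by the time any stream transformer sees its first chunk,
  // the TTFT window has already elapsed
  return stream.pipeThrough(createPerfTransformer(startAt, console.log));
}
```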
