💄 style: show token generate performance #6959
base: main
Conversation
@cy948 is attempting to deploy a commit to the LobeChat Desktop Team on Vercel. A member of the Team first needs to authorize it.
👍 @cy948 Thank you for raising your pull request and contributing to our Community!
Force-pushed from f269e48 to 432c724 (compare)
Force-pushed from 432c724 to bfd9a10 (compare)
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage diff (main vs. #6959):

|          | main   | #6959  | +/-  |
|----------|--------|--------|------|
| Coverage | 97.20% | 97.20% |      |
| Files    | 13     | 13     |      |
| Lines    | 2359   | 2359   |      |
| Branches | 215    | 415    | +200 |
| Hits     | 2293   | 2293   |      |
| Misses   | 66     | 66     |      |

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
This implementation doesn't feel ideal to me. My original thought was that it would be best to absorb this in the `.pipeThrough(createFirstErrorHandleTransformer(bizErrorTypeTransformer, provider))` area, without touching the OpenAI implementation at all. That way every runtime that uses the standard protocol would automatically get tps and ttft, instead of us adapting each one individually.
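As a minimal sketch of that suggestion (all names here are hypothetical, not LobeChat's actual API): a protocol-level `TransformStream` that counts chunks as they pass through and emits a final performance record (TTFT in milliseconds, TPS in chunks per second) when the upstream closes. Any runtime piping through such a shared protocol stage would pick the metrics up for free.

```typescript
// Illustrative only: a pass-through transformer that stamps TTFT/TPS.
// `startAt` is the timestamp captured when the request was issued.
const createPerfTransformer = (startAt: number) => {
  let firstChunkAt = 0;
  let chunkCount = 0;
  return new TransformStream({
    transform(chunk, controller) {
      if (firstChunkAt === 0) firstChunkAt = Date.now(); // first token arrives
      chunkCount += 1;
      controller.enqueue(chunk); // pass the chunk through unchanged
    },
    flush(controller) {
      if (firstChunkAt === 0) return; // empty stream: nothing to report
      const ttft = firstChunkAt - startAt; // ms from start to first chunk
      const genSeconds = (Date.now() - firstChunkAt) / 1000;
      const tps = genSeconds > 0 ? chunkCount / genSeconds : chunkCount;
      controller.enqueue(JSON.stringify({ tps, ttft }));
    },
  });
};
```

Note this sketch assumes `startAt` is supplied from outside the stream pipeline; as pointed out later in the thread, the transformers themselves are created too late to capture it.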
I found that even the ttft computed here is not fully accurate: the measured latency covers the preflight request plus the actual request. Case by case:
- Short preflight + long actual request: when the user goes through an API relay, or the model provider does not respond immediately, the measured time is close to accurate, because the actual request accounts for most of it.
- Preflight ≈ actual request: the measured latency is then about twice the actual request.

🫠 For accurate timing we would need to exclude the latency of the OPTIONS request and count only the POST request. So what should we do next?
If the computation is done in the `.pipeThrough(createFirstErrorHandleTransformer(bizErrorTypeTransformer, provider))` area, it would be completely inaccurate, because those transformers are only created after the first chunk has been received.
That said, the current timing is also defensible: the ttft a user actually experiences is the sum of the OPTIONS and POST network latencies plus the client-side processing time, so this way of computing ttft reflects what the user really perceives.
I don't think we need to exclude the preflight request; let's measure against the end user's perceived experience.
Also, the first-token timestamp cannot be generated inside `src/libs/agent-runtime/utils/streams/openai.ts`, because that code only runs once a chunk has been received.
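A hedged sketch of the approach discussed here (a hypothetical helper, not the PR's actual code): the caller captures the timestamp before the request is issued and diffs against the arrival of the first chunk, so the resulting TTFT covers preflight, POST, and client-side processing together.

```typescript
// Hypothetical helper: `send` stands in for the runtime's chat call that
// returns a stream of tokens. Timing starts *before* any network traffic,
// so the preflight (OPTIONS) request is included, matching user perception.
async function consumeWithTTFT(send: () => Promise<AsyncIterable<string>>) {
  const startAt = Date.now(); // captured before the request goes out
  const stream = await send();
  let ttft = -1;
  const chunks: string[] = [];
  for await (const chunk of stream) {
    if (ttft < 0) ttft = Date.now() - startAt; // first token observed
    chunks.push(chunk);
  }
  return { chunks, ttft };
}
```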
💻 Change Type
🔀 Description of Change
- `src/features/Conversation/Extras/Usage/UsageDetail/index.tsx`: add the tps and ttft display
- `src/features/Conversation/Extras/Usage/index.tsx`: pass the extra info down to UsageDetail
- `src/libs/agent-runtime/utils/openaiCompatibleFactory/index.ts`: pass the request-start timestamp into the stream-processing function
- `src/libs/agent-runtime/utils/streams/openai.ts`: run the stream through the performance-measuring middleware
- `src/libs/agent-runtime/utils/streams/protocol.ts`: implement the performance-measuring middleware
- `src/utils/fetch/fetchSSE.ts`: receive and process the agent runtime's performance metrics
- `src/store/chat/slices/aiChat/actions/generateAIChat.ts`: merge the performance metrics and usage info into metadata returned to the frontend

📝 Additional Information