Chunking fails for both xlsx and pdf files #7060

Open
minemine-m opened this issue Mar 21, 2025 · 8 comments
Labels
files 上传文件/知识库 (file upload / knowledge base) · unconfirm 未被维护者确认的问题 (not yet confirmed by maintainers)

Comments

@minemine-m

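📦 Deployment environment

Docker

📦 Deployment mode

Server-side mode (lobe-chat-database image)

📌 Software version

1.73.0

💻 System environment

Other Linux

🌐 Browser

Edge

🐛 Problem description

Chunking fails for both xlsx and pdf files.

📷 Steps to reproduce

No response

🚦 Expected result

No response

📝 Additional information

No response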

@minemine-m minemine-m added the unconfirm 未被维护者确认的问题 label Mar 21, 2025
@lobehubbot
Member

👀 @minemine-m

Thank you for raising an issue. We will look into the matter and get back to you as soon as possible.
Please make sure you have given us as much context as possible.

@lobehubbot
Member

Bot detected the issue body's language is not English, translate it automatically. 👯👭🏻🧑‍🤝‍🧑👫🧑🏿‍🤝‍🧑🏻👩🏾‍🤝‍👨🏿👬🏿


📦 Deployment environment

Docker

📦 Deployment mode

Server-side mode (lobe-chat-database image)

📌 Software version

1.73.0

💻 System environment

Other Linux

🌐 Browser

Edge

🐛 Problem description

Chunking fails for both xlsx and pdf files.

📷 Reproduction steps

No response

🚦 Expected results

No response

📝 Supplementary information

No response

@dosubot dosubot bot added the files 上传文件/知识库 label Mar 21, 2025
@soulgod001

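That's expected: xlsx isn't supported, and I don't remember whether pdf is. The docs say you need to integrate Unstructured, but I still haven't figured out how to do that.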

@lobehubbot
Member

Bot detected the issue body's language is not English, translate it automatically. 👯👭🏻🧑‍🤝‍🧑👫🧑🏿‍🤝‍🧑🏻👩🏾‍🤝‍👨🏿👬🏿


That's expected: xlsx isn't supported, and I don't remember whether pdf is. The docs say you need to integrate Unstructured, but I still haven't figured out how to do that.

@jackywang75

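When it comes to file vectorization, LobeChat seems clearly behind Cherry Studio.
I currently use both tools, and I feel LobeChat still has a lot of room for improvement.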

@lobehubbot
Member

Bot detected the issue body's language is not English, translate it automatically. 👯👭🏻🧑‍🤝‍🧑👫🧑🏿‍🤝‍🧑🏻👩🏾‍🤝‍👨🏿👬🏿


When it comes to file vectorization, LobeChat seems clearly behind Cherry Studio.
I currently use both tools, and I feel LobeChat still has a lot of room for improvement.

@Steve235lab
Contributor

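A PDF has to be OCR-processed before it can be chunked successfully. You could try processing it locally with ocrmypdf first, then uploading it and chunking it again.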

@lobehubbot
Member

Bot detected the issue body's language is not English, translate it automatically. 👯👭🏻🧑‍🤝‍🧑👫🧑🏿‍🤝‍🧑🏻👩🏾‍🤝‍👨🏿👬🏿


A PDF has to be OCR-processed before it can be chunked successfully. You could try processing it locally with ocrmypdf first, then uploading it and chunking it again.
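
For anyone who wants to try the ocrmypdf suggestion above, here is a minimal sketch of the pre-processing step in Python. It only builds and runs the ocrmypdf command line; the helper names (`build_ocr_command`, `add_text_layer`) and the file names are made up for illustration, and the `chi_sim` language assumes the matching Tesseract language data is installed. Whether this is enough for chunking to succeed in LobeChat is the commenter's claim, not something verified here.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src, dst, languages=("eng",), skip_text=True):
    """Build an ocrmypdf command line that adds a text layer to a PDF.

    Hypothetical helper for illustration; only the ocrmypdf flags are real.
    """
    cmd = ["ocrmypdf"]
    if languages:
        # Tesseract language codes are joined with "+", e.g. "chi_sim+eng".
        cmd += ["-l", "+".join(languages)]
    if skip_text:
        # Skip pages that already contain text instead of aborting on them.
        cmd.append("--skip-text")
    cmd += [str(Path(src)), str(Path(dst))]
    return cmd


def add_text_layer(src, dst, **options):
    """Run ocrmypdf if it is on PATH; return True only if it exits cleanly."""
    if shutil.which("ocrmypdf") is None:
        return False  # ocrmypdf is not installed
    result = subprocess.run(build_ocr_command(src, dst, **options))
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder file names: print the command that would be executed.
    print(build_ocr_command("scan.pdf", "scan_ocr.pdf", languages=("chi_sim", "eng")))
```

The resulting `scan_ocr.pdf` would then be uploaded in place of the original scan. Keeping the command construction separate from the `subprocess` call makes it easy to inspect what will run before touching any files.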

5 participants