
Commit 8288290

feat: Text check before synchronization (labring#689)

* fix: icon
* fix: web selector
* fix: web selector
* perf: link sync
* dev doc
* chmod doc
* perf: git intro
* 466 intro
* intro img
* add json editor (#5)
* team limit
* websync limit
* json editor
* text editor
* perf: search test
* change cq value type
* doc
* intro img

---------

Co-authored-by: heheer <[email protected]>
1 parent c2abbb5 commit 8288290


64 files changed: +1789 / -1489 lines


.github/imgs/intro1.png

-383 KB

.github/imgs/intro2.png

-191 KB

.github/imgs/intro3.png

-323 KB

.github/imgs/intro4.png

-87.9 KB

docSite/content/docs/development/intro.md

+6
````diff
@@ -71,6 +71,8 @@ git clone git@github.com:<github_username>/FastGPT.git
 ### 5. Run
 
 ```bash
+# Give the scripts execute permission
+chmod -R +x ./scripts/
 # Run from the repository root; installs all dependencies of the root package, projects and packages
 pnpm i
 # Switch to the app directory
````
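```diff
@@ -105,6 +107,10 @@ docker build -t dockername/fastgpt:tag --build-arg name=app --build-arg proxy=ta
 1. If you are connecting to a remote database, first check that the corresponding port is open.
 2. If the database runs locally, try changing `host` to `localhost` or `127.0.0.1`.
 
+### sh ./scripts/postinstall.sh: permission denied
+
+After `pnpm i`, FastGPT runs the `postinstall` script, which automatically generates the `ChakraUI` types. If the script lacks permission, first run `chmod -R +x ./scripts/`, then run `pnpm i` again.
+
 ### Join the community
 
 Running into trouble or have questions? Join the WeChat group to stay in touch with developers and users.
```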

docSite/content/docs/development/qa.md

+16
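```diff
@@ -31,6 +31,10 @@ OneAPI does not have a channel configured for this model.
 
 The page uses stream=true mode, so the API must also be tested with stream=true. Some model APIs (mostly from Chinese providers) have rather poor compatibility in non-stream mode.
 
+### Incorrect API key provided: sk-xxxx.You can find your api Key at xxx
+
+The API key configured for OneAPI is wrong. Update the `OPENAI_API_KEY` environment variable and restart the container (first stop it, then rm it, and finally run up -d again). You can `exec` into the container and run `env` to check whether the variable has taken effect.
+
 ## Docker deployment FAQ
 
 ### How do I update?
```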
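```diff
@@ -87,3 +91,15 @@ The PG database failed to connect or initialize; check the logs. FastGPT will
 Mongo connection failed; check:
 1. Whether the mongo service is running (some CPUs do not support AVX and cannot run mongo5; switch to mongo4.x by picking the latest 4.x on Docker Hub, changing the image version, and running again)
 2. The environment variables (username and password; pay attention to host and port)
+
+## Local development issues
+
+### TypeError: Cannot read properties of null (reading 'useMemo' )
+
+Try Node 18; the latest Node version may have problems. Local development workflow:
+
+1. In the root directory: `pnpm i`
+2. Copy `config.json` -> `config.local.json`
+3. Copy `.env.template` -> `.env.local`
+4. `cd projects/app`
+5. `pnpm dev`
```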

docSite/content/docs/development/upgrading/466.md

+8-5
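```diff
@@ -13,8 +13,11 @@ weight: 830
 
 ## V4.6.6 release notes
 
-1. New - Search modes: vector semantic retrieval, full-text retrieval and re-ranking are separated, and results are ordered and merged with RRF.
-2. Improved - Question-classification prompt, with id guidance. Tested with Chinese commercial API models (Baidu, Alibaba, Zhipu, iFlytek): all can classify in Prompt mode.
-3. UI improvements; the new UI design will gradually replace the old one.
-4. Code improvement: icons extracted into separate files and loaded automatically.
-5. See [FastGPT 2024 RoadMap](https://github.com/labring/FastGPT?tab=readme-ov-file#-%E5%9C%A8%E7%BA%BF%E4%BD%BF%E7%94%A8)
+1. See [FastGPT 2024 RoadMap](https://github.com/labring/FastGPT?tab=readme-ov-file#-%E5%9C%A8%E7%BA%BF%E4%BD%BF%E7%94%A8)
+2. New - The Http module's request headers support a JSON editor.
+3. New - [ReRank model deployment](/docs/development/custom-models/reranker/)
+4. New - Search modes: vector semantic retrieval, full-text retrieval and re-ranking are separated, and results are ordered and merged with RRF.
+5. Improved - Question-classification prompt, with id guidance. Tested with Chinese commercial API models (Baidu, Alibaba, Zhipu, iFlytek): all can classify in Prompt mode.
+6. UI improvements; the new UI design will gradually replace the old one.
+7. Code improvement: icons extracted into separate files and loaded automatically.
+8. Fix - Datasets read from links did not save the selector, so syncing did not use the selector.
```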
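The V4.6.6 notes describe a search mode that keeps vector semantic retrieval, full-text retrieval and re-ranking separate and merges their result lists with RRF (Reciprocal Rank Fusion). The sketch below shows textbook RRF under a few assumptions: each input list is already sorted best-first, items are identified by a hypothetical `id` field, and the constant `k = 60` (the value commonly used for RRF) is assumed, since this commit does not show FastGPT's own merge code.

```typescript
// Hypothetical result shape: only the id matters for rank fusion.
interface SearchHit {
  id: string;
}

// Reciprocal Rank Fusion: every list adds 1 / (k + rank) to each item it
// contains (rank is 1-based); items are then sorted by the summed score.
function rrfMerge(lists: SearchHit[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const rank = index + 1;
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries(), ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score
  );
}

// Usage: fuse an embedding result list with a full-text result list.
const embeddingHits: SearchHit[] = [{ id: 'a' }, { id: 'b' }, { id: 'c' }];
const fullTextHits: SearchHit[] = [{ id: 'c' }, { id: 'a' }, { id: 'd' }];
console.log(rrfMerge([embeddingHits, fullTextHits]).map((hit) => hit.id));
// [ 'a', 'c', 'b', 'd' ]
```

Because RRF only looks at ranks, it can combine lists whose raw scores are on different scales, such as embedding similarity and full-text relevance.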

packages/global/common/file/api.d.ts

+1
```diff
@@ -12,4 +12,5 @@ export type UrlFetchParams = {
 export type UrlFetchResponse = {
   url: string;
   content: string;
+  selector?: string;
 }[];
```

packages/global/common/string/tiktoken/index.ts

+1-1
```diff
@@ -35,7 +35,7 @@ export function countPromptTokens(
   const text = `${role}\n${prompt}`;
   try {
     const encodeText = enc.encode(text);
-    return encodeText.length + 3; // add an estimate for the role
+    return encodeText.length + role.length; // add an estimate for the role
   } catch (error) {
     return text.length;
   }
```
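The `countPromptTokens` change replaces the fixed role overhead of 3 tokens with `role.length`. The sketch below reproduces only that arithmetic with a pluggable encoder; `estimatePromptTokens`, the `Encode` type and the word-splitting toy encoder are hypothetical stand-ins for the tiktoken encoder `enc` used in the real file, so the counts are illustrative only.

```typescript
// Stand-in for the real tokenizer (`enc` in the diff); any function that
// returns an array of token ids can be plugged in.
type Encode = (text: string) => number[];

// Same logic as the changed line: tokens of "role\nprompt" plus an overhead
// estimate equal to role.length (previously a fixed 3).
function estimatePromptTokens(encode: Encode, prompt = '', role = ''): number {
  const text = `${role}\n${prompt}`;
  try {
    return encode(text).length + role.length;
  } catch (error) {
    // Fallback from the diff: use the character count if encoding fails.
    return text.length;
  }
}

// Toy encoder for illustration only: one "token" per whitespace-separated word.
const wordEncode: Encode = (text) =>
  text.split(/\s+/).filter((word) => word.length > 0).map((_, i) => i);

// "user\nhello world" -> 3 toy tokens + 4 (role.length) = 7
console.log(estimatePromptTokens(wordEncode, 'hello world', 'user'));
```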

packages/global/common/system/types/index.d.ts

+2-1
```diff
@@ -44,7 +44,8 @@ export type FastGPTFeConfigsType = {
     google?: string;
   };
   limit?: {
-    exportLimitMinutes?: number;
+    exportDatasetLimitMinutes?: number;
+    websiteSyncLimitMinuted?: number;
   };
   scripts?: { [key: string]: string }[];
   favicon?: string;
```

packages/global/core/dataset/constant.ts

+25-4
```diff
@@ -73,6 +73,19 @@ export const DatasetCollectionTrainingTypeMap = {
   }
 };
 
+export enum DatasetCollectionSyncResultEnum {
+  sameRaw = 'sameRaw',
+  success = 'success'
+}
+export const DatasetCollectionSyncResultMap = {
+  [DatasetCollectionSyncResultEnum.sameRaw]: {
+    label: 'core.dataset.collection.sync.result.sameRaw'
+  },
+  [DatasetCollectionSyncResultEnum.success]: {
+    label: 'core.dataset.collection.sync.result.success'
+  }
+};
+
 /* ------------ data -------------- */
 export enum DatasetDataIndexTypeEnum {
   chunk = 'chunk',
```
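The new `DatasetCollectionSyncResultEnum` matches the commit title ("Text check before synchronization") and the `hashRawText` field on collections: a sync can compare freshly fetched text with what was stored and report `sameRaw` when nothing changed. A minimal sketch of that check, assuming `hashRawText` holds a SHA-256 hex digest (the hashing FastGPT uses is not part of this diff); `hashText` and `checkBeforeSync` are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

// Enum copied from the diff above.
enum DatasetCollectionSyncResultEnum {
  sameRaw = 'sameRaw',
  success = 'success'
}

// Assumption: hashRawText stores a SHA-256 hex digest of the raw text.
function hashText(text: string): string {
  return createHash('sha256').update(text).digest('hex');
}

// Hypothetical pre-sync check: if the newly fetched text hashes to the stored
// value, report sameRaw so the caller can skip re-chunking and re-embedding;
// otherwise the caller re-imports the text and reports success.
function checkBeforeSync(
  collection: { hashRawText?: string },
  newRawText: string
): DatasetCollectionSyncResultEnum {
  if (collection.hashRawText && collection.hashRawText === hashText(newRawText)) {
    return DatasetCollectionSyncResultEnum.sameRaw;
  }
  return DatasetCollectionSyncResultEnum.success;
}
```

Comparing a fixed-size digest avoids keeping a second copy of the raw text just to detect changes.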
```diff
@@ -150,16 +163,24 @@ export enum SearchScoreTypeEnum {
 }
 export const SearchScoreTypeMap = {
   [SearchScoreTypeEnum.embedding]: {
-    label: 'core.dataset.search.score.embedding'
+    label: 'core.dataset.search.score.embedding',
+    desc: 'core.dataset.search.score.embedding desc',
+    showScore: true
   },
   [SearchScoreTypeEnum.fullText]: {
-    label: 'core.dataset.search.score.fullText'
+    label: 'core.dataset.search.score.fullText',
+    desc: 'core.dataset.search.score.fullText desc',
+    showScore: false
   },
   [SearchScoreTypeEnum.reRank]: {
-    label: 'core.dataset.search.score.reRank'
+    label: 'core.dataset.search.score.reRank',
+    desc: 'core.dataset.search.score.reRank desc',
+    showScore: true
   },
   [SearchScoreTypeEnum.rrf]: {
-    label: 'core.dataset.search.score.rrf'
+    label: 'core.dataset.search.score.rrf',
+    desc: 'core.dataset.search.score.rrf desc',
+    showScore: false
   }
 };
 
```
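Each `SearchScoreTypeMap` entry now carries a `desc` i18n key and a `showScore` flag (true for embedding and reRank, false for fullText and rrf). One plausible way a UI could consume the flag is sketched below; the enum's string values and the `ScoreEntry` shape are assumptions, since neither appears in the diff.

```typescript
// Assumption: the enum's string values match the member names; the
// SearchScoreTypeEnum body is not shown in this diff.
enum SearchScoreTypeEnum {
  embedding = 'embedding',
  fullText = 'fullText',
  reRank = 'reRank',
  rrf = 'rrf'
}

// showScore flags copied from the SearchScoreTypeMap change above
// (label and desc keys omitted).
const showScoreByType: Record<SearchScoreTypeEnum, boolean> = {
  [SearchScoreTypeEnum.embedding]: true,
  [SearchScoreTypeEnum.fullText]: false,
  [SearchScoreTypeEnum.reRank]: true,
  [SearchScoreTypeEnum.rrf]: false
};

// Hypothetical score entry attached to a search result.
interface ScoreEntry {
  type: SearchScoreTypeEnum;
  value: number;
}

// Keep only the scores whose type is flagged for display.
function visibleScores(scores: ScoreEntry[]): ScoreEntry[] {
  return scores.filter((score) => showScoreByType[score.type]);
}
```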
packages/global/core/dataset/type.d.ts

+4-1
```diff
@@ -49,7 +49,10 @@ export type DatasetCollectionSchemaType = {
   qaPrompt?: string;
   rawTextLength?: number;
   hashRawText?: string;
-  metadata?: Record<string, any>;
+  metadata?: {
+    webPageSelector?: string;
+    [key: string]: any;
+  };
 };
 
 export type DatasetDataIndexItemType = {
```

packages/global/core/module/node/constant.ts

+3
```diff
@@ -7,7 +7,10 @@ export enum FlowNodeInputTypeEnum {
   slider = 'slider',
   target = 'target', // data input
   switch = 'switch',
+
+  // editor
   textarea = 'textarea',
+  JSONEditor = 'JSONEditor',
 
   addInputParam = 'addInputParam', // params input
 
```
packages/global/core/module/template/system/http.ts

+1-1
```diff
@@ -55,7 +55,7 @@ export const HttpModule: FlowModuleTemplateType = {
     },
     {
       key: ModuleInputKeyEnum.httpHeader,
-      type: FlowNodeInputTypeEnum.textarea,
+      type: FlowNodeInputTypeEnum.JSONEditor,
       valueType: ModuleIOValueTypeEnum.string,
       label: 'core.module.input.label.Http Request Header',
       description: 'core.module.input.description.Http Request Header',
```
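The Http module's request-header input switches from a textarea to `FlowNodeInputTypeEnum.JSONEditor` while keeping `valueType: string`, so the value is still JSON text. A minimal sketch of turning that text into a header map, assuming headers are entered as a flat JSON object; `parseHeaderJson` is a hypothetical helper, not FastGPT's request code.

```typescript
// Hypothetical helper: parse the header editor's JSON text into a plain
// header map before sending the request.
function parseHeaderJson(raw: string): Record<string, string> {
  if (!raw.trim()) return {}; // an empty editor means no extra headers
  const parsed: unknown = JSON.parse(raw); // throws SyntaxError on invalid JSON
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('Http request header must be a JSON object');
  }
  const headers: Record<string, string> = {};
  for (const [key, value] of Object.entries(parsed)) {
    // Accept strings, numbers and booleans; reject null and nested values.
    if (value === null || typeof value === 'object') {
      throw new Error(`Header "${key}" must be a string, number or boolean`);
    }
    headers[key] = String(value);
  }
  return headers;
}
```

Validating here gives the user a clear error for a malformed header instead of a failed request later.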

packages/global/support/user/team/type.d.ts

+4
```diff
@@ -10,6 +10,10 @@ export type TeamSchema = {
   balance: number;
   maxSize: number;
   lastDatasetBillTime: Date;
+  limit: {
+    lastExportDatasetTime: Date;
+    lastWebsiteSyncTime: Date;
+  };
 };
 
 export type TeamMemberSchema = {
```
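The export and website-sync limits move from the user document to a per-team `limit` object, with thresholds read from `FastGPTFeConfigsType.limit`. The sketch below shows one way such a per-team cooldown could be checked, assuming the limit means a minimum number of minutes between syncs; `websiteSyncWaitMinutes` and the `FeLimitConfig`/`TeamLimit` subsets are hypothetical.

```typescript
// Subsets of the types changed in this commit; identifiers are copied as-is,
// including the `websiteSyncLimitMinuted` spelling.
interface FeLimitConfig {
  exportDatasetLimitMinutes?: number;
  websiteSyncLimitMinuted?: number;
}
interface TeamLimit {
  lastExportDatasetTime: Date;
  lastWebsiteSyncTime: Date;
}

const MS_PER_MINUTE = 60 * 1000;

// Hypothetical check: minutes the team must still wait before another
// website sync, or 0 when a sync is allowed (including when no limit is set).
function websiteSyncWaitMinutes(
  config: FeLimitConfig,
  team: TeamLimit,
  now: Date = new Date()
): number {
  const limit = config.websiteSyncLimitMinuted;
  if (!limit) return 0;
  const elapsed = (now.getTime() - team.lastWebsiteSyncTime.getTime()) / MS_PER_MINUTE;
  return elapsed >= limit ? 0 : Math.ceil(limit - elapsed);
}
```

Under the same assumption, an accepted sync would then set `lastWebsiteSyncTime` to the current time, and the dataset-export limit would follow the same pattern with `exportDatasetLimitMinutes` and `lastExportDatasetTime`.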

packages/global/support/user/type.d.ts

-4
```diff
@@ -17,10 +17,6 @@ export type UserModelSchema = {
     key: string;
     baseUrl: string;
   };
-  limit: {
-    exportKbTime?: Date;
-    datasetMaxCount?: number;
-  };
 };
 
 export type UserType = {
```

packages/service/common/string/cheerio.ts

+16-12
```diff
@@ -15,7 +15,8 @@ export const cheerioToHtml = ({
   // get origin url
   const originUrl = new URL(fetchUrl).origin;
 
-  const selectDom = $(selector || 'body');
+  const usedSelector = selector || 'body';
+  const selectDom = $(usedSelector);
 
   // remove i element
   selectDom.find('i,script').remove();
```
```diff
@@ -49,7 +50,10 @@ export const cheerioToHtml = ({
     .get()
     .join('\n');
 
-  return html;
+  return {
+    html,
+    usedSelector
+  };
 };
 export const urlsFetch = async ({
   urlList,
```
```diff
@@ -66,25 +70,25 @@ export const urlsFetch = async ({
       });
 
       const $ = cheerio.load(fetchRes.data);
-
-      const md = await htmlToMarkdown(
-        cheerioToHtml({
-          fetchUrl: url,
-          $,
-          selector
-        })
-      );
+      const { html, usedSelector } = cheerioToHtml({
+        fetchUrl: url,
+        $,
+        selector
+      });
+      const md = await htmlToMarkdown(html);
 
       return {
         url,
-        content: md
+        content: md,
+        selector: usedSelector
       };
     } catch (error) {
       console.log(error, 'fetch error');
 
       return {
         url,
-        content: ''
+        content: '',
+        selector: ''
       };
     }
   })
```
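Together with `selector?: string` on `UrlFetchResponse` and `metadata.webPageSelector` on collections, these changes let the selector that was actually used be stored and reused, which appears to correspond to the link-dataset selector fix in the V4.6.6 notes. A minimal sketch of that round trip without cheerio; `resolveSelector`, `linkCollectionMetadata` and `selectorForSync` are hypothetical helpers.

```typescript
// Same fallback as cheerioToHtml in the diff above: no selector means 'body'.
function resolveSelector(selector?: string): string {
  return selector || 'body';
}

// Metadata shape copied from the DatasetCollectionSchemaType change.
type CollectionMetadata = {
  webPageSelector?: string;
  [key: string]: any;
};

// Hypothetical: store the selector reported by urlsFetch on the collection
// so that a later sync fetches the page with the same selector.
function linkCollectionMetadata(fetched: { url: string; selector?: string }): CollectionMetadata {
  return { webPageSelector: fetched.selector };
}

// Hypothetical: the selector a re-sync passes back to the fetcher. An empty
// selector (urlsFetch returns '' on error) falls back to 'body' again.
function selectorForSync(metadata: CollectionMetadata): string {
  return resolveSelector(metadata.webPageSelector);
}
```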

packages/service/common/string/markdown.ts

+3
```diff
@@ -21,6 +21,9 @@ export const htmlToMarkdown = (html?: string | null) =>
       worker.terminate();
       reject(err);
     });
+    worker.on('exit', (code) => {
+      console.log('html 2 md finish', code);
+    });
 
     worker.postMessage(html);
   });
```

packages/service/core/dataset/collection/controller.ts

+6-4
```diff
@@ -19,14 +19,16 @@ export async function createOneCollection({
   qaPrompt,
   hashRawText,
   rawTextLength,
-  metadata = {}
-}: CreateDatasetCollectionParams & { teamId: string; tmbId: string }) {
+  metadata = {},
+  ...props
+}: CreateDatasetCollectionParams & { teamId: string; tmbId: string; [key: string]: any }) {
   const { _id } = await MongoDatasetCollection.create({
-    name,
+    ...props,
     teamId,
     tmbId,
-    datasetId,
     parentId: parentId || null,
+    datasetId,
+    name,
     type,
     trainingType,
     chunkSize,
```
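`createOneCollection` now gathers any extra fields into `...props` and spreads them first into `MongoDatasetCollection.create`, with the named fields listed after the spread. Destructured fields are excluded from the rest object, and because later properties in an object literal override earlier ones with the same key, any field written explicitly after the spread wins over a same-named key in `props`. A simplified sketch of the pattern; `buildCollectionDoc` and `CreateCollectionParams` are hypothetical, omit most real fields, and use `unknown` where the diff uses `any`.

```typescript
// Simplified parameter type; the index signature mirrors the
// `[key: string]: any` added in the diff above.
interface CreateCollectionParams {
  name: string;
  datasetId: string;
  parentId?: string;
  metadata?: Record<string, unknown>;
  [key: string]: unknown;
}

// Hypothetical builder using the same pattern as createOneCollection:
// unknown extra fields pass through via the rest element, and the explicit
// fields written after the spread take precedence over same-named keys.
function buildCollectionDoc(
  { name, datasetId, parentId, metadata = {}, ...props }: CreateCollectionParams,
  teamId: string
) {
  return {
    ...props,
    teamId,
    parentId: parentId || null,
    datasetId,
    name,
    metadata
  };
}
```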

packages/service/core/dataset/collection/schema.ts

+1
```diff
@@ -75,6 +75,7 @@ const DatasetCollectionSchema = new Schema({
   qaPrompt: {
     type: String
   },
+
   rawTextLength: {
     type: Number
   },
```
