Commit 623925e

rui.tao committed
Add model and data information
1 parent 05c957b commit 623925e

36 files changed, +199064 -19 lines changed

README.md

+11 -19
@@ -20,35 +20,27 @@ Best acc on val set: 0.986000
 100%|██████████████████████████████████████████████████████████████████████████████████| 16/16 [00:06<00:00, 2.64it/s, acc=0.986]
 Accuracy on test set: 0.986
 ## Test results
->>> model.infer({'text': '场照片事后将发给媒体,避免采访时出现混乱,[3]举行婚礼侯佩岑黄伯俊婚纱照2011年4月17日下午2点,70名亲友见 证下,侯佩', 'h': {'pos': (28, 30)}, 't': {'pos': (31, 33)}})
+model.infer({'text': '场照片事后将发给媒体,避免采访时出现混乱,[3]举行婚礼侯佩岑黄伯俊婚纱照2011年4月17日下午2点,70名亲友见 证下,侯佩', 'h': {'pos': (28, 30)}, 't': {'pos': (31, 33)}})
 ('夫妻', 0.9995878338813782)
->>>
->>> model.infer({'text': '及他们的女儿小苹果与汪峰感情纠葛2004年,葛荟婕在欧洲杯期间录制节目时与汪峰相识并相恋,汪峰那首《我如此爱你', 'h': {'pos': (10, 11)}, 't': {'pos': (22, 24)}})
+
+model.infer({'text': '及他们的女儿小苹果与汪峰感情纠葛2004年,葛荟婕在欧洲杯期间录制节目时与汪峰相识并相恋,汪峰那首《我如此爱你', 'h': {'pos': (10, 11)}, 't': {'pos': (22, 24)}})
 ('情侣', 0.9992896318435669)
->>>
->>> model.infer({'text': '14日,彭加木的侄女彭丹凝打通了彭加木儿子彭海的电话,“堂哥已经知道了,他说这些年传得太多,他不相信是真的', 'h': {'pos': (4, 6)}, 't': {'pos': (22, 21)}})
+
+model.infer({'text': '14日,彭加木的侄女彭丹凝打通了彭加木儿子彭海的电话,“堂哥已经知道了,他说这些年传得太多,他不相信是真的', 'h': {'pos': (4, 6)}, 't': {'pos': (22, 21)}})
 ('父母', 0.8954808712005615)
->>>
->>> model.infer({'text': '名旦吴菱仙是位列“同治十三绝”的名旦时小福的弟子,算得梅兰芳的开蒙老师,早年曾搭过梅巧玲的四喜班,旧谊', 'h': {'pos': (2, 4)}, 't': {'pos': (27, 29)}})
-('师生', 0.996309220790863)
 
-# Preparation before use
-1. BERT model download: place the chinese_wwm_pytorch model under ./pretrain/. Download link: https://github.com/ymcui/Chinese-BERT-wwm
+model.infer({'text': '名旦吴菱仙是位列“同治十三绝”的名旦时小福的弟子,算得梅兰芳的开蒙老师,早年曾搭过梅巧玲的四喜班,旧谊', 'h': {'pos': (2, 4)}, 't': {'pos': (27, 29)}})
+('师生', 0.996309220790863)
+# Usage
+1. BERT model download: place the chinese_wwm_pytorch model under ./pretrain/.
 2. Data download: run gen.py in ./benchmark/people-relation/ to generate the Chinese people-relation data; see the comments in the script for details.
 3. Configure the environment variable: run vim ~/.bash_profile and add
 # openNRE
 export openNRE=<project location>
-
-
-# Notes
-If you trained your own TensorFlow BERT, you can convert it to a PyTorch version with the convert_bert_original_tf_checkpoint_to_pytorch.py script from https://github.com/huggingface/transformers.
-Pitfalls:
-1. Install TensorFlow 2.0. The final model is a PyTorch model, but TensorFlow still has to be installed.
-2. Construct the checkpoint file.
-3. Error: 'Embedding' object has no attribute 'shape'. Fix: delete the assert lines at the location where the error is raised.
-
 
 
+-----
+The content of the original project follows.
 # OpenNRE
 
 OpenNRE is an open-source and extensible toolkit that provides a unified framework to implement relation extraction models. This package is designed for the following groups:
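
Step 3 of the README hunk exports an `openNRE` variable pointing at the project location, and steps 1 and 2 name two relative directories. The commit does not show how the scripts consume that variable, so the following is only a sketch of one plausible way to resolve those directories from it. The helper `project_paths` and the dictionary keys are hypothetical; only the variable name and the two relative paths come from the README.

```python
import os
from pathlib import Path


def project_paths(env_var: str = "openNRE") -> dict:
    """Resolve the README's relative directories against the project root.

    Assumption: the project root is stored in the environment variable
    named in README step 3. This helper is illustrative, not project code.
    """
    root = os.environ.get(env_var)
    if not root:
        raise RuntimeError(f"environment variable {env_var!r} is not set")
    root_path = Path(root)
    return {
        # README step 1: location of the chinese_wwm_pytorch BERT model
        "bert": root_path / "pretrain" / "chinese_wwm_pytorch",
        # README step 2: directory that contains gen.py and its output
        "data": root_path / "benchmark" / "people-relation",
    }
```

Failing early with a clear message when the variable is unset is usually easier to debug than a later file-not-found error deep inside model loading.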

benchmark/people-relation/gen.py

+52
@@ -0,0 +1,52 @@
+# -*- coding:utf-8 -*-
+# Copyright @rui.tao
+import codecs
+import sys
+import pandas as pd
+import numpy as np
+from collections import deque
+import pdb
+import json
+
+
+def get_pose(text, E_name):
+    result = []
+    pose_1 = text.find(E_name)
+    pose_2 = pose_1+len(E_name)-1
+    result.append(pose_1)
+    result.append(pose_2)
+    return result
+
+# Processed data; split into training and validation sets after processing
+f = open('people-relation_all.txt', "w+")
+
+# Raw data; source: https://github.com/buppt/ChineseNRE
+people_relation = ""
+
+with codecs.open(people_relation,'r','utf-8') as tfc:
+    for lines in tfc:
+        line = lines.split()
+        data = {}
+        print(line)
+        E1, E2, relation, text = line
+        data["token"] = list(text)
+        h = {}
+        h["name"] = E1
+        h["pos"] = get_pose(text, E1)
+        data["h"] = h
+        t = {}
+        t["name"] = E2
+        t["pos"] = get_pose(text, E2)
+        data["t"] = t
+        data["relation"] = relation
+        json_data = json.dumps(data, ensure_ascii=False)
+        f.write(json_data)
+        f.write("\n")
+        #print(json_data)
+        #import sys
+        #sys.exit()
+
+
+
+
+
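
In gen.py, `get_pose` returns the first occurrence of an entity as `[start, end]` with an inclusive end index, and each input line is expected to hold four whitespace-separated fields: E1, E2, relation, text. Two edge cases are not handled there: `str.find` returns -1 when the entity is missing, and `split()` yields more than four fields if the sentence itself contains whitespace. The sketch below shows one way to guard against both. The names `entity_span` and `build_record` and the sample line are hypothetical and not part of the commit.

```python
import json


def entity_span(text: str, name: str) -> list:
    """Return [start, end] of the first occurrence of name (end inclusive)."""
    start = text.find(name)
    if start == -1:
        # gen.py would silently write a negative position here
        raise ValueError(f"entity {name!r} not found in text")
    return [start, start + len(name) - 1]


def build_record(raw_line: str) -> dict:
    """Turn an 'E1 E2 relation text' line into the record layout gen.py writes."""
    # maxsplit=3 keeps any whitespace inside the sentence in the last field
    e1, e2, relation, text = raw_line.strip().split(maxsplit=3)
    return {
        "token": list(text),
        "h": {"name": e1, "pos": entity_span(text, e1)},
        "t": {"name": e2, "pos": entity_span(text, e2)},
        "relation": relation,
    }


# Made-up input line, reusing names from the README example
sample = "侯佩岑 黄伯俊 夫妻 举行婚礼侯佩岑黄伯俊婚纱照"
print(json.dumps(build_record(sample), ensure_ascii=False))
```

The inclusive end index matches the positions in the README examples, where `(28, 30)` covers the three characters of 侯佩岑.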
@@ -0,0 +1 @@
+{"父母":0, "夫妻":1, "师生":2, "兄弟姐妹":3, "合作":4, "情侣":5, "祖孙":6 ,"好友":7, "亲戚":8, "同门":9, "上下级":10, "unknown":11}
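
The one-line JSON above maps each relation label to an integer id (its file name is not shown on this page). As a minimal sketch of how such a mapping might be consumed, the snippet below parses it and builds the inverse id-to-label lookup. The English glosses are the editor's translations added for readability; they are not part of the commit, and the variable names are hypothetical.

```python
import json

# Mapping copied verbatim from the commit
REL2ID_JSON = '{"父母":0, "夫妻":1, "师生":2, "兄弟姐妹":3, "合作":4, "情侣":5, "祖孙":6 ,"好友":7, "亲戚":8, "同门":9, "上下级":10, "unknown":11}'

rel2id = json.loads(REL2ID_JSON)

# Inverse lookup: integer id -> relation label
id2rel = {idx: name for name, idx in rel2id.items()}

# Editor's English glosses (assumption, for readers who do not read Chinese)
ENGLISH_GLOSS = {
    "父母": "parent-child",
    "夫妻": "spouses",
    "师生": "teacher-student",
    "兄弟姐妹": "siblings",
    "合作": "collaborators",
    "情侣": "romantic partners",
    "祖孙": "grandparent-grandchild",
    "好友": "close friends",
    "亲戚": "relatives",
    "同门": "fellow students of the same teacher",
    "上下级": "superior-subordinate",
    "unknown": "unknown",
}

for idx in sorted(id2rel):
    label = id2rel[idx]
    print(idx, label, ENGLISH_GLOSS[label])
```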
