Commit e8245ad

add custom classes python notebook (#144)
1 parent 403dbdb commit e8245ad

2 files changed

+266
-0
lines changed

.gitignore

+1
@@ -3,6 +3,7 @@ docs/build/
 docs/modules/*.rst
 docs/api-methods.rst
 docs/node_modules/
+docs/tutorial/.ipynb_checkpoints/
 
 # is auto generated:
 src/benchmarkstt/__meta__.py
+265
@@ -0,0 +1,265 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Custom Classes\n",
8+
"\n",
9+
"The existing classes might not cover every rule you need, or you might prefer to implement your own.\n",
10+
"\n",
11+
"Doing so is fairly simple, as this notebook explains for normalization and metrics classes."
12+
]
13+
},
14+
{
15+
"cell_type": "markdown",
16+
"metadata": {},
17+
"source": [
18+
"## Normalization\n",
19+
"\n",
20+
"### Structure"
21+
]
22+
},
23+
{
24+
"cell_type": "markdown",
25+
"metadata": {},
26+
"source": [
27+
"For a class to be considered a \"normalization\" class, all it needs to do is provide a `normalize` method with the following signature:\n",
28+
"\n",
29+
"```python\n",
30+
"def normalize(self, text: str) -> str\n",
31+
"```\n",
32+
"\n",
33+
"E.g."
34+
]
35+
},
36+
{
37+
"cell_type": "code",
38+
"execution_count": 1,
39+
"metadata": {},
40+
"outputs": [],
41+
"source": [
42+
"class MyCustomNormalizer:\n",
43+
"    def normalize(self, text):\n",
44+
"        return text.strip().lower().replace('apples', 'oranges')"
45+
]
46+
},
47+
{
48+
"cell_type": "markdown",
49+
"metadata": {},
50+
"source": [
51+
"The class can be used on its own, without any need for `benchmarkstt`. E.g."
52+
]
53+
},
54+
{
55+
"cell_type": "code",
56+
"execution_count": 2,
57+
"metadata": {},
58+
"outputs": [
59+
{
60+
"name": "stdout",
61+
"output_type": "stream",
62+
"text": [
63+
"comparing oranges to oranges\n"
64+
]
65+
}
66+
],
67+
"source": [
68+
"normalizer = MyCustomNormalizer()\n",
69+
"\n",
70+
"print(normalizer.normalize(\"Comparing apples to oranges\"))"
71+
]
72+
},
73+
{
74+
"cell_type": "markdown",
75+
"metadata": {},
76+
"source": [
77+
"### Usage"
78+
]
79+
},
80+
{
81+
"cell_type": "markdown",
82+
"metadata": {},
83+
"source": [
84+
"A normalizer instance can be passed directly to other `benchmarkstt` classes, e.g. the input classes."
85+
]
86+
},
87+
{
88+
"cell_type": "code",
89+
"execution_count": 3,
90+
"metadata": {},
91+
"outputs": [
92+
{
93+
"name": "stdout",
94+
"output_type": "stream",
95+
"text": [
96+
"Color key: Unchanged \u001b[31mReference\u001b[0m \u001b[32mHypothesis\u001b[0m\n",
97+
"\n",
98+
"·comparing·oranges·to\u001b[31m·oranges\u001b[0m\u001b[32m·pears\u001b[0m\n"
99+
]
100+
}
101+
],
102+
"source": [
103+
"from benchmarkstt.metrics.core import WordDiffs\n",
104+
"from benchmarkstt.input.core import PlainText\n",
105+
"word_diffs = WordDiffs('ansi')\n",
106+
"\n",
107+
"plaintext_1 = PlainText(\"Comparing apples to ORANGES\", normalizer)\n",
108+
"plaintext_2 = PlainText(\"COMPARING apples to pears\", normalizer)\n",
109+
"\n",
110+
"print(word_diffs.compare(plaintext_1, plaintext_2))"
111+
]
112+
},
113+
{
114+
"cell_type": "markdown",
115+
"metadata": {},
116+
"source": [
117+
"## Metrics\n",
118+
"\n",
119+
"### Structure\n",
120+
"\n",
121+
"For a class to be considered a \"metrics\" class, all it needs to do is provide a `compare` method with the following signature:\n",
122+
"\n",
123+
"```python\n",
124+
"def compare(self, ref: benchmarkstt.schema.Schema, hyp: benchmarkstt.schema.Schema) -> Any\n",
125+
"```\n",
126+
"\n",
127+
"(`benchmarkstt.schema.Schema` should be treated as an iterable)\n",
128+
"\n",
129+
"E.g."
130+
]
131+
},
132+
{
133+
"cell_type": "code",
134+
"execution_count": 4,
135+
"metadata": {},
136+
"outputs": [],
137+
"source": [
138+
"class IsTheSame:\n",
139+
"    def compare(self, ref, hyp):\n",
140+
"        return ref == hyp"
141+
]
142+
},
143+
{
144+
"cell_type": "markdown",
145+
"metadata": {},
146+
"source": [
147+
"or"
148+
]
149+
},
150+
{
151+
"cell_type": "code",
152+
"execution_count": 5,
153+
"metadata": {},
154+
"outputs": [],
155+
"source": [
156+
"class FirstDifference:\n",
157+
"    def compare(self, ref, hyp):\n",
158+
"        n = -1  # stays -1 if ref is empty, so the index reported below is 0\n",
159+
"        ihyp = iter(hyp)\n",
160+
"        for n, ref_n in enumerate(ref):\n",
161+
"            hyp_n = next(ihyp, None)\n",
162+
"            if hyp_n != ref_n:\n",
163+
"                return (n, ref_n, hyp_n)\n",
164+
"\n",
165+
"        hyp_n = next(ihyp, None)\n",
166+
"        if hyp_n is None:\n",
167+
"            return False\n",
168+
"        return (n+1, None, hyp_n)"
169+
]
170+
},
171+
{
172+
"cell_type": "markdown",
173+
"metadata": {},
174+
"source": [
175+
"These classes can be used and tested directly, without any need for `benchmarkstt`. E.g."
176+
]
177+
},
178+
{
179+
"cell_type": "code",
180+
"execution_count": 6,
181+
"metadata": {},
182+
"outputs": [
183+
{
184+
"name": "stdout",
185+
"output_type": "stream",
186+
"text": [
187+
"IsTheSame\n",
188+
"False\n",
189+
"\n",
190+
"FirstDifference\n",
191+
"(1, 'apples', 'oranges')\n"
192+
]
193+
}
194+
],
195+
"source": [
196+
"is_the_same = IsTheSame()\n",
197+
"a = \"comparing apples to oranges\".split()\n",
198+
"b = \"comparing oranges to pears\".split()\n",
199+
"\n",
200+
"print(\"IsTheSame\")\n",
201+
"print(is_the_same.compare(a, b))\n",
202+
"\n",
203+
"first_difference = FirstDifference()\n",
204+
"print(\"\\nFirstDifference\")\n",
205+
"print(first_difference.compare(a, b))"
206+
]
207+
},
208+
{
209+
"cell_type": "markdown",
210+
"metadata": {},
211+
"source": [
212+
"### Usage"
213+
]
214+
},
215+
{
216+
"cell_type": "code",
217+
"execution_count": 7,
218+
"metadata": {},
219+
"outputs": [
220+
{
221+
"name": "stdout",
222+
"output_type": "stream",
223+
"text": [
224+
"IsTheSame\n",
225+
"False\n",
226+
"\n",
227+
"FirstDifference\n",
228+
"(3, Item({\"item\": \"oranges\", \"type\": \"word\", \"@raw\": \"oranges\"}), Item({\"item\": \"pears\", \"type\": \"word\", \"@raw\": \"pears\"}))\n"
229+
]
230+
}
231+
],
232+
"source": [
233+
"plaintext_1 = PlainText(\"Comparing apples to ORANGES\", normalizer)\n",
234+
"plaintext_2 = PlainText(\"COMPARING apples to pears\", normalizer)\n",
235+
"\n",
236+
"print(\"IsTheSame\")\n",
237+
"print(is_the_same.compare(plaintext_1, plaintext_2))\n",
238+
"\n",
239+
"print(\"\\nFirstDifference\")\n",
240+
"print(first_difference.compare(plaintext_1, plaintext_2))"
241+
]
242+
}
243+
],
244+
"metadata": {
245+
"kernelspec": {
246+
"display_name": "Python 3",
247+
"language": "python",
248+
"name": "python3"
249+
},
250+
"language_info": {
251+
"codemirror_mode": {
252+
"name": "ipython",
253+
"version": 3
254+
},
255+
"file_extension": ".py",
256+
"mimetype": "text/x-python",
257+
"name": "python",
258+
"nbconvert_exporter": "python",
259+
"pygments_lexer": "ipython3",
260+
"version": "3.8.5"
261+
}
262+
},
263+
"nbformat": 4,
264+
"nbformat_minor": 4
265+
}
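
The notebook describes a normalization class purely by the shape of its `normalize` method. One way to make that contract explicit is a `typing.Protocol`. This is a minimal sketch: the `Normalizer` protocol and the `NotANormalizer` class are hypothetical names introduced here for illustration and are not part of `benchmarkstt`.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Normalizer(Protocol):
    """Hypothetical protocol (not a benchmarkstt class) for the normalize contract."""

    def normalize(self, text: str) -> str:
        ...


class MyCustomNormalizer:
    """Same rule as in the notebook, with type hints added."""

    def normalize(self, text: str) -> str:
        return text.strip().lower().replace("apples", "oranges")


class NotANormalizer:
    """Illustrative class that lacks a normalize method."""


normalizer = MyCustomNormalizer()

# runtime_checkable only verifies that the method exists, not its signature.
print(isinstance(normalizer, Normalizer))        # True
print(isinstance(NotANormalizer(), Normalizer))  # False
print(normalizer.normalize("  Comparing apples to ORANGES "))
```

No inheritance is involved: `MyCustomNormalizer` satisfies the protocol simply because it has a `normalize` method, which matches the duck-typed approach the notebook takes.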

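
Because a metrics class only has to treat `ref` and `hyp` as iterables, the first-difference idea from the notebook can also be written with `itertools.zip_longest`. The sketch below is an alternative formulation, not the notebook's implementation. The class name `FirstDifferenceSketch` and the `_MISSING` sentinel are assumptions made for illustration. The sentinel marks the side that ran out of items, so a real `None` item is not mistaken for the end of the input.

```python
from itertools import zip_longest

# Hypothetical sentinel: marks the side that has no more items.
_MISSING = object()


class FirstDifferenceSketch:
    """Illustrative metric returning the first position where ref and hyp differ."""

    def compare(self, ref, hyp):
        pairs = zip_longest(ref, hyp, fillvalue=_MISSING)
        for n, (ref_n, hyp_n) in enumerate(pairs):
            if ref_n is _MISSING or hyp_n is _MISSING or ref_n != hyp_n:
                # Report exhausted sides as None, like the notebook's version.
                return (
                    n,
                    None if ref_n is _MISSING else ref_n,
                    None if hyp_n is _MISSING else hyp_n,
                )
        # Both iterables ended at the same time without any difference.
        return False


metric = FirstDifferenceSketch()
ref = "comparing apples to oranges".split()
hyp = "comparing oranges to pears".split()

print(metric.compare(ref, hyp))   # (1, 'apples', 'oranges')
print(metric.compare(ref, ref))   # False
print(metric.compare([], ["x"]))  # (0, None, 'x')
```

As in the notebook, the return value is `False` when nothing differs, so callers should test `result is False` rather than rely on truthiness alone.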