add custom classes python notebook (#144)

MikeSmithEU · web-flow · commit e8245ad0c02b · 2020-10-19T22:10:00.000+02:00
diff --git a/.gitignore b/.gitignore
@@ -3,6 +3,7 @@ docs/build/
 docs/modules/*.rst
 docs/api-methods.rst
 docs/node_modules/
+docs/tutorial/.ipynb_checkpoints/
 
 # is auto generated:
 src/benchmarkstt/__meta__.py
diff --git a/docs/tutorial/03 - Custom Classes.ipynb b/docs/tutorial/03 - Custom Classes.ipynb
@@ -0,0 +1,265 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Custom Classes\n",
+    "\n",
+    "Some rules might not fit the existing classes, or you might prefer to implement your own.\n",
+    "\n",
+    "This notebook shows how to write and use custom normalization and metrics classes."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Normalization\n",
+    "\n",
+    "### Structure"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For a class to be considered a \"normalization\" class, all it needs to do is provide a `normalize` method with the following signature:\n",
+    "\n",
+    "```python\n",
+    "def normalize(self, text: str) -> str\n",
+    "```\n",
+    "\n",
+    "E.g."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class MyCustomNormalizer:\n",
+    "    def normalize(self, text):\n",
+    "        return text.strip().lower().replace('apples', 'oranges')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This class can be used on its own, without any need for `benchmarkstt`. E.g."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "comparing oranges to oranges\n"
+     ]
+    }
+   ],
+   "source": [
+    "normalizer = MyCustomNormalizer()\n",
+    "\n",
+    "print(normalizer.normalize(\"Comparing apples to oranges\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Usage"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The normalizer instance can be passed directly to other `benchmarkstt` classes, e.g. input classes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Color key: Unchanged \u001b[31mReference\u001b[0m \u001b[32mHypothesis\u001b[0m\n",
+      "\n",
+      "·comparing·oranges·to\u001b[31m·oranges\u001b[0m\u001b[32m·pears\u001b[0m\n"
+     ]
+    }
+   ],
+   "source": [
+    "from benchmarkstt.metrics.core import WordDiffs\n",
+    "from benchmarkstt.input.core import PlainText\n",
+    "word_diffs = WordDiffs('ansi')\n",
+    "\n",
+    "plaintext_1 = PlainText(\"Comparing apples to ORANGES\", normalizer)\n",
+    "plaintext_2 = PlainText(\"COMPARING apples to pears\", normalizer)\n",
+    "\n",
+    "print(word_diffs.compare(plaintext_1, plaintext_2))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Metrics\n",
+    "\n",
+    "### Structure\n",
+    "\n",
+    "For a class to be considered a \"metrics\" class, all it needs to do is provide a `compare` method with the following signature:\n",
+    "\n",
+    "```python\n",
+    "def compare(self, ref: benchmarkstt.schema.Schema, hyp: benchmarkstt.schema.Schema) -> Any\n",
+    "```\n",
+    "\n",
+    "(`benchmarkstt.schema.Schema` should be treated as an iterable)\n",
+    "\n",
+    "E.g."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class IsTheSame:\n",
+    "    def compare(self, ref, hyp):\n",
+    "        return ref == hyp"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "or"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class FirstDifference:\n",
+    "    def compare(self, ref, hyp):\n",
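+    "        n = -1  # so that an empty ref reports index 0 below\n",
+    "        ihyp = iter(hyp)\n",
+    "        for n, ref_n in enumerate(ref):\n",
+    "            hyp_n = next(ihyp, None)\n",
+    "            if hyp_n != ref_n:\n",
+    "                return (n, ref_n, hyp_n)\n",
+    "        # ref is exhausted: any item left in hyp is the first difference\n",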
+    "        hyp_n = next(ihyp, None)\n",
+    "        if hyp_n is None:\n",
+    "            return False\n",
+    "        return (n+1, None, hyp_n)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Both classes can be used and tested directly without any need for `benchmarkstt`. E.g."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "IsTheSame\n",
+      "False\n",
+      "\n",
+      "FirstDifference\n",
+      "(1, 'apples', 'oranges')\n"
+     ]
+    }
+   ],
+   "source": [
+    "is_the_same = IsTheSame()\n",
+    "a = \"comparing apples to oranges\".split()\n",
+    "b = \"comparing oranges to pears\".split()\n",
+    "\n",
+    "print(\"IsTheSame\")\n",
+    "print(is_the_same.compare(a, b))\n",
+    "\n",
+    "first_difference = FirstDifference()\n",
+    "print(\"\\nFirstDifference\")\n",
+    "print(first_difference.compare(a, b))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Usage"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "IsTheSame\n",
+      "False\n",
+      "\n",
+      "FirstDifference\n",
+      "(3, Item({\"item\": \"oranges\", \"type\": \"word\", \"@raw\": \"oranges\"}), Item({\"item\": \"pears\", \"type\": \"word\", \"@raw\": \"pears\"}))\n"
+     ]
+    }
+   ],
+   "source": [
+    "plaintext_1 = PlainText(\"Comparing apples to ORANGES\", normalizer)\n",
+    "plaintext_2 = PlainText(\"COMPARING apples to pears\", normalizer)\n",
+    "\n",
+    "print(\"IsTheSame\")\n",
+    "print(is_the_same.compare(plaintext_1, plaintext_2))\n",
+    "\n",
+    "print(\"\\nFirstDifference\")\n",
+    "print(first_difference.compare(plaintext_1, plaintext_2))"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}