Commit

add scenario diagrams
cjer committed May 6, 2021
1 parent 86c923e commit 1fbafe9
Showing 8 changed files with 19 additions and 6 deletions.
25 changes: 19 additions & 6 deletions README.md
@@ -32,7 +32,7 @@ Code and models for neural modeling of Hebrew NER. Described in the TACL paper [
- ```python nemo.py run_ner_model token-single example.txt example_output.txt```
* the `morph_hybrid` command runs the end-to-end segmentation and NER pipeline which provided our best performing morpheme-level NER boundaries:
        - ```python nemo.py morph_hybrid morph example.txt example_output_MORPH.txt```
1. For a full list of the available commands please consult the [next section](#models-and-scenarios) and the inline documentation at the end of `nemo.py`.
1. Please use only the regular and not the `*_oov` models (which contain embeddings only for words that appear in the NEMO corpus). Unless you use the model to replicate our results on the Hebrew treebank, always use e.g. `token-multi` and not `token-multi_oov`.


@@ -50,21 +50,34 @@ Morphemes must be predicted. This is done by performing morphological disambigua
1. **Hybrid pipeline**: MD using our best performing *Hybrid* approach, which uses the output of the `token-multi` model to reduce the MD option space. This is used in `morph_hybrid`, `multi_align_hybrid` and `morph_hybrid_align_tokens`. We will explain these scenarios next.

MD Approach | Commands
--:|:---------------------:
Standard <img src="./docs/standard_diagram.png" alt="Standard MD" width="345" /> | `morph_yap`
Hybrid <img src="./docs/hybrid_diagram.png" alt="Hybrid MD" width="345" /> <br> <img src="./docs/lattice_pruning.png" alt="Lattice pruning" width="345" /> | `morph_hybrid`,<br>`multi_align_hybrid`,<br>`morph_hybrid_align_tokens`

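The Hybrid pruning step can be pictured with a short Python sketch. It assumes a hypothetical lattice format (per token, a list of candidate segmentations) and assumes that `token-multi` labels join their per-morpheme sub-labels with `^`. The pruning rule shown, keeping only analyses whose morpheme count equals the number of sub-labels and falling back to all analyses when none match, is an illustrative assumption and not the code in `nemo.py`. The fallback keeps the MD step from ever receiving an empty option set for a token.

```python
from typing import List

# Assumption: sub-labels inside a token-multi label are joined with "^",
# e.g. "O^B-ORG" for a token made of a prefix morpheme plus an entity word.
MULTI_SEP = "^"

# Hypothetical lattice format: per token, a list of candidate analyses,
# each analysis being a list of morpheme strings.
Analysis = List[str]


def prune_token_analyses(analyses: List[Analysis], multi_label: str) -> List[Analysis]:
    """Keep the analyses whose morpheme count matches the predicted multi-label.

    If no analysis matches, the full option space is returned unchanged so
    that the MD step never receives an empty lattice for a token.
    """
    n_parts = len(multi_label.split(MULTI_SEP))
    kept = [a for a in analyses if len(a) == n_parts]
    return kept if kept else analyses


def prune_lattice(lattice: List[List[Analysis]], multi_labels: List[str]) -> List[List[Analysis]]:
    """Prune every token of a sentence using its token-multi prediction."""
    if len(lattice) != len(multi_labels):
        raise ValueError("expected exactly one multi-label per token")
    return [prune_token_analyses(a, m) for a, m in zip(lattice, multi_labels)]


if __name__ == "__main__":
    # Toy transliterated input: the first token could be one, two or three morphemes.
    lattice = [
        [["bbit"], ["b", "bit"], ["b", "h", "bit"]],
        [["lbn"]],
    ]
    print(prune_lattice(lattice, ["O^B-ORG", "E-ORG"]))
    # prints [[['b', 'bit']], [['lbn']]]
```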
Finally, to get our desired output (tokens/morphemes), we can choose between different scenarios, some involving extra post-processing alignments:
1. To get morpheme-level labels, we have two options:
    * Run our `morph` NER model on predicted morphemes. Commands: `morph_yap` or `morph_hybrid` (better).
    * `token-multi` labels can be aligned with predicted morphemes to get morpheme-level boundaries. Command: `multi_align_hybrid`.

`morph` NER on Predicted Morphemes | Multi Predictions Aligned with Predicted Morphemes
:---:|:---------------------:
<img src="./docs/morph_ner.png" alt="Morph NER on Predicted Morphemes" width="175" /> | <img src="./docs/multi_align_morph.png" alt="Multi Predictions Aligned with Predicted Morphemes" width="345" />
`morph_yap`,`morph_hybrid` | `multi_align_hybrid`

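A rough Python sketch of the idea behind `multi_align_hybrid`: each token's `token-multi` label is split into sub-labels and spread over that token's predicted morphemes. The `^` separator, the function names, and the right-alignment fallback used when the counts disagree (prefix morphemes are assumed to lie outside the entity) are all hypothetical choices for illustration; the actual alignment in `nemo.py` may differ.

```python
from typing import List, Tuple

MULTI_SEP = "^"  # assumed separator between sub-labels, e.g. "O^B-ORG"


def align_multi_to_morphemes(morphemes: List[str], multi_label: str) -> List[str]:
    """Return exactly one label per predicted morpheme of a single token.

    When the counts agree, sub-labels are assigned left to right. Otherwise
    the sub-labels are right-aligned (hypothetical heuristic), so missing
    leading labels become "O" and surplus leading sub-labels are dropped.
    """
    subs = multi_label.split(MULTI_SEP)
    n = len(morphemes)
    if len(subs) >= n:
        return subs[len(subs) - n:]
    return ["O"] * (n - len(subs)) + subs


def align_sentence(morph_seqs: List[List[str]], multi_labels: List[str]) -> List[Tuple[str, str]]:
    """Flatten a sentence into (morpheme, label) pairs."""
    pairs: List[Tuple[str, str]] = []
    for morphemes, multi_label in zip(morph_seqs, multi_labels):
        labels = align_multi_to_morphemes(morphemes, multi_label)
        pairs.extend(zip(morphemes, labels))
    return pairs


if __name__ == "__main__":
    # MD predicted three morphemes for the first token, but token-multi has two sub-labels.
    print(align_sentence([["b", "h", "bit"], ["lbn"]], ["O^B-ORG", "E-ORG"]))
    # prints [('b', 'O'), ('h', 'O'), ('bit', 'B-ORG'), ('lbn', 'E-ORG')]
```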
2. To get token-level labels we have three options:
* `run_ner_model` command with `token-single` model.
    * `token-multi` labels can be mapped to `token-single` labels to get standard token-single output. Command: `multi_to_single`.
* Morpheme-level output can be aligned back to token-level boundaries. Command: `morph_hybrid_align_tokens` (achieved best token-level results in our experiments).

Run `token-single` | Map `token-multi` to `token-single` | Align `morph` NER with Tokens
:---:|:---------------------:|:---------------------:
<img src="./docs/token_single.png" alt="Run token-single" width="175" /> | <img src="./docs/multi_to_single.png" alt="Map token-multi to token-single" width="345" /> | <img src="./docs/morph_align_tokens.png" alt="Align morph NER with Tokens" width="345" />
`run_ner_model token-single` | `multi_to_single` | `morph_hybrid_align_tokens`

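Both `multi_to_single` and `morph_hybrid_align_tokens` can be viewed as turning several per-morpheme labels of one token into a single token label. The Python sketch below shows one way to do that, assuming BIOES-style labels and a `^` separator in `token-multi` labels. `collapse_labels` and the other helpers are hypothetical names, and the collapsing rule (entity type from the first non-O sub-label, boundary prefix from whether the token opens and/or closes the entity) is an illustrative heuristic rather than the one NEMO implements. Sharing one collapse function keeps the two token-level paths consistent.

```python
from typing import Dict, List, Sequence

MULTI_SEP = "^"  # assumed sub-label separator in token-multi labels


def collapse_labels(sub_labels: Sequence[str]) -> str:
    """Collapse per-morpheme BIOES labels of one token into one token label.

    Hypothetical heuristic: the entity type comes from the first non-O
    sub-label; the token begins an entity if that sub-label is B/S and
    ends one if the last non-O sub-label is E/S.
    """
    ents = [label for label in sub_labels if label != "O"]
    if not ents:
        return "O"
    etype = ents[0].split("-", 1)[1]
    starts = ents[0][0] in ("B", "S")
    ends = ents[-1][0] in ("E", "S")
    if starts and ends:
        prefix = "S"
    elif starts:
        prefix = "B"
    elif ends:
        prefix = "E"
    else:
        prefix = "I"
    return f"{prefix}-{etype}"


def multi_to_single(multi_labels: List[str]) -> List[str]:
    """Map token-multi labels to token-single labels, one per token."""
    return [collapse_labels(m.split(MULTI_SEP)) for m in multi_labels]


def morph_labels_to_tokens(token_ids: List[int], morph_labels: List[str]) -> List[str]:
    """Group morpheme-level labels by their (assumed) token index, then collapse."""
    grouped: Dict[int, List[str]] = {}
    for tid, label in zip(token_ids, morph_labels):
        grouped.setdefault(tid, []).append(label)
    return [collapse_labels(grouped[t]) for t in sorted(grouped)]


if __name__ == "__main__":
    print(multi_to_single(["O^B-ORG", "E-ORG"]))
    # prints ['B-ORG', 'E-ORG']
    print(morph_labels_to_tokens([0, 0, 0, 1], ["O", "O", "B-ORG", "E-ORG"]))
    # prints ['B-ORG', 'E-ORG']
```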
* Note: while the `morph_hybrid*` scenarios offer the best performance, they are less efficient, since they require running both the `morph` and `token-multi` NER models.




## Important Notes
1. NCRFpp was great for our experiments on the NEMO corpus (which is given, constant, data), but it holds some caveats for real life scenarios of arbitrary text:
Binary file added docs/morph_align_tokens.png
Binary file added docs/morph_ner.png
Binary file removed docs/morph_ner_diagram.png
Binary file removed docs/multi_align_diagram.png
Binary file added docs/multi_align_morph.png
Binary file added docs/multi_to_single.png
Binary file added docs/token_single.png
