CogStack/MedCAT2

Medical Concept Annotation Tool (version 2)

MedCAT can be used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT, UMLS, or HPO (and potentially other ontologies). Original paper for v1 on arXiv.

Discussion Forum

Available Models

As MedCAT v2 is still in Beta, we do not currently have any models publicly available. However, you can still use models for v1 (see the README there). If you wish, you can also convert v1 models into the v2 format (see tutorial (TODO + link)).

News

Installation

MedCAT v2 is currently in Beta, so we are not yet publishing to PyPI. For now, the command to install core MedCAT v2 only is:

pip install "git+https://github.com/CogStack/MedCAT2@v0.1.5#egg=medcat2"

Note that this installs only the core of MedCAT v2. It does not include the dependencies necessary for spaCy-based tokenizing, MetaCATs, or DeID. However, all of those are supported as well. You can install them as follows:

pip install "git+https://github.com/CogStack/MedCAT2@v0.1.5#egg=medcat2[spacy]"  # for spacy-based tokenizer
pip install "git+https://github.com/CogStack/MedCAT2@v0.1.5#egg=medcat2[meta_cat]"  # for MetaCAT
pip install "git+https://github.com/CogStack/MedCAT2@v0.1.5#egg=medcat2[deid]"  # for DeID models
pip install "git+https://github.com/CogStack/MedCAT2@v0.1.5#egg=medcat2[spacy,meta_cat,deid,dict_ner]"  # for all of the above

PS: In the examples above, we're installing the MedCAT v2 Beta release v0.1.5. The README is unlikely to be updated after every new release. If another version is available or required, substitute the version tag as appropriate.
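
Because the version tag and the set of extras change from case to case, it can be handy to assemble the install target programmatically. The following is only a minimal sketch and is not part of MedCAT: the helper name `medcat2_pip_spec` is hypothetical, and the URL pattern and extras names are simply taken from the commands above.

```python
from typing import Iterable

# Repository named in this README; the helper below is illustrative only.
REPO_URL = "https://github.com/CogStack/MedCAT2"


def medcat2_pip_spec(tag: str = "v0.1.5", extras: Iterable[str] = ()) -> str:
    """Build the `pip install` target for a given version tag and extras."""
    spec = f"git+{REPO_URL}@{tag}#egg=medcat2"
    extras = list(extras)
    if extras:
        # Extras are appended as a comma-separated list in square brackets.
        spec += "[" + ",".join(extras) + "]"
    return spec


if __name__ == "__main__":
    # Core install target, then one with two of the extras listed above.
    print(medcat2_pip_spec())
    print(medcat2_pip_spec(extras=["spacy", "meta_cat"]))
```

When passing the result to `pip install` in a shell, keep it in quotes so the square brackets are not interpreted by the shell.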

Demo

Demo for v2 is upcoming

Tutorials

A guide on how to use MedCAT v2 is available at MedCATv2 Tutorials. Note that the tutorials are still a work in progress.

Acknowledgements

Entity extraction was trained on MedMentions. In total it has ~35K entities from UMLS.

The vocabulary was compiled from Wiktionary. In total it has ~800K unique words.

Powered By

A big thank you goes to spaCy and Hugging Face - who made life a million times easier.