Parallel named entity recognition (NER) and normalisation (NEN) based on sequence labeling with either BiLSTM or BioBERT.
This repository hosts the code for our participation in the CRAFT shared task 2019. If you are interested in training a similar system for biomedical concept recognition, keep reading. If you would rather use the trained system to predict entities in other texts, have a look at our Zenodo deposit.
- Get CRAFT v4.
- Convert concept annotations to CoNLL format (see below).
- Create dictionary-based predictions using OGER (optional part of both the BiLSTM and BioBERT system).
- Train models with the code in bilstm/ or biobert/.
- Convert the predictions from .conll back to .bionlp.
- Evaluate with the official evaluation suite.
- Follow the instructions to create standoff-annotations in BioNLP format. Place them in a bionlp subdirectory for each entity type.
- Run
    git submodule update --init
  to get a clone of Pyysalo's standoff2conll converter.
- Make sure the CRAFT corpus is available as a directory or link named CRAFT in this directory.
- Run
    ./bionlp2conll.sh <NAME> <PATH>
  where NAME is "CHEBI", "CL" etc. and PATH is the target directory. This creates a 4-column CoNLL file for each article.
- If you use dictionary-based predictions, add them as a 5th column.
- For the BioBERT system, use biobert/biocodes/conll2conll.py to convert the documents to 2-column CoNLL, and the same script again to convert the predictions back to 4-column format (including prediction harmonisation).
- To convert predicted CoNLL files back to standoff, run
    standoff2conll/conll2standoff.py < path/to/doc.conll > path/to/doc.bionlp
  for each document.
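Since the back-conversion has to be run once per document, a small driver script can save some typing. The following is a minimal sketch, not part of this repository: the function name `conll_to_standoff`, the directory layout (one directory of `*.conll` files in, one directory of `.bionlp` files out) and the choice to launch the converter with the current Python interpreter are all assumptions made for illustration. Adjust `interpreter` if the converter needs a different Python.

```python
import subprocess
import sys
from pathlib import Path


def conll_to_standoff(conll_dir, out_dir,
                      converter="standoff2conll/conll2standoff.py",
                      interpreter=sys.executable):
    """Run the converter once per .conll file in conll_dir (hypothetical layout).

    Each call is equivalent to the shell redirection
        converter < doc.conll > doc.bionlp
    Returns the list of output paths that were written.
    """
    conll_dir, out_dir = Path(conll_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for conll_path in sorted(conll_dir.glob("*.conll")):
        target = out_dir / (conll_path.stem + ".bionlp")
        with conll_path.open("rb") as src, target.open("wb") as dst:
            # Feed the CoNLL file on stdin, capture standoff output from stdout.
            # check=True raises CalledProcessError if the converter fails.
            subprocess.run([interpreter, str(converter)],
                           stdin=src, stdout=dst, check=True)
        written.append(target)
    return written
```

A call such as conll_to_standoff("predictions/CHEBI", "standoff/CHEBI") would then convert every document of one entity type; both paths are placeholders.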
The labels (IDs) selected for ontology pretraining (yCP in the paper) are listed in this archive.
Dependencies:
- keras (BiLSTM)
- tensorflow (BioBERT)
The code in this repository is licensed under the AGPL-3.0.
However, the code in the biobert subdirectory uses an Apache License.
If you use this code, please cite us:
Lenz Furrer, Joseph Cornelius, and Fabio Rinaldi (2019): UZH@CRAFT-ST: a Sequence-labeling Approach to Concept Recognition. In: Proceedings of The 5th Workshop on BioNLP Open Shared Tasks (BioNLP-OST 2019).