---
tags:
- spacy
- arxiv:2408.06930
- medical
language:
- nl
license: cc-by-sa-4.0
model-index:
- name: Echocardiogram_SpanCategorizer_mitral_regurgitation
  results:
  - task:
      type: token-classification
    dataset:
      type: test
      name: "internal test set"
    metrics:
    - name: "Weighted f1"
      type: f1
      value: 0.935
      verified: false
    - name: "Weighted precision"
      type: precision
      value: 0.969
      verified: false
    - name: "Weighted recall"
      type: recall
      value: 0.903
      verified: false
pipeline_tag: token-classification
metrics:
- f1
- precision
- recall
---
# Description
This model is a spaCy SpanCategorizer model trained from scratch on Dutch echocardiogram reports sourced from Electronic Health Records. The publication associated with the span classification task can be found at https://arxiv.org/abs/2408.06930. The config file for training the model can be found at https://github.com/umcu/echolabeler.
# Minimum working example
```python
!pip install https://huggingface.co/baukearends/Echocardiogram-SpanCategorizer-mitral-regurgitation/resolve/main/nl_Echocardiogram_SpanCategorizer_mitral_regurgitation-any-py3-none-any.whl
```
```python
import spacy
nlp = spacy.load("nl_Echocardiogram_SpanCategorizer_mitral_regurgitation")
```
```python
# Run the pipeline on an example Dutch echocardiogram report
prediction = nlp("Op dit echo geen duidelijke WMA te zien, goede systolische L.V. functie, wel L.V.H., diastolische dysfunctie graad 1A tot 2. Geringe aortastenose en - matige -insufficientie. Geringe M.I.")
# Predicted spans are stored under the 'sc' key, with their scores in the span group's attrs
for span, score in zip(prediction.spans['sc'], prediction.spans['sc'].attrs['scores']):
    print(f"Span: {span}, label: {span.label_}, score: {score[0]:.3f}")
```
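The printed spans can also be post-processed as plain Python data. The following is a minimal sketch, not part of the published pipeline: `SpanRecord`, `filter_spans`, and the `min_score` threshold of 0.5 are hypothetical names and values chosen for illustration, and the example records are made up rather than real model output.

```python
from typing import Iterable, List, NamedTuple


class SpanRecord(NamedTuple):
    """Plain-Python view of one predicted span (hypothetical helper type)."""
    text: str
    label: str
    score: float


def filter_spans(records: Iterable[SpanRecord], min_score: float = 0.5) -> List[SpanRecord]:
    """Keep records scoring at least `min_score`, highest score first."""
    kept = [r for r in records if r.score >= min_score]
    return sorted(kept, key=lambda r: r.score, reverse=True)


# With the loaded pipeline, records could be built from the example above:
#   records = [SpanRecord(span.text, span.label_, float(score[0]))
#              for span, score in zip(prediction.spans["sc"],
#                                     prediction.spans["sc"].attrs["scores"])]
# The values below are made up for illustration, not real model output.
example_records = [
    SpanRecord("Geringe M.I.", "mitral_valve_native_regurgitation_mild", 0.97),
    SpanRecord("matige -insufficientie", "mitral_valve_native_regurgitation_moderate", 0.12),
]

confident = filter_spans(example_records, min_score=0.5)
for record in confident:
    print(f"Span: {record.text}, label: {record.label}, score: {record.score:.3f}")
```

The threshold is an assumption; a suitable value depends on how the downstream task trades precision against recall.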
# Label Scheme
<details>
<summary>View label scheme (4 labels for 1 component)</summary>
| Component | Labels |
| --- | --- |
| **`spancat`** | `mitral_valve_native_regurgitation_not_present`, `mitral_valve_native_regurgitation_mild`, `mitral_valve_native_regurgitation_moderate`, `mitral_valve_native_regurgitation_severe` |
</details>
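Because the four labels form an ordered severity scale, downstream code may want to compare them. The sketch below maps each label to an ordinal grade and reduces several predicted labels to the most severe one. The 0 to 3 numbering, the `SEVERITY_GRADE` mapping, and the `most_severe` helper are illustrative assumptions, and taking the most severe label is only one possible reduction rule, not necessarily the one used in the associated publication.

```python
from typing import Iterable, Optional

# Ordinal grade per label; the 0-3 numbering is an illustrative assumption.
SEVERITY_GRADE = {
    "mitral_valve_native_regurgitation_not_present": 0,
    "mitral_valve_native_regurgitation_mild": 1,
    "mitral_valve_native_regurgitation_moderate": 2,
    "mitral_valve_native_regurgitation_severe": 3,
}


def most_severe(labels: Iterable[str]) -> Optional[str]:
    """Return the known label with the highest grade, or None if there is none."""
    known = [label for label in labels if label in SEVERITY_GRADE]
    if not known:
        return None
    return max(known, key=SEVERITY_GRADE.__getitem__)


# Example with hand-written labels (not real model output)
print(most_severe([
    "mitral_valve_native_regurgitation_mild",
    "mitral_valve_native_regurgitation_moderate",
]))
```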
# Intended use
The model is developed for span classification on Dutch clinical text. Since it is a domain-specific model trained on medical data, it is meant to be used on medical NLP tasks for Dutch.
# Data
The model was trained on approximately 4,000 manually annotated echocardiogram reports from the University Medical Centre Utrecht. The training data was anonymized before starting the training procedure.
| Feature | Description |
| --- | --- |
| **Name** | `Echocardiogram_SpanCategorizer_mitral_regurgitation` |
| **Version** | `1.0.0` |
| **spaCy** | `>=3.7.4,<3.8.0` |
| **Default Pipeline** | `tok2vec`, `spancat` |
| **Components** | `tok2vec`, `spancat` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | n/a |
| **License** | `cc-by-sa-4.0` |
| **Author** | Bauke Arends |
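
The spaCy range listed in the table (`>=3.7.4,<3.8.0`) can be checked before loading the model. This is a minimal sketch using only the standard library; `parse_release` and `is_supported_spacy` are hypothetical helpers, and the parser deliberately ignores pre-release or development suffixes.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric 'X.Y.Z' part of a version string into a tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_supported_spacy(version: str) -> bool:
    """True if `version` lies in the range >=3.7.4,<3.8.0 from the table above."""
    return (3, 7, 4) <= parse_release(version) < (3, 8, 0)


# Typical use, assuming spaCy is installed:
#   import spacy
#   if not is_supported_spacy(spacy.__version__):
#       print("Installed spaCy version is outside the range this model targets")
print(is_supported_spacy("3.7.5"))
```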
# Contact
If you run into problems with this model, please open an issue on our GitHub repository: https://github.com/umcu/echolabeler/issues
# Usage
If you use the model in your work, please cite the following reference: https://doi.org/10.48550/arXiv.2408.06930
# References
Paper: Bauke Arends, Melle Vessies, Dirk van Osch, Arco Teske, Pim van der Harst, René van Es, Bram van Es (2024). Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification. arXiv:2408.06930. https://arxiv.org/abs/2408.06930