bert-base-cased-literary-NER

A NER model trained on a literary dataset made up of the first chapter of 40 novels. The model supports the following NER classes: PER, ORG and LOC. If you use the model in a Hugging Face pipeline, pass aggregation_strategy="first".
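
As a rough illustration of the pipeline usage described above, the sketch below builds a `transformers` token-classification pipeline with `aggregation_strategy="first"` and groups the detected mentions by class. The `MODEL_ID` value is a placeholder (this card does not state the full Hub repository ID), and `build_ner_pipeline` and `group_by_class` are hypothetical helper names introduced only for this example. With the "first" strategy, a word split into several word pieces is given the tag of its first piece, so each mention comes back as a whole word.

```python
from collections import defaultdict

# Placeholder (assumption): replace with the full Hub repository ID of this model.
MODEL_ID = "<namespace>/bert-base-cased-literary-NER"


def build_ner_pipeline(model_id=MODEL_ID):
    """Create a token-classification pipeline that merges word pieces into words."""
    # Imported lazily so that group_by_class can be used without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="first",
    )


def group_by_class(entities):
    """Group aggregated pipeline output by entity class (PER, ORG or LOC).

    `entities` is the list of dicts returned by an aggregated pipeline call;
    each dict carries at least the keys "entity_group" and "word".
    """
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)
```

To try it, call `ner = build_ner_pipeline()` once, then `group_by_class(ner(text))` on a passage of your choice. The first call downloads the model weights, so it needs network access.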

Dataset

We corrected the dataset of Dekker et al. (2019) and added LOC and ORG annotations.

Citation

If you use this model in your research, please cite:

@InProceedings{amalvy:hal-03972448,
  title        = {{Data Augmentation for Robust Character Detection in
                  Fantasy Novels}},
  author       = {Amalvy, Arthur and Labatut, Vincent and Dufour,
                  Richard},
  url          = {https://hal.science/hal-03972448},
  booktitle    = {{Workshop on Computational Methods in the Humanities
                  2022}},
  year         = {2022},
  hal_id       = {hal-03972448},
  hal_version  = {v1},
}

The dataset was originally published and annotated by Dekker et al. (2019):

@Article{dekker-2019-evaluation_ner_social_networks_novels,
  author       = {Dekker, N. and Kuhn, T. and van Erp, M.},
  journal      = {PeerJ Computer Science},
  title        = {Evaluating named entity recognition tools for extracting social networks from novels},
  year         = {2019},
  pages        = {e189},
  volume       = {5},
  doi          = {10.7717/peerj-cs.189},
}