---
license: cc-by-sa-4.0
tags:
- t5
- coreference
- digital humanities
---

## Literary Coreference Annotations with T5

Coreference annotation is a critical task for much research in the digital humanities. However, literary texts differ in several key ways from the texts used to train general-purpose coreference annotation systems, so those systems usually underperform on literature.

**t5-literary-coreference** is an easy-to-use text-to-text model for sentence-level coreference annotation of literary texts that achieves state-of-the-art performance. It is an adapted version of `t5-3b`, trained on the coreference annotations from the [LitBank corpus](https://github.com/dbamman/litbank).

The model takes plain sentences as input (e.g. `"The visitor sat and listened to her retreating feet."`) and returns each sentence with inline coreference annotations (e.g. `"[The visitor: 1] sat and listened to [her: 2] retreating feet."`).

For an example of how to use the model, see [this tutorial on Google Colab](https://colab.research.google.com/drive/1G7ziZvFo_sZoygxlS2Nyt_CejRs0r9Jw?usp=sharing) or look at the `get_annotations.py` file under the **Files and versions** tab above.

To convert the annotated sentences into coreference clusters, use the `get_ent_clusters.py` script, also under the **Files and versions** tab. It outputs clusters of the form `{1: ['The visitor|0'], 2: ['her|6']}`, where each key is a cluster index and each value is a list of the entities in that cluster. Each entity is the representative substring joined by a pipe to the index of the first word of that substring in the sentence.
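
To make the cluster format concrete, here is a minimal sketch of how an annotated sentence could be turned into that dictionary. It is not the code in `get_ent_clusters.py`: the function name `clusters_from_annotation` and the regular expression are assumptions made for illustration, and the sketch only handles non-nested annotations of the form `[text: id]`, with word indices counted over whitespace-separated tokens.

```python
import re
from collections import defaultdict

# Assumed annotation shape: "[mention text: cluster_id]", with no nesting.
MENTION = re.compile(r"\[([^\[\]]+?):\s*(\d+)\]")


def clusters_from_annotation(annotated):
    """Group bracketed mentions by cluster id, tagging each with its first-word index."""
    clusters = defaultdict(list)
    plain = ""  # the sentence with annotations stripped, rebuilt left to right
    pos = 0
    for match in MENTION.finditer(annotated):
        plain += annotated[pos:match.start()]
        mention, cluster_id = match.group(1), int(match.group(2))
        # The mention's word index is the number of whitespace-separated
        # tokens that precede it in the plain sentence.
        index = len(plain.split())
        if plain and not plain[-1].isspace():
            # The mention is glued to a preceding token (e.g. an opening
            # quote), so it starts inside that token rather than after it.
            index -= 1
        clusters[cluster_id].append(f"{mention}|{index}")
        plain += mention
        pos = match.end()
    return dict(clusters)


example = "[The visitor: 1] sat and listened to [her: 2] retreating feet."
print(clusters_from_annotation(example))
# {1: ['The visitor|0'], 2: ['her|6']}
```

If the model emits nested mentions, a regular expression like this one will not be enough and a small bracket-matching parser would be needed instead.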