---
license: cc-by-sa-4.0
tags:
- t5
- coreference
- digital humanities
---
## Literary Coreference Annotations with T5
Coreference annotation is a critical task for much research in the digital humanities. However, literary texts differ in several key ways from the texts used to train general-purpose coreference systems, so those systems usually underperform on literature. **t5-literary-coreference** is an easy-to-use text-to-text model for sentence-level coreference annotation of literary texts that achieves state-of-the-art performance. It is an adapted version of `t5-3b`, trained on the coreference annotations from the [LitBank corpus](https://github.com/dbamman/litbank).

The model takes plain sentences as input (e.g. `"The visitor sat and listened to her retreating feet."`) and returns each sentence with inline coreference annotations (e.g. `"[The visitor: 1] sat and listened to [her: 2] retreating feet."`).
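
Because the model regenerates the whole sentence, it can be useful to strip the annotations back out and check that the text still matches the input. The following is a minimal sketch, assuming annotations always have the `[mention text: cluster id]` form shown above and are never nested; the regex and the `strip_annotations` helper are hypothetical and not part of this repository.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for one annotated mention, e.g. "[The visitor: 1]".
# Assumes annotations are not nested inside one another.
MENTION_RE = re.compile(r"\[([^\[\]]+?): (\d+)\]")


def strip_annotations(annotated: str) -> Tuple[str, List[Tuple[str, int]]]:
    """Return the plain sentence and a list of (mention text, cluster id) pairs."""
    mentions = [(m.group(1), int(m.group(2))) for m in MENTION_RE.finditer(annotated)]
    # Replace every "[text: id]" with just "text" to recover the input sentence.
    plain = MENTION_RE.sub(lambda m: m.group(1), annotated)
    return plain, mentions


source = "The visitor sat and listened to her retreating feet."
output = "[The visitor: 1] sat and listened to [her: 2] retreating feet."
plain, mentions = strip_annotations(output)
print(plain == source)  # True
print(mentions)  # [('The visitor', 1), ('her', 2)]
```

If `plain` differs from the input sentence, the model has altered the text as well as annotating it, and that output may need to be handled separately.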

For an example of how to use the model, see this tutorial on Google Colab or look at the `get_annotations.py` file under the **Files and versions** tab above. To convert the annotated sentences into coreference clusters, use the `get_ent_clusters.py` script, also under the **Files and versions** tab. It outputs clusters of the form `{1: ['The visitor|0'], 2: ['her|6']}`, where each key is a cluster index and each value is the list of entities in that cluster. Each entity is the mention's substring joined by a pipe (`|`) to the index of the substring's first word in the sentence.
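
The conversion described above can be sketched as follows. This is not the code in `get_ent_clusters.py`, which may differ; the `get_clusters` function is hypothetical. It assumes non-nested `[text: id]` annotations and that word indices count whitespace-separated words in the unannotated sentence, which matches the example (`her` is word 6 of `The visitor sat and listened to her ...`).

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical pattern for one annotated mention, e.g. "[her: 2]" (no nesting assumed).
MENTION_RE = re.compile(r"\[([^\[\]]+?): (\d+)\]")


def get_clusters(annotated: str) -> Dict[int, List[str]]:
    """Map each cluster id to a list of "mention text|first word index" strings."""
    clusters: Dict[int, List[str]] = defaultdict(list)
    plain_prefix = ""  # unannotated text preceding the current mention
    last_end = 0
    for m in MENTION_RE.finditer(annotated):
        plain_prefix += annotated[last_end:m.start()]
        words_before = plain_prefix.split()
        # A mention glued to the preceding characters (e.g. after a quote mark)
        # belongs to the same whitespace-separated word, so it shares that index.
        if plain_prefix and not plain_prefix[-1].isspace():
            index = len(words_before) - 1
        else:
            index = len(words_before)
        text, cluster_id = m.group(1), int(m.group(2))
        clusters[cluster_id].append(f"{text}|{index}")
        plain_prefix += text  # keep the prefix in plain (unannotated) form
        last_end = m.end()
    return dict(clusters)


print(get_clusters("[The visitor: 1] sat and listened to [her: 2] retreating feet."))
# {1: ['The visitor|0'], 2: ['her|6']}
```

Because indices are computed against the plain sentence rather than the annotated string, the bracket and cluster-id characters do not shift the word positions.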