---
language: en
datasets:
- wnut_17
license: mit
metrics:
- f1
widget:
- text: "Manchester played Liverpool last night in Liverpool."
  example_title: "Metonyms"
- text: "i live in brum - slang for birmingham"
  example_title: "Slang / informal text"
---

# Reddit NER for place names

A `bert-base-uncased` model fine-tuned for named entity recognition on `wnut_17` plus 498 additional comments from Reddit. The model is intended solely for extracting place names from social media text, so all other entity types have been removed.

This model was created with two key goals:

1. Improve NER results on social media text
2. Target only place names
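
The second goal could be approached by collapsing every non-location tag to the outside tag before training. The sketch below is only an illustration of that idea: the tag names follow the WNUT-17 BIO scheme, while the helper `keep_locations`, the example tokens, and the collapsing step itself are assumptions rather than the author's confirmed preprocessing.

```python
# Hypothetical helper: keep only location tags and map everything else to "O".
# The collapsing step is an assumption, not the author's confirmed method.
KEEP = {"B-location", "I-location"}


def keep_locations(tags: list[str]) -> list[str]:
    """Replace every non-location BIO tag with the outside tag "O"."""
    return [tag if tag in KEEP else "O" for tag in tags]


# Made-up example sentence, tagged in the WNUT-17 style.
tokens = ["Arsenal", "fans", "in", "North", "London"]
tags = ["B-group", "O", "O", "B-location", "I-location"]

print(list(zip(tokens, keep_locations(tags))))
```

After this step only `B-location` and `I-location` remain as entity tags, so a classifier trained on the result has nothing but place names to predict.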

## Model code

The model code is available in the [Model GitHub Repository](https://github.com/cjber/reddit-model).

## Metonymy

In theory, this model should be able to detect and ignore metonyms. For example, in the sentence:

`Manchester played Liverpool last night in Liverpool.`

Both Manchester and the first mention of Liverpool refer to football teams rather than places, so the model outputs only the final Liverpool:

```python
[
    {
        "entity_group": "location",
        "score": 0.9975672,
        "word": "liverpool",
        "start": 42,
        "end": 51,
    }
]
```
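
To confirm which mention was tagged, the `start` and `end` character offsets can be compared with the positions of every "Liverpool" in the input. This is a minimal sketch that reuses the prediction shown above as a literal (with the score omitted), so it runs without loading the model; the variable names are illustrative.

```python
import re

text = "Manchester played Liverpool last night in Liverpool."

# Prediction copied from the output above (score omitted).
pred = {"entity_group": "location", "word": "liverpool", "start": 42, "end": 51}

# Character offsets of every "Liverpool" mention in the input.
mentions = [m.span() for m in re.finditer("Liverpool", text)]

# Position of the predicted span among those mentions (0 = first, 1 = second).
tagged = mentions.index((pred["start"], pred["end"]))

print(mentions)  # [(18, 27), (42, 51)]
print(tagged)    # 1
```

The predicted span matches the second mention, the one used as a place, while the team mention at offset 18 is left untagged.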

## Use in `transformers`

```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
generator = pipeline(
    task="ner",
    model="cjber/reddit-ner-place_names",
    tokenizer="cjber/reddit-ner-place_names",
    # Group sub-word tokens into words, using the first token's tag for each word.
    aggregation_strategy="first",
)

out = generator("I like reading books. I live in Reading.")
```

`out` gives:

```python
[
    {
        "entity_group": "location",
        "score": 0.94123614,
        "word": "reading",
        "start": 32,
        "end": 39,
    }
]
```
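
Because the base model is uncased, `word` comes back lowercased. One possible way to recover the original spelling is to slice the input with the `start` and `end` offsets. The sketch below works on the output shown above as a literal; the helper `place_names` and the `min_score` threshold of 0.5 are hypothetical and not part of the model or of `transformers`.

```python
def place_names(text: str, entities: list[dict], min_score: float = 0.5) -> list[str]:
    """Return the original-cased text of each location scoring at least `min_score`."""
    return [
        text[ent["start"]:ent["end"]]
        for ent in entities
        if ent["entity_group"] == "location" and ent["score"] >= min_score
    ]


text = "I like reading books. I live in Reading."

# Output copied from the example above.
out = [
    {
        "entity_group": "location",
        "score": 0.94123614,
        "word": "reading",
        "start": 32,
        "end": 39,
    }
]

print(place_names(text, out))  # ['Reading']
```

Raising `min_score` trades recall for precision; a suitable value depends on the downstream task and would need to be checked against labelled data.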