---
language: en
datasets:
- wnut_17
license: mit
metrics:
- f1
widget:
- text: "Manchester played Liverpool last night in London."
  example_title: "Metonyms"
- text: "i live in brum - slang for birmingham"
  example_title: "Slang / informal text"
---

# Reddit NER for place names

A fine-tuned `bert-base-uncased` model for named entity recognition, trained on `wnut_17` with 498 additional comments from Reddit. This model is intended solely for extracting place names from social media text, so all other entity types have been removed.

This model was created with two key goals:

1. Improve NER results on social media text
2. Target only place names

In theory, this model should be able to detect and ignore metonyms. For example, in the sentence:

`Manchester played Liverpool last night in London.`

both Manchester and Liverpool refer to football teams rather than places, so the model outputs only London:

```json
[
  {
    "entity_group": "location",
    "score": 0.99784255027771,
    "word": "london",
    "start": 42,
    "end": 48
  }
]
```
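
Because the base model is uncased, `word` comes back lowercased. A minimal sketch of recovering the original spelling, assuming `start` and `end` are character offsets into the input string (which matches the values shown above). The variable names are illustrative, and the entity list is copied from the output above rather than produced by a live model call:

```python
text = "Manchester played Liverpool last night in London."

# Output copied from the example above (no model call is made here).
entities = [
    {
        "entity_group": "location",
        "score": 0.99784255027771,
        "word": "london",
        "start": 42,
        "end": 48,
    }
]

# Slice the original text with the character offsets to keep its casing.
places = [text[e["start"]:e["end"]] for e in entities]
print(places)  # ['London']
```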

## Use in `transformers`

```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
generator = pipeline(
    task="ner",
    model="cjber/reddit-ner-place_names",
    tokenizer="cjber/reddit-ner-place_names",
    # Group sub-word tokens into whole words, tagging each word by its first token.
    aggregation_strategy="first",
)

out = generator("I live north of liverpool in Waterloo")
```

`out` contains:

```python
[{'entity_group': 'location',
  'score': 0.94054973,
  'word': 'liverpool',
  'start': 16,
  'end': 25},
 {'entity_group': 'location',
  'score': 0.99520856,
  'word': 'waterloo',
  'start': 29,
  'end': 37}]
```
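
Each prediction carries a `score`, so low-confidence place names can be dropped before further processing. The sketch below is one possible approach, not part of the model itself: `filter_by_score` is a hypothetical helper and the `0.95` threshold is an arbitrary assumption that should be tuned on your own data. The entity list is copied from the output above so the example runs without downloading the model:

```python
def filter_by_score(entities, min_score=0.95):
    """Keep only predictions whose score is at least min_score."""
    return [e for e in entities if e["score"] >= min_score]


# Output copied from the example above (no model call is made here).
out = [
    {"entity_group": "location", "score": 0.94054973,
     "word": "liverpool", "start": 16, "end": 25},
    {"entity_group": "location", "score": 0.99520856,
     "word": "waterloo", "start": 29, "end": 37},
]

# With the assumed 0.95 threshold, only the higher-scoring entity remains.
confident = filter_by_score(out, min_score=0.95)
print([e["word"] for e in confident])  # ['waterloo']
```

With a live pipeline, the same helper could be applied directly to the list returned by `generator(...)`.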