---
language: en
datasets:
- wnut_17
license: mit
metrics:
  - f1
widget:
  - text: "My name is Sylvain and I live in Paris"
    example_title: "Parisian"
  - text: "My name is Sarah and I live in London"
    example_title: "Londoner"
---

# Reddit NER for place names

A fine-tuned `bert-base-uncased` model for named entity recognition, trained on `wnut_17` plus 498 additional comments from Reddit. It is intended solely for extracting place names from social media text, so all other entity types have been removed.

This model was created with two key goals:

1. Improved NER results on social media
2. Target only place names

In theory, this model should be able to detect and ignore metonyms, where a place name stands in for something else. For example, in the sentence:

`Manchester played Liverpool last night in London.`

Both Manchester and Liverpool refer to football teams rather than places, so the model outputs only London:

`[
  {
    "entity_group": "location",
    "score": 0.99784255027771,
    "word": "london",
    "start": 42,
    "end": 48
  }
]`




## Use in `transformers`

```python
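from transformers import pipeline

# Load the model and its tokenizer from the Hugging Face Hub.
generator = pipeline(
    task="ner",
    model="cjber/reddit-ner-place_names",
    tokenizer="cjber/reddit-ner-place_names",
    # Merge word pieces into whole words, labelled by their first token.
    aggregation_strategy="first",
)

out = generator("I live north of liverpool in Waterloo")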
```

`out` then contains:

```python
[{'entity_group': 'location',
  'score': 0.94054973,
  'word': 'liverpool',
  'start': 16,
  'end': 25},
 {'entity_group': 'location',
  'score': 0.99520856,
  'word': 'waterloo',
  'start': 29,
  'end': 37}]
```
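
Because the base model is uncased, the `word` field is always lowercase. If the original casing matters, the `start` and `end` character offsets can be used to slice the place name back out of the input text. The following is a minimal sketch rather than part of the model's API: `extract_places` and its `min_score` threshold are hypothetical names introduced here for illustration, and the entity list is copied from the output above so the example runs without downloading the model.

```python
from typing import Dict, List


def extract_places(text: str, entities: List[Dict], min_score: float = 0.5) -> List[str]:
    """Return place names as they appear in `text`, using the character offsets.

    `min_score` is an illustrative threshold (an assumption, not a value
    recommended by the model authors).
    """
    places = []
    for ent in entities:
        # The model only emits "location", but check in case of future labels.
        if ent["entity_group"] != "location":
            continue
        if ent["score"] < min_score:
            continue
        # Slice the original text to recover the author's casing.
        places.append(text[ent["start"]:ent["end"]])
    return places


text = "I live north of liverpool in Waterloo"
# Pipeline output copied from the example above.
entities = [
    {"entity_group": "location", "score": 0.94054973,
     "word": "liverpool", "start": 16, "end": 25},
    {"entity_group": "location", "score": 0.99520856,
     "word": "waterloo", "start": 29, "end": 37},
]

places = extract_places(text, entities)
print(places)  # ['liverpool', 'Waterloo']
```

In practice, `entities` would be the list returned by the pipeline for the same `text`.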