---
language:
- en
tags:
- token-classification
- address-NER
- NER
- bert-base-uncased

datasets:
- Ultra Fine Entity Typing
metrics:
- Precision
- Recall
- F1 Score

widget:
- text: "Hi, I am Kermit and I live in Berlin"
- text: "It is very difficult to find a house in Berlin, Germany."
- text: "ML6 is a very cool company from Belgium"
- text: "Samuel ppops in a happy plce called Berlin which happens to be Kazakhstan"
- text: "My family and I visited Montreal, Canada last week and the flight from Amsterdam took 9 hours"



---



## City-Country-NER

A `bert-base-uncased` model fine-tuned on a custom dataset to detect `Country` and `City` names in a given sentence.

### Custom Dataset
We weakly labeled the [Ultra-Fine Entity Typing](https://www.cs.utexas.edu/~eunsol/html_pages/open_entity.html) dataset with `City` and `Country` information, then applied extra preprocessing to remove false labels.

The model predicts 3 different tags: `OTHER`, `CITY`, and `COUNTRY`.
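This card does not spell out how the weak labels were derived. As a purely hypothetical illustration (not the procedure actually used for this model), one simple way to turn a mention annotated with fine-grained types into `OTHER`/`CITY`/`COUNTRY` tags is to map the type strings to tags. The `TYPE_TO_TAG` table and the `weak_label` helper below are assumptions made for this sketch, and the false-label cleanup step is not modeled.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical mapping from fine-grained type strings to the model's tags.
# Order matters: the first matching type wins, so "city" is checked first.
TYPE_TO_TAG: Tuple[Tuple[str, str], ...] = (("city", "CITY"), ("country", "COUNTRY"))


def weak_label(tokens: Sequence[str], span: Tuple[int, int], types: Set[str]) -> List[str]:
    """Assign an OTHER/CITY/COUNTRY tag to every token of a sentence.

    tokens: the words of the sentence
    span:   (start, end) token indices of the annotated mention, end exclusive
    types:  the fine-grained type labels attached to that mention
    """
    tags = ["OTHER"] * len(tokens)  # default: everything is OTHER
    start, end = span
    for type_name, tag in TYPE_TO_TAG:
        if type_name in types:
            for i in range(start, end):
                tags[i] = tag  # label every token inside the mention span
            break
    return tags
```

For example, `weak_label(["I", "live", "in", "Berlin"], (3, 4), {"location", "city"})` tags only `Berlin` as `CITY`; a mention whose types match neither entry stays `OTHER`.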



### How to use the fine-tuned model

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "ml6team/bert-base-uncased-city-country-ner"

# Load the fine-tuned tokenizer and token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# aggregation_strategy="simple" merges consecutive sub-word tokens that share
# a predicted label into a single entity span
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

print(nlp("My name is Kermit and I live in London."))
```
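With `aggregation_strategy="simple"`, the pipeline returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end` keys. The helper below is a minimal sketch for collecting the predicted cities and countries from that output. It assumes the `entity_group` values are the tags listed above, skips `OTHER` (and `O`) in case the pipeline does not filter them out, and drops predictions under a hypothetical `min_score` threshold. Spans are sliced from the input text using the `start`/`end` character offsets, which keeps the original casing even though the model is uncased. The name `group_locations` is illustrative, not part of any library.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def group_locations(
    text: str,
    entities: Sequence[Dict[str, Any]],
    min_score: float = 0.5,
    ignore_labels: Sequence[str] = ("O", "OTHER"),
) -> Dict[str, List[str]]:
    """Group aggregated NER predictions by label, e.g. {"CITY": [...], "COUNTRY": [...]}."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in entities:
        if ent["entity_group"] in ignore_labels:
            continue  # not a location
        if ent["score"] < min_score:
            continue  # drop low-confidence predictions
        # Slice the original text so the returned span keeps its casing
        grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)
```

With the pipeline above, this would be called as `group_locations(text, nlp(text))`.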