	---
	language: en
	datasets:
	- wnut_17
	license: mit
	metrics:
	- f1
	widget:
	- text: "Manchester played Liverpool last night in Liverpool."
	  example_title: "Metonyms"
	- text: "i live in brum - slang for birmingham"
	  example_title: "Slang / informal text"
	---

	# Reddit NER for place names

	Fine-tuned `bert-base-uncased` for named entity recognition, trained on `wnut_17` plus 498 additional comments from Reddit. This model is intended solely for place name extraction from social media text, so all other entity types have been removed.

	This model was created with two key goals:

	1. Improve NER results on social media text
	2. Target only place names

	## Model code

	The model code is available in the [Model GitHub Repository](https://github.com/cjber/reddit-model).

	## Metonymy

	In theory, this model should detect and ignore metonyms, i.e. place names used to refer to something else, such as a sports team. For example, in the sentence:

	`Manchester played Liverpool last night in Liverpool.`

	Both `Manchester` and the first `Liverpool` refer to football teams rather than places, so the model outputs only the second `Liverpool`:

	```python
	[
	    {
	        "entity_group": "location",
	        "score": 0.9975672,
	        "word": "liverpool",
	        "start": 42,
	        "end": 51,
	    }
	]
	```
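
Because the base model is uncased, `word` comes back lowercased. The `start` and `end` fields are character offsets into the input string, so the original casing can be recovered by slicing. The following is a minimal sketch that uses the output shown above as sample data rather than a fresh model run; the helper name `surface_forms` is hypothetical and not part of the model repository.

```python
def surface_forms(text, entities):
    """Return the original-cased text of each entity, using its character offsets."""
    return [text[e["start"]:e["end"]] for e in entities]


text = "Manchester played Liverpool last night in Liverpool."

# Sample data copied from the documented output above, not produced by running the model here.
entities = [
    {
        "entity_group": "location",
        "score": 0.9975672,
        "word": "liverpool",
        "start": 42,
        "end": 51,
    }
]

print(surface_forms(text, entities))  # ['Liverpool']
```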

	## Use in `transformers`

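	```python
	from transformers import pipeline

	# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
	generator = pipeline(
	    task="ner",
	    model="cjber/reddit-ner-place_names",
	    tokenizer="cjber/reddit-ner-place_names",
	    aggregation_strategy="first",  # group sub-word tokens into words, using the first token's tag
	)

	out = generator("I like reading books. I live in Reading.")
	```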

	`out` gives:

	```python
	[
	    {
	        "entity_group": "location",
	        "score": 0.94123614,
	        "word": "reading",
	        "start": 32,
	        "end": 39,
	    }
	]
	```
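
To process several comments and keep only confident predictions, the pipeline call can be wrapped in a small helper. This is a sketch under assumptions: the names `places_in_comments`, `fake_ner` and `min_score`, and the default threshold of 0.9, are illustrative choices and not values recommended by the model author. The helper accepts any callable that takes a string and returns a list of entity dicts, so it can be tried with a stand-in before downloading the model.

```python
def places_in_comments(comments, ner, min_score=0.9):
    """Map each comment to the place names found in it.

    `ner` is any callable that takes a string and returns a list of dicts
    with "score", "start" and "end" keys, like the pipeline output above.
    """
    results = {}
    for comment in comments:
        results[comment] = [
            comment[e["start"]:e["end"]]  # slice to keep the original casing
            for e in ner(comment)
            if e["score"] >= min_score  # drop low-confidence predictions
        ]
    return results


def fake_ner(text):
    """Hypothetical stand-in for the pipeline; returns the documented output for one sentence."""
    if text == "I like reading books. I live in Reading.":
        return [
            {
                "entity_group": "location",
                "score": 0.94123614,
                "word": "reading",
                "start": 32,
                "end": 39,
            }
        ]
    return []


comments = ["I like reading books. I live in Reading.", "no places here"]
found = places_in_comments(comments, fake_ner)
print(found)
```

With the real model, pass the `generator` pipeline created above in place of `fake_ner`.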