---
license: unknown
pipeline_tag: token-classification
tags:
- wine
- ner
widget:
- text: 'Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022'
example_title: 'California Cab'
---
# Wineberto NER model

## Model description
Pretrained on wine labels and descriptions for named entity recognition, using `bert-base-uncased` as the base model. It tries to recognize both the wine label and the descriptive text about the wine.

<b>Label extraction does not work as well as using the panigrah/winberto-labels model directly.</b>

* Updated to remove bias from the position of the wine label in the training inputs.
* Also updated to drop extraction of the wine classification (e.g. Grand Cru), because its training data is not reliable.
## How to use
You can use this model directly for named entity recognition like so:
```python
>>> from transformers import pipeline
>>> # aggregation_strategy='simple' merges sub-word tokens into entity groups
>>> ner = pipeline('ner', model='panigrah/wineberto-ner', aggregation_strategy='simple')
>>> tokens = ner("Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022")
>>> for t in tokens:
...     print(f"{t['word']}: {t['entity_group']}: {t['score']:.5}")
heitz: producer: 0.99988
cab: wine: 0.9999
##ernet sauvignon: wine: 0.95893
california: province: 0.99992
napa valley: region: 0.99991
napa: subregion: 0.99987
us: country: 0.99996
oak: flavor: 0.99992
juicy: mouthfeel: 0.99992
cherry: flavor: 0.99994
fruit: flavor: 0.99994
cara: flavor: 0.99993
##mel: flavor: 0.99731
mint: flavor: 0.99994
balanced: mouthfeel: 0.99992
```
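The grouped output is easy to fold into a structured record. A minimal sketch, reusing the `tokens` list from the example above (`to_record` is an illustrative helper, not part of the model or pipeline API):

```python
from collections import defaultdict

def to_record(entities):
    """Group pipeline output into an entity_group -> [words] mapping."""
    record = defaultdict(list)
    for ent in entities:
        record[ent["entity_group"]].append(ent["word"])
    return dict(record)

print(to_record(tokens))
# e.g. {'producer': ['heitz'], 'province': ['california'], 'region': ['napa valley'], ...}
```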
## Training data
The BERT model was fine-tuned on 20K reviews and wine labels derived from https://huggingface.co/datasets/james-burton/wine_reviews_all_text, manually annotated to capture the following entity types:
```
adjective: nice, exciting, strong, etc.
country: country specified in the label or description
flavor: fruit, apple, toast, smoke, etc.
grape: Cab, Cabernet Sauvignon, etc.
mouthfeel: luscious, smooth, textured, rough, etc.
producer: wine maker
province, region: province and region of the wine - these sometimes get mixed up
```
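The card does not publish the exact label scheme, so here is a minimal sketch of how the entity types above could map to a BIO tag set for token classification; the names and mapping are assumptions for illustration, not taken from the model.

```python
# Hypothetical BIO label scheme built from the entity types listed above.
# None of these names are published by the model card.
ENTITY_TYPES = [
    "adjective", "country", "flavor", "grape",
    "mouthfeel", "producer", "province", "region",
]
labels = ["O"] + [f"{prefix}-{ent}" for ent in ENTITY_TYPES for prefix in ("B", "I")]
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}
print(labels[:5])  # ['O', 'B-adjective', 'I-adjective', 'B-country', 'I-country']
```

The deployed model's actual labels can be read from `model.config.id2label` after loading it.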
## Training procedure
```python
from transformers import TrainingArguments

model_id = 'bert-base-uncased'
arguments = TrainingArguments(
    output_dir='wineberto-ner',  # required by TrainingArguments; not given in the original
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
)
# ... tokenizer, model, dataset, and Trainer setup ...
trainer.train()
```
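The elided steps are not published in this card. For orientation, a minimal sketch of a typical `Trainer` wiring, assuming pre-tokenized `train_ds`/`eval_ds` datasets with aligned labels and the `label2id`/`id2label` mappings sketched in the training-data section; all of these names are assumptions, not details of this model's actual training run.

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(
    model_id,
    num_labels=len(labels),  # assumed BIO label list from the sketch above
    id2label=id2label,
    label2id=label2id,
)
trainer = Trainer(
    model=model,
    args=arguments,           # TrainingArguments from above
    train_dataset=train_ds,   # assumed tokenized datasets
    eval_dataset=eval_ds,
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
```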