---
license: unknown
pipeline_tag: token-classification
tags:
- wine
- ner
widget:
- text: 'Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022'
example_title: 'California Cab'
---
# Wineberto NER model

## Model description
Pretrained on wine labels and descriptions for named entity recognition, using `bert-base-uncased` as the base model. It tries to recognize both the wine label and the descriptive text about the wine.

<b>Label extraction does not work as well as using the panigrah/winberto-labels model directly.</b>

* Updated to remove bias from the position of the wine label in the training inputs.
* Also updated to drop extraction of the wine classification (e.g. Grand Cru), because its training data is not reliable.
## How to use
You can use this model directly for named entity recognition like so:
```python
>>> from transformers import pipeline
>>> # aggregation_strategy='simple' merges sub-word tokens into entity groups
>>> ner = pipeline('ner', model='panigrah/wineberto-ner', aggregation_strategy='simple')
>>> tokens = ner("Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022")
>>> for t in tokens:
...     print(f"{t['word']}: {t['entity_group']}: {t['score']:.5}")
heitz: producer: 0.99988
cab: wine: 0.9999
##ernet sauvignon: wine: 0.95893
california: province: 0.99992
napa valley: region: 0.99991
napa: subregion: 0.99987
us: country: 0.99996
oak: flavor: 0.99992
juicy: mouthfeel: 0.99992
cherry: flavor: 0.99994
fruit: flavor: 0.99994
cara: flavor: 0.99993
##mel: flavor: 0.99731
mint: flavor: 0.99994
balanced: mouthfeel: 0.99992
```
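The grouped output is easy to fold into a structured record. A minimal sketch, reusing the `tokens` list from the example above (`to_record` is an illustrative helper, not part of the model or pipeline API):

```python
from collections import defaultdict

def to_record(entities):
    """Group pipeline output into an entity_group -> [words] mapping."""
    record = defaultdict(list)
    for ent in entities:
        record[ent["entity_group"]].append(ent["word"])
    return dict(record)

print(to_record(tokens))
# e.g. {'producer': ['heitz'], 'province': ['california'], 'region': ['napa valley'], ...}
```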
## Training data
The BERT model was fine-tuned on 20K reviews and wine labels derived from https://huggingface.co/datasets/james-burton/wine_reviews_all_text, manually annotated to capture the following entity types:
```
adjective: nice, exciting, strong, etc.
country: country specified in the label or description
flavor: fruit, apple, toast, smoke, etc.
grape: Cab, Cabernet Sauvignon, etc.
mouthfeel: luscious, smooth, textured, rough, etc.
producer: wine maker
province, region: province and region of the wine - these sometimes get mixed up
```
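The card does not publish the exact label scheme, so here is a minimal sketch of how the entity types above could map to a BIO tag set for token classification; the names and mapping are assumptions for illustration, not taken from the model.

```python
# Hypothetical BIO label scheme built from the entity types listed above.
# None of these names are published by the model card.
ENTITY_TYPES = [
    "adjective", "country", "flavor", "grape",
    "mouthfeel", "producer", "province", "region",
]
labels = ["O"] + [f"{prefix}-{ent}" for ent in ENTITY_TYPES for prefix in ("B", "I")]
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}
print(labels[:5])  # ['O', 'B-adjective', 'I-adjective', 'B-country', 'I-country']
```

The deployed model's actual labels can be read from `model.config.id2label` after loading it.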
## Training procedure
```python
from transformers import TrainingArguments

model_id = 'bert-base-uncased'
arguments = TrainingArguments(
    output_dir='wineberto-ner',  # required by TrainingArguments; not given in the original
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
)
# ... tokenizer, model, dataset, and Trainer setup ...
trainer.train()
```
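The elided steps are not published in this card. For orientation, a minimal sketch of a typical `Trainer` wiring, assuming pre-tokenized `train_ds`/`eval_ds` datasets with aligned labels and the `label2id`/`id2label` mappings sketched in the training-data section; all of these names are assumptions, not details of this model's actual training run.

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(
    model_id,
    num_labels=len(labels),  # assumed BIO label list from the sketch above
    id2label=id2label,
    label2id=label2id,
)
trainer = Trainer(
    model=model,
    args=arguments,           # TrainingArguments from above
    train_dataset=train_ds,   # assumed tokenized datasets
    eval_dataset=eval_ds,
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
```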