---
license: unknown
pipeline_tag: token-classification
tags:
- wine
- ner
widget:
- text: 'Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022'
  example_title: 'California Cab'
---

# Wineberto NER model

Model trained on wine labels and descriptions for named entity recognition, using bert-base-uncased as the base model. It tries to recognize entities both in the wine label and in the description of the wine.

<b>Label detection does not work as well as with the dedicated panigrah/winberto-labels model.</b>

* Updated to remove bias toward the position of the wine label in the training inputs.
* Also updated to stop predicting the wine classification (e.g. Grand Cru), because the training data for it is not reliable.

## Model description

## How to use

You can use this model directly for named entity recognition:

```python
>>> from transformers import pipeline
>>> # aggregation_strategy groups sub-word tokens and adds the 'entity_group' key printed below
>>> ner = pipeline('ner', model='winberto-ner-uncased', aggregation_strategy='simple')
>>> tokens = ner("Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022")
>>> for t in tokens:
...     print(f"{t['word']}: {t['entity_group']}: {t['score']:.5}")
...
heitz: producer: 0.99988
cab: wine: 0.9999
##ernet sauvignon: wine: 0.95893
california: province: 0.99992
napa valley: region: 0.99991
napa: subregion: 0.99987
us: country: 0.99996
oak: flavor: 0.99992
juicy: mouthfeel: 0.99992
cherry: flavor: 0.99994
fruit: flavor: 0.99994
cara: flavor: 0.99993
##mel: flavor: 0.99731
mint: flavor: 0.99994
balanced: mouthfeel: 0.99992
```
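
The output above still contains word-piece fragments such as `cab` / `##ernet sauvignon` and `cara` / `##mel`. A minimal post-processing sketch for joining them is shown here. It assumes that each `##`-prefixed fragment directly follows the piece it belongs to and carries the same `entity_group`. The helper name `merge_wordpieces` is hypothetical and not part of the model or of transformers.

```python
def merge_wordpieces(entities):
    """Join '##'-prefixed fragments onto the preceding entity of the same group."""
    merged = []
    for ent in entities:
        word = ent["word"]
        same_group = bool(merged) and merged[-1]["entity_group"] == ent["entity_group"]
        if word.startswith("##") and same_group:
            prev = merged[-1]
            prev["word"] += word[2:]  # drop the '##' continuation marker
            # Keep the lower score as a conservative confidence for the joined span
            prev["score"] = min(prev["score"], ent["score"])
        else:
            merged.append(dict(ent))  # copy so the input list is not modified
    return merged


# Entities copied from the example output above
sample = [
    {"word": "cab", "entity_group": "wine", "score": 0.9999},
    {"word": "##ernet sauvignon", "entity_group": "wine", "score": 0.95893},
    {"word": "cara", "entity_group": "flavor", "score": 0.99993},
    {"word": "##mel", "entity_group": "flavor", "score": 0.99731},
]

merged = merge_wordpieces(sample)
for ent in merged:
    print(f"{ent['word']}: {ent['entity_group']}: {ent['score']:.5}")
```

Taking the minimum score is only one possible choice. Averaging the fragment scores would work as well, depending on how the confidence is used downstream.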

## Training data

The BERT model was trained on 20K reviews and wine labels derived from https://huggingface.co/datasets/james-burton/wine_reviews_all_text, manually annotated with the following entity types:

```
adjective: nice, exciting, strong, etc.
country: countries specified in the label or description
flavor: fruit, apple, toast, smoke, etc.
grape: Cab, Cabernet Sauvignon, etc.
mouthfeel: luscious, smooth, textured, rough, etc.
producer: wine maker
province, region: province and region of the wine (these sometimes get mixed up)
```
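
Because every prediction carries one of these labels, it can be handy to bucket the results by label, for example to pull out all flavor terms of a review. The following is a minimal sketch that assumes the entities are dictionaries with `word` and `entity_group` keys, as in the usage example. The helper name `group_by_label` is hypothetical.

```python
from collections import defaultdict


def group_by_label(entities):
    """Collect the predicted words under their entity label."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# A few entities in the shape returned by the usage example
sample = [
    {"word": "heitz", "entity_group": "producer"},
    {"word": "california", "entity_group": "province"},
    {"word": "cherry", "entity_group": "flavor"},
    {"word": "mint", "entity_group": "flavor"},
    {"word": "juicy", "entity_group": "mouthfeel"},
]

by_label = group_by_label(sample)
print(by_label["flavor"])  # ['cherry', 'mint']
```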

## Training procedure

```python
from transformers import TrainingArguments

# Base checkpoint used for training
model_id = 'bert-base-uncased'

arguments = TrainingArguments(
    evaluation_strategy="epoch",  # evaluate at the end of every epoch
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
)
...  # model, tokenizer, dataset, and Trainer setup are elided here
trainer.train()
```
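
The card does not say how the word-level annotations were mapped onto BERT's word pieces. One common approach for token classification, shown here purely as an assumption and not as the author's method, is to label the first piece of each word and mask the remaining pieces and the special tokens with `-100`, the index that PyTorch's cross-entropy loss ignores by default. The `word_ids` list mimics what a fast tokenizer's `word_ids()` returns. The helper `align_labels` and the label ids are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def align_labels(word_ids, word_labels, label_all_pieces=False):
    """Expand word-level label ids to one label id per word piece."""
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            # Special tokens such as [CLS] and [SEP] have no word id
            aligned.append(IGNORE_INDEX)
        elif wid != previous:
            # First piece of a word gets the word's label
            aligned.append(word_labels[wid])
        else:
            # Later pieces are either labelled too or masked out
            aligned.append(word_labels[wid] if label_all_pieces else IGNORE_INDEX)
        previous = wid
    return aligned


# Hand-written illustration: two words, the second split into two pieces,
# wrapped in two special tokens
word_ids = [None, 0, 1, 1, None]
word_labels = [1, 2]  # hypothetical label ids, e.g. producer=1, flavor=2

aligned = align_labels(word_ids, word_labels)
print(aligned)  # [-100, 1, 2, -100, -100]
```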