---
license: unknown
pipeline_tag: token-classification
tags:
- wine
- ner
widget:
- text: 'Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022'
  example_title: 'California Cab'

---

# Wineberto NER model

A pretrained model for named entity recognition on wine labels and descriptions, using `bert-base-uncased` as the base model. It tries to recognize both the wine label fields and the descriptive text about the wine.
**Label discovery does not work as well here as it does with the panigrah/winberto-labels model.**

* Updated to remove bias from the position of the wine label in the training inputs.
* Also updated to stop tagging the wine classification (e.g. Grand Cru), because the training data for it is not reliable.


## Model description

Wineberto is a token-classification model fine-tuned from `bert-base-uncased` on wine labels and tasting notes. It tags producers, grapes, and geography (country, province, region), as well as descriptive terms such as flavors and mouthfeel.


## How to use

You can use this model directly for named entity recognition like so:

```python
>>> from transformers import pipeline
>>> # aggregation_strategy="simple" merges sub-word tokens into entity groups,
>>> # which is what produces the 'entity_group' field printed below
>>> ner = pipeline('ner', model='winberto-ner-uncased', aggregation_strategy='simple')
>>> tokens = ner("Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022")
>>> for t in tokens:
...    print(f"{t['word']}: {t['entity_group']}: {t['score']:.5}")

heitz: producer: 0.99988
cab: wine: 0.9999
##ernet sauvignon: wine: 0.95893
california: province: 0.99992
napa valley: region: 0.99991
napa: subregion: 0.99987
us: country: 0.99996
oak: flavor: 0.99992
juicy: mouthfeel: 0.99992
cherry: flavor: 0.99994
fruit: flavor: 0.99994
cara: flavor: 0.99993
##mel: flavor: 0.99731
mint: flavor: 0.99994
balanced: mouthfeel: 0.99992
```
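If you need raw per-token predictions instead of the pipeline's grouped output, a minimal sketch along these lines should also work (assuming the same `winberto-ner-uncased` checkpoint id as above):

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Assumption: same checkpoint id as in the pipeline example above.
tokenizer = AutoTokenizer.from_pretrained('winberto-ner-uncased')
model = AutoModelForTokenClassification.from_pretrained('winberto-ner-uncased')

text = "Heitz Cabernet Sauvignon California Napa Valley Napa US"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Print the highest-scoring label for every sub-word token.
for token_id, label_id in zip(inputs["input_ids"][0], logits.argmax(dim=-1)[0]):
    token = tokenizer.convert_ids_to_tokens(token_id.item())
    print(token, model.config.id2label[label_id.item()])
```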

## Training data

The BERT model was fine-tuned on 20K reviews and wine labels derived from https://huggingface.co/datasets/james-burton/wine_reviews_all_text, manually annotated to capture the following entity types:

```
adjective: nice, exciting, strong, etc.
country: countries specified in the label or description
flavor: fruit, apple, toast, smoke, etc.
grape: Cab, Cabernet Sauvignon, etc.
mouthfeel: luscious, smooth, textured, rough, etc.
producer: wine maker
province, region: province and region of the wine; these sometimes get mixed up
```
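A token-classification head needs an explicit label set built from these entity types. The exact label names below are an assumption (the authoritative list lives in `model.config.id2label`), but a BIO-style mapping would look like this:

```python
# Hypothetical BIO label set derived from the entity types listed above;
# check model.config.id2label for the labels the model actually uses.
entity_types = [
    "adjective", "country", "flavor", "grape",
    "mouthfeel", "producer", "province", "region",
]
labels = ["O"] + [f"{prefix}-{t}" for t in entity_types for prefix in ("B", "I")]
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}
print(labels[:5])  # ['O', 'B-adjective', 'I-adjective', 'B-country', 'I-country']
```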

## Training procedure
```python
from transformers import TrainingArguments

model_id = 'bert-base-uncased'
arguments = TrainingArguments(
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
)
...  # tokenizer, dataset, model and Trainer construction elided (sketched below)
trainer.train()
```
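
The elided setup would look roughly like the following sketch. The dataset name `tokenized_ds` is a placeholder and the label mappings come from the BIO sketch above; this is not the exact code that was used:

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(
    model_id,
    num_labels=len(labels),  # label set as in the BIO sketch above
    id2label=id2label,
    label2id=label2id,
)

trainer = Trainer(
    model=model,
    args=arguments,
    train_dataset=tokenized_ds["train"],       # placeholder: tokenized, label-aligned splits
    eval_dataset=tokenized_ds["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
```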