---
license: unknown
pipeline_tag: token-classification
tags:
- wine
- ner
widget:
- text: 'Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022'
  example_title: 'California Cab'
---

# Wineberto NER model

Pretrained model for named entity recognition on wine labels and descriptions, using `bert-base-uncased` as the base model. It tries to recognize both the wine label and the description of the wine. Label discovery does not work as well as simply using the panigrah/winberto-labels model.

* Updated to remove bias from the position of the wine label in the training inputs.
* Also updated to drop the wine classification tags (e.g. Grand Cru), because the training data for them is not reliable.

## Model description

Wineberto is `bert-base-uncased` fine-tuned for token classification on manually annotated wine reviews. Given a wine label and its tasting notes, it tags tokens such as the producer, grape, geography (country, province, region), and flavor or mouthfeel descriptors.

## How to use

You can use this model directly for named entity recognition like so:

```python
>>> from transformers import pipeline
>>> ner = pipeline('ner', model='winberto-ner-uncased', aggregation_strategy='simple')
>>> tokens = ner("Heitz Cabernet Sauvignon California Napa Valley Napa US this tremendous 100% varietal wine hails from oakville and was aged over three years in oak. juicy red-cherry fruit and a compelling hint of caramel greet the palate, framed by elegant, fine tannins and a subtle minty tone in the background. balanced and rewarding from start to finish, it has years ahead of it to develop further nuance. enjoy 2022")
>>> for t in tokens:
...     print(f"{t['word']}: {t['entity_group']}: {t['score']:.5}")
heitz: producer: 0.99988
cab: wine: 0.9999
##ernet sauvignon: wine: 0.95893
california: province: 0.99992
napa valley: region: 0.99991
napa: subregion: 0.99987
us: country: 0.99996
oak: flavor: 0.99992
juicy: mouthfeel: 0.99992
cherry: flavor: 0.99994
fruit: flavor: 0.99994
cara: flavor: 0.99993
##mel: flavor: 0.99731
mint: flavor: 0.99994
balanced: mouthfeel: 0.99992
```

## Training data

The BERT model was trained on 20K reviews and wine labels derived from https://huggingface.co/datasets/james-burton/wine_reviews_all_text, manually annotated to capture the following entity types:

```
adjective: nice, exciting, strong etc
country: country specified in the label or description
flavor: fruit, apple, toast, smoke etc
grape: Cab, Cabernet Sauvignon, etc
mouthfeel: luscious, smooth, textured, rough etc
producer: wine maker
province, region: province and region of the wine - sometimes these get mixed up
```

## Training procedure

```python
from transformers import TrainingArguments

model_id = 'bert-base-uncased'
arguments = TrainingArguments(
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
)
...
trainer.train()
```
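The snippet above elides the dataset preparation and trainer construction. As a rough sketch of how the full loop might look (not the exact training script): `train_ds`, `eval_ds`, and `label_list` are assumed names for the tokenized, label-aligned annotations and the entity tag set described under "Training data".

```python
# Minimal sketch of the elided setup, under stated assumptions:
# `train_ds` / `eval_ds` are tokenized datasets with word-aligned NER tags,
# and `label_list` enumerates the entity labels described above.
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

model_id = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(
    model_id,
    num_labels=len(label_list),  # e.g. B-/I- tags for producer, grape, flavor, ...
)

arguments = TrainingArguments(
    output_dir='winberto-ner-uncased',  # illustrative output path
    evaluation_strategy='epoch',
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=arguments,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    # Pads inputs and label sequences to the same length within each batch.
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```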