bhaskars113 committed on
Commit ff920bc
1 Parent(s): 671174a

Add SetFit model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+{
+  "word_embedding_dimension": 768,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
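The pooling config above enables only `pooling_mode_mean_tokens`: the sentence embedding is the attention-mask-weighted mean of the 768-dimensional token embeddings. A minimal numpy sketch of that operation (editor's illustration; the real implementation is `sentence_transformers.models.Pooling`):

```python
import numpy as np

def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Masked mean over the token axis.

    token_embeddings: (batch, seq_len, hidden) float array
    attention_mask:   (batch, seq_len) array of 0/1, where 1 marks a real token
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts

# Toy example: one sentence, three token slots, the last slot is padding.
emb = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pooling(emb, mask))  # [[2. 3.]] -- the padding embedding is ignored
```

Padding tokens contribute nothing to the sentence vector, which is why the same sentence embeds identically regardless of how much it is padded in a batch.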
README.md ADDED
@@ -0,0 +1,195 @@
+---
+library_name: setfit
+tags:
+- setfit
+- sentence-transformers
+- text-classification
+- generated_from_setfit_trainer
+metrics:
+- accuracy
+widget:
+- text: 'I made hubby some ginger syrup this afternoon. He loves a whiskey and ginger.
+    Instead of buying ginger ale in plastic bottles we use a soda stream and #homemade
+    ginger syrup. #ecofriendly #sustainable #plasticfree pic.twitter.com/kEHHYaSuXr'
+- text: Four roses small batch select. Milk chocolate on the nose and palate for me
+- text: 'strong. bit old fashioned. always satisfying.. smoking hot is good… wait
+    are we talking about my old fashion or how I like my men? ?????? Smoky Wakashi
+    old fashioned ?? #comeonbabylightmyfire #oldfashioned #classic #whiskey #strong
+    #stiff #neverdisappoints #happiness #smoky #hawaii #hawaiilife'
+- text: Pineapple Demerara Old Fashioned by highproofpreacher made with his house
+    pineapple demerara syrup
+- text: Ordered a pink drink & smoked old fashioned & both were delicious & had nice
+    presentations
+pipeline_tag: text-classification
+inference: true
+base_model: sentence-transformers/paraphrase-mpnet-base-v2
+---
+
+# SetFit with sentence-transformers/paraphrase-mpnet-base-v2
+
+This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
+
+The model has been trained using an efficient few-shot learning technique that involves:
+
+1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+2. Training a classification head with features from the fine-tuned Sentence Transformer.
+
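Step 2 of the recipe above is an ordinary scikit-learn `LogisticRegression` fit on the sentence embeddings produced by the fine-tuned body. A toy sketch (editor's illustration: random vectors stand in for the 768-dimensional embeddings, and the two-class labels are synthetic):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for sentence embeddings from the fine-tuned body: two classes
# whose embeddings are well separated (means 0.0 vs 3.0 per dimension).
X = np.vstack([rng.normal(0.0, 1.0, (40, 768)), rng.normal(3.0, 1.0, (40, 768))])
y = np.array([0] * 40 + [1] * 40)

# The classification head is just this fitted model.
head = LogisticRegression(max_iter=1000).fit(X, y)

# At inference time the head maps any new sentence embedding to a label.
probe = rng.normal(3.0, 1.0, (1, 768))
print(head.predict(probe))  # [1]
```

Because the contrastive fine-tuning pushes same-class sentences close together in embedding space, even a simple linear head like this separates the classes well from a handful of examples per label.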
+## Model Details
+
+### Model Description
+- **Model Type:** SetFit
+- **Sentence Transformer body:** [sentence-transformers/paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2)
+- **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+- **Maximum Sequence Length:** 512 tokens
+- **Number of Classes:** 6 classes
+<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+
+### Model Sources
+
+- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+### Model Labels
+| Label | Examples |
+|:------|:---------|
+| 4 | <ul><li>"@vurnt22 Ginger beer and bourbon is one of two times I actually drink anything ginger-y. The other is ginger ale on an airplane because it just seems like you're supposed to."</li><li>"It's not for everyone, but almost everywhere sells Moscow mules. Ginger ale is also good. If there were Ginger wines, Ginger stouts, and Ginger Whiskeys, I'd probably drink those too."</li><li>'I just like the smell of cinnamon. I like the taste too. My favorite candy is cinnamon-flavored, my favorite tea is cinnamon-flavored, my favorite whiskey is cinnamon-flavored.'</li></ul> |
+| 0 | <ul><li>"Bourbon Chocolate Ice Cream. It fluffs up beautifully, doesn't melt rapidly during serving and it is one of the best chocolate flavours I have ever had. You know where the recipe is, go and get it."</li><li>'Beautiful Lady.... Now the question is, what did you put in it? I prefer Chocolate Whiskey myself....lol'</li><li>"Bourbon S'mores~Maybe it’s the promise of summer, the nostalgia of a campfire and roasting marshmallows, or the memories of childhood… but S’mores are one of my favorite treats. The way the toasty marshmallow melts the chocolate and the texture of them sandwiched between graham crackers just makes me happy. The Bourbon S’mores Bundt is a grown up version of a childhood favorite. Chocolate graham cracker cake, soaked with bourbon and topped with marshmallow sauce, a fudgy bourbon glaze and toasted marshmallows. One bite and you’ll want “some more."</li></ul> |
+| 1 | <ul><li>'That could be it. Helps the smoke stick to the meat and it almost doesn’t matter what you use. I use apple cider vinegar with a little bourbon mixed in. I have zero evidence the bourbon has any effect, it just sounds cool, lol. Try that next time. Just a quick spritz to keep the edges from drying out every hour or so until you wrap it or wherever it’s about 165F.'</li><li>'My brother in law makes the absolute best smoky old fashioned. #whiskey #oldfashioned #smoky #drinks'</li><li>'Smoked to perfection ?? Bridge Street BBQ Platter | House Smoked Beef Brisket, Baked Mac n’ Cheese, Bourbon Baked Beans, Fresh Cornbread and Honey Butter, House B&B Pickles, House Pickled Onion $29 Suggested Drink Pairing: Burnt Orange And Vanilla Old Fashioned. #eatgr #grandrapidsmichigan #grandrapids #happyhour #eatlocal #bridgestreet #beercitywasmissingthebourbon #beercity #westsideisthebestside #grandrapidsmi #whiskey #grnowfood #grnow #supportlocal #grandrapidsblogger #localbusiness #iheartgr'</li></ul> |
+| 3 | <ul><li>'A pastry that not only looks like the fruit it’s meant to showcase but also bursts with the fresh flavor of it. In my mind it is a fusion of two classics - a cocktail Whiskey Sour and a Lemon Meringue pie. ▫️Candied lemon and orange peel is suspended in a lemon gel made with freshly squeezed lemon juice and bourbon. ▫️the fruity core is surrounded by white chocolate ganache made with Italian meringue.'</li><li>'They also do some interesting stuff like they have a summer whiskey where it is infused tea and lemon.'</li><li>'Cheers to Peach Whiskey! This peach whiskey from olesmoky goes perfect with BBQ as a refreshing cocktail or on the rocks. I mixed mine with pineapple juice and ginger beer. The perfect refreshing smooth texture, and all the citrus notes of the peach come through. I love drinking Ole Smoky Whiskey, as it’s the best on the market. '</li></ul> |
+| 2 | <ul><li>'I soaked walnuts in like 4 shots of bourbon with brown sugar and cinnamon'</li><li>'Figured pecans and bourbon both like a little smoke so decided to smoke my Bourbon Pecan pie recipe for tomorrow. Lick test on the thermometer probe says its delicious. Will find out for sure tomorrow.'</li><li>'I looooove pecan pie. I found a delicious recipe for bourbon pecan pie with homemade bourbon whip cream. I may need to make one soon'</li></ul> |
+| 5 | <ul><li>'Just in Nola we have Roulaison, Lula, seven three, wetlands sake, Atelier Vie and happy raptor. A lot of bars have one or two of these available but I rarely see them featured in cocktails. I’d especially love to try flights of local rums or whiskeys alongside common brands so you can see what makes the local stuff unique'</li><li>'I have some Milk Chocolate Truffle right now and that shit is good.'</li><li>'We celebrated our one-year anniversary here and the staff made us feel so loved and celebrated. The butternut bisque for the fall menu was incredible. My whiskey sour was also phenomenal. The room was loud and cold but not uncommon for indoor restaurant.'</li></ul> |
+
+## Uses
+
+### Direct Use for Inference
+
+First install the SetFit library:
+
+```bash
+pip install setfit
+```
+
+Then you can load this model and run inference.
+
+```python
+from setfit import SetFitModel
+
+# Download from the 🤗 Hub
+model = SetFitModel.from_pretrained("bhaskars113/whiskey-recipe-type-model")
+# Run inference
+preds = model("Four roses small batch select. Milk chocolate on the nose and palate for me")
+```
+
+<!--
+### Downstream Use
+
+*List how someone could finetune this model on their own dataset.*
+-->
+
+<!--
+### Out-of-Scope Use
+
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+
+<!--
+## Bias, Risks and Limitations
+
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+
+<!--
+### Recommendations
+
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+
+## Training Details
+
+### Training Set Metrics
+| Training set | Min | Median  | Max |
+|:-------------|:----|:--------|:----|
+| Word count   | 7   | 50.0446 | 362 |
+
+| Label | Training Sample Count |
+|:------|:----------------------|
+| 0     | 20                    |
+| 1     | 20                    |
+| 2     | 20                    |
+| 3     | 20                    |
+| 4     | 16                    |
+| 5     | 16                    |
+
+### Training Hyperparameters
+- batch_size: (16, 16)
+- num_epochs: (1, 1)
+- max_steps: -1
+- sampling_strategy: oversampling
+- num_iterations: 20
+- body_learning_rate: (2e-05, 2e-05)
+- head_learning_rate: 2e-05
+- loss: CosineSimilarityLoss
+- distance_metric: cosine_distance
+- margin: 0.25
+- end_to_end: False
+- use_amp: False
+- warmup_proportion: 0.1
+- seed: 42
+- eval_max_steps: -1
+- load_best_model_at_end: False
+
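The `CosineSimilarityLoss` listed above fine-tunes the embedding body so that the cosine similarity of a sentence pair regresses toward that pair's label (close to 1 for same-class pairs, close to 0 for different-class pairs, with pairs drawn according to the sampling strategy). A minimal numpy sketch of the loss value for a single pair (editor's illustration; the real loss is `sentence_transformers.losses.CosineSimilarityLoss`):

```python
import numpy as np

def cosine_similarity_loss(u: np.ndarray, v: np.ndarray, label: float) -> float:
    """Squared error between the pair's cosine similarity and its target label."""
    cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return (cos - label) ** 2

u = np.array([1.0, 0.0])
print(cosine_similarity_loss(u, np.array([1.0, 0.0]), 1.0))  # identical pair, label 1 -> 0.0
print(cosine_similarity_loss(u, np.array([0.0, 1.0]), 0.0))  # orthogonal pair, label 0 -> 0.0
print(cosine_similarity_loss(u, np.array([0.0, 1.0]), 1.0))  # orthogonal pair, label 1 -> 1.0
```

Minimizing this over many pairs pulls same-class sentences together and pushes different-class sentences apart, which is what makes the simple logistic-regression head effective afterwards.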
+### Training Results
+| Epoch  | Step | Training Loss | Validation Loss |
+|:------:|:----:|:-------------:|:---------------:|
+| 0.0036 | 1    | 0.239         | -               |
+| 0.1786 | 50   | 0.1855        | -               |
+| 0.3571 | 100  | 0.0275        | -               |
+| 0.5357 | 150  | 0.0397        | -               |
+| 0.7143 | 200  | 0.0063        | -               |
+| 0.8929 | 250  | 0.0034        | -               |
+
+### Framework Versions
+- Python: 3.10.12
+- SetFit: 1.0.3
+- Sentence Transformers: 2.6.1
+- Transformers: 4.38.2
+- PyTorch: 2.2.1+cu121
+- Datasets: 2.18.0
+- Tokenizers: 0.15.2
+
+## Citation
+
+### BibTeX
+```bibtex
+@article{https://doi.org/10.48550/arxiv.2209.11055,
+    doi = {10.48550/ARXIV.2209.11055},
+    url = {https://arxiv.org/abs/2209.11055},
+    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+    title = {Efficient Few-Shot Learning Without Prompts},
+    publisher = {arXiv},
+    year = {2022},
+    copyright = {Creative Commons Attribution 4.0 International}
+}
+```
+
+<!--
+## Glossary
+
+*Clearly define terms in order to be accessible across audiences.*
+-->
+
+<!--
+## Model Card Authors
+
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+
+<!--
+## Model Card Contact
+
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
config.json ADDED
@@ -0,0 +1,24 @@
+{
+  "_name_or_path": "sentence-transformers/paraphrase-mpnet-base-v2",
+  "architectures": [
+    "MPNetModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": 0,
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 514,
+  "model_type": "mpnet",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "relative_attention_num_buckets": 32,
+  "torch_dtype": "float32",
+  "transformers_version": "4.38.2",
+  "vocab_size": 30527
+}
config_sentence_transformers.json ADDED
@@ -0,0 +1,9 @@
+{
+  "__version__": {
+    "sentence_transformers": "2.0.0",
+    "transformers": "4.7.0",
+    "pytorch": "1.9.0+cu102"
+  },
+  "prompts": {},
+  "default_prompt_name": null
+}
config_setfit.json ADDED
@@ -0,0 +1,4 @@
+{
+  "normalize_embeddings": false,
+  "labels": null
+}
model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1eb8bae1a8acf29af010a45f66e5b43adf3cae37dfba6310a734177565cb96bf
+size 437967672
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cf5fb3d01a19d1b3302c79f3ee58d7ddaa1ec9e8cc4db85893c79d81c8717c5b
+size 37799
modules.json ADDED
@@ -0,0 +1,14 @@
+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  }
+]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+{
+  "max_seq_length": 512,
+  "do_lower_case": false
+}
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,59 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "104": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "30526": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "eos_token": "</s>",
+  "mask_token": "<mask>",
+  "model_max_length": 512,
+  "never_split": null,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "MPNetTokenizer",
+  "unk_token": "[UNK]"
+}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff