---
library_name: setfit
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
metrics:
- accuracy
widget:
- text: How does cannibalization within the RTEC category compare to other product categories within the MT channel, influencing the overall volumelift?
- text: Can you identify the specific factors or challenges that contributed to the decline in ROI within TT in 2022 compared to 2021?
- text: Which Sku cannibalizes higher margin Skus the most for CHEDRAUI channel_name?
- text: Can you compare the overall market share and competitive landscape of the category more sensitive to internal cannibalization with other categories?
- text: Can you identify the key factors or challenges that have contributed to the ROI decline within TT
pipeline_tag: text-classification
inference: true
base_model: intfloat/multilingual-e5-large
model-index:
- name: SetFit with intfloat/multilingual-e5-large
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Unknown
      type: unknown
      split: test
    metrics:
    - type: accuracy
      value: 0.9130434782608695
      name: Accuracy
---

# SetFit with intfloat/multilingual-e5-large

This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
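To make step 1 more concrete, the sketch below shows one simple way labeled examples can be turned into sentence pairs for contrastive fine-tuning: same-label pairs get a target similarity of 1.0 and different-label pairs get 0.0. It is a simplified illustration of the idea, not SetFit's actual sampler. The function `generate_pairs` and the placeholder texts are hypothetical; only the dataset shape (3 labels with 10 examples each), `num_iterations: 20`, and `batch_size: 16` are taken from this card.

```python
import math
import random


def generate_pairs(texts, labels, num_iterations, seed=42):
    """Build (text_a, text_b, target) pairs for contrastive fine-tuning.

    For every example and every iteration, draw one same-label partner
    (target 1.0) and one different-label partner (target 0.0).
    This is an illustrative helper, not part of the SetFit API.
    """
    rng = random.Random(seed)

    # Group example indices by label so positives are easy to look up.
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)

    pairs = []
    for _ in range(num_iterations):
        for idx, label in enumerate(labels):
            positives = [j for j in by_label[label] if j != idx]
            negatives = [j for j, other in enumerate(labels) if other != label]
            pairs.append((texts[idx], texts[rng.choice(positives)], 1.0))
            pairs.append((texts[idx], texts[rng.choice(negatives)], 0.0))
    return pairs


# Placeholder data shaped like this card's training set: 3 labels x 10 examples.
labels = [label for label in (0, 1, 2) for _ in range(10)]
texts = [f"question {i} (label {label})" for i, label in enumerate(labels)]

pairs = generate_pairs(texts, labels, num_iterations=20)
steps_per_epoch = math.ceil(len(pairs) / 16)  # batch_size = 16

print(len(pairs), steps_per_epoch)  # 1200 75
```

Under these assumptions, 30 examples yield 1,200 pairs, or 75 batches of 16 per epoch. That appears consistent with the Training Results table in this card, where step 50 falls at epoch 0.6667 (50 / 75).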
## Model Details

### Model Description
- **Model Type:** SetFit
- **Sentence Transformer body:** [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large)
- **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
- **Maximum Sequence Length:** 512 tokens
- **Number of Classes:** 3 classes

### Model Sources
- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

### Model Labels
| Label | Examples |
|:------|:---------|
| 2 | <ul><li>'Are there particular factors or trends contributing to the high level of cannibalization for certain brands in the SS category?'</li><li>'How does the degree of cannibalization vary among different SKUs in the RTEC ?'</li><li>'Which Sku cannibalizes higher margin Skus the most?'</li></ul> |
| 1 | <ul><li>'Are there plans to enhance promotional activities specific to the MT to mitigate the ROI decline in 2023?'</li><li>'What are the main reasons for ROI decline in 2022 in MT compared to 2021?'</li><li>'Are there changes in consumer preferences or trends that have impacted the Lift of Zucaritas, and how does this compare to other brands like Pringles or Frutela?'</li></ul> |
| 0 | <ul><li>'What type of promotions worked best for MT Walmart in 2022?'</li><li>'Which channel has the max ROI and Vol Lift when we run the Promotion for RTEC category?'</li><li>'Which sub_catg_nm have the highest ROI in 2022?'</li></ul> |

## Evaluation

### Metrics
| Label   | Accuracy |
|:--------|:---------|
| **all** | 0.9130   |

## Uses

### Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("vgarg/promo_prescriptive_gpt_28_02_2024_v1")
# Run inference
preds = model("Which Sku cannibalizes higher margin Skus the most for CHEDRAUI channel_name?")
```

## Training Details

### Training Set Metrics
| Training set | Min | Median  | Max |
|:-------------|:----|:--------|:----|
| Word count   | 7   | 15.8333 | 30  |

| Label | Training Sample Count |
|:------|:----------------------|
| 0     | 10                    |
| 1     | 10                    |
| 2     | 10                    |

### Training Hyperparameters
- batch_size: (16, 16)
- num_epochs: (3, 3)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 20
- body_learning_rate: (2e-05, 2e-05)
- head_learning_rate: 2e-05
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False

### Training Results
| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0133 | 1    | 0.3582        | -               |
| 0.6667 | 50   | 0.0024        | -               |
| 1.3333 | 100  | 0.0005        | -               |
| 2.0    | 150  | 0.0004        | -               |
| 2.6667 | 200  | 0.0002        | -               |

### Framework Versions
- Python: 3.10.12
- SetFit: 1.0.3
- Sentence Transformers: 2.4.0
- Transformers: 4.37.2
- PyTorch: 2.1.0+cu121
- Datasets: 2.17.1
- Tokenizers: 0.15.2

## Citation

### BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
```