---
tags:
- multi-label-classification
- multi-intent-detection
- huggingface
- deberta-v3
- transformers
library_name: transformers
task:
- text-classification
license: apache-2.0
---

# Multi-Intent Detection (MID) Model

This model was fine-tuned for the task of **Multi-Intent Detection (MID)**, a type of multi-label classification where each input can have multiple labels assigned. The dataset used for fine-tuning is specifically designed to simplify the MID task, with the number of labels limited to two per instance.

## Model Details

- **Base Model:** DeBERTa-v3-base
- **Task:** Multi-label classification
- **Number of Labels:** 2
- **Fine-tuning Framework:** Hugging Face Transformers

## Training Configuration

- **Training Arguments:**
  - **Learning Rate:** 2e-5
  - **Batch Size (Train):** 16
  - **Batch Size (Eval):** 16
  - **Gradient Accumulation Steps:** 2
  - **Number of Epochs:** 8
  - **Weight Decay:** 0.01
  - **Warmup Ratio:** 10%
  - **Learning Rate Scheduler Type:** Cosine
  - **Mixed Precision Training:** Enabled (FP16)
  - **Logging Steps:** 50

## Performance Metrics

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 Score | Accuracy |
|-------|---------------|-----------------|-----------|--------|----------|----------|
| 0 | 0.069100 | 0.069115 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 2 | 0.024100 | 0.022929 | 0.952334 | 0.316920 | 0.475576 | 0.078652 |
| 4 | 0.009200 | 0.010799 | 0.959768 | 0.819894 | 0.884334 | 0.653668 |
| 6 | 0.006300 | 0.008773 | 0.963243 | 0.883344 | 0.921565 | 0.770654 |
| 7 | 0.006200 | 0.008707 | 0.961635 | 0.886319 | 0.922442 | 0.775281 |

### Final Evaluation Metrics (Epoch 8):

- **Validation Loss:** 0.0087
- **Precision:** 0.9616
- **Recall:** 0.8863
- **F1 Score:** 0.9224
- **Accuracy:** 0.7753

## Limitations

- **Simplified Multi-Label Setting:** This model assumes a fixed number of two labels per instance, which may not generalize to datasets with more complex multi-label settings.
- **Performance on Unseen Data:** The model's performance may degrade if applied to data distributions significantly different from the training dataset.
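
The two-labels-per-instance assumption described in the limitations suggests one possible decoding rule: apply a sigmoid to each output logit and keep the two highest-scoring intents. The sketch below illustrates that rule in plain Python. It assumes the model emits one logit per intent; the intent names, the logit values, and the helper `decode_top_k` are hypothetical, and this card does not state which decoding rule was actually used.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_top_k(logits, label_names, k=2):
    """Return the k (label, probability) pairs with the highest probability.

    Assumption: one independent logit per intent, as in a standard
    multi-label classification head.
    """
    if len(logits) != len(label_names):
        raise ValueError("logits and label_names must have the same length")
    probs = [sigmoid(z) for z in logits]
    ranked = sorted(zip(label_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical intent names and logits, for illustration only.
LABELS = ["book_flight", "check_weather", "play_music", "set_alarm"]
example_logits = [2.3, -1.7, 0.4, -3.0]

for name, prob in decode_top_k(example_logits, LABELS):
    print(f"{name}: {prob:.3f}")
```

A fixed `k=2` mirrors the training data. For inputs that may carry one intent or more than two, a probability threshold would likely be a better fit than a fixed count.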
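
As a consistency check on the Performance Metrics table, the reported F1 scores can be recomputed from the reported precision and recall using the harmonic mean, F1 = 2PR / (P + R). The sketch below does this for every listed epoch. The helper name `f1_from_pr` and the tolerance are illustrative choices, and the averaging mode (micro, macro, or otherwise) behind the reported numbers is not stated in this card.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (epoch, precision, recall, reported F1) copied from the metrics table.
REPORTED = [
    (0, 0.000000, 0.000000, 0.000000),
    (2, 0.952334, 0.316920, 0.475576),
    (4, 0.959768, 0.819894, 0.884334),
    (6, 0.963243, 0.883344, 0.921565),
    (7, 0.961635, 0.886319, 0.922442),
]

# Largest gap between the recomputed and the reported F1 across epochs.
max_gap = max(abs(f1_from_pr(p, r) - f1) for _, p, r, f1 in REPORTED)

for epoch, p, r, f1 in REPORTED:
    print(f"epoch {epoch}: reported F1={f1:.6f}, recomputed F1={f1_from_pr(p, r):.6f}")
print("consistent within 1e-4:", max_gap < 1e-4)
```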