|
--- |
|
library_name: transformers |
|
license: mit |
|
base_model: roberta-base |
|
tags: |
|
- generated_from_trainer |
|
model-index: |
|
- name: roberta-student-fine-tuned |
|
results: [] |
|
language: |
|
- en |
|
metrics: |
|
- exact_match |
|
--- |
|
|
|
|
|
|
# roberta-student-fine-tuned
|
|
|
This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base), trained on a dataset provided by Kim Taeuk (김태욱), who teaches NLP at Hanyang University.
|
|
|
The model was trained for multi-intent detection using the BlendX dataset, focusing on complex utterances containing multiple intents. |
|
|
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.0053 |
|
- Exact Match Accuracy: 0.9075 |
|
|
|
|
|
## Model description |
|
|
|
The model is based on roberta-base, a robust transformer model pretrained on a large corpus of English text. |
|
|
|
Fine-tuning was conducted on a specialized dataset focusing on multi-intent detection in utterances with complex intent structures. |
|
|
|
|
|
### Model Architecture |
|
|
|
- **Base Model:** roberta-base |
|
- **Task:** Multi-Intent Detection |
|
- **Languages:** English |
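
### Example usage

The snippet below is a minimal inference sketch, assuming the checkpoint is published as a multi-label sequence classifier (one sigmoid output per intent label); the repository id, the 0.5 threshold, and the example utterance are placeholders and should be adapted to your setup.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder repository id; replace with the actual checkpoint location.
model_id = "Meruem/roberta-student-fine-tuned"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

utterance = "Play some jazz and set an alarm for 7 am."
inputs = tokenizer(utterance, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label setup: one sigmoid probability per intent, thresholded at 0.5.
probs = torch.sigmoid(logits)[0]
predicted_intents = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted_intents)
```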
|
|
|
|
|
### Strengths |
|
|
|
- High exact-match accuracy (0.9075) on the evaluation set.

- Capable of detecting multiple intents within a single utterance.
|
|
|
|
|
### Limitations |
|
|
|
- Fine-tuned on a specific dataset; performance may vary on other tasks.

- Limited to English text.
|
|
|
|
|
## Intended uses & limitations |
|
|
|
### Use Cases |
|
|
|
- Multi-intent detection tasks such as customer service queries, virtual assistants, and dialogue systems.

- Academic research and educational projects.
|
|
|
|
|
### Limitations |
|
|
|
- May require additional fine-tuning for domain-specific applications.

- Not designed for multilingual tasks.
|
|
|
|
|
## Training and evaluation data |
|
|
|
The model was trained on the BlendX dataset, a multi-intent detection benchmark focusing on realistic combinations of user intents in task-oriented dialogues. |
|
|
|
|
|
### Data Details: |
|
|
|
The training data is a simplified version of BlendX.

While the full BlendX dataset contains instances with anywhere from 1 to 3 intents,

the version used for this assignment only includes instances with exactly 2 intents, for simplicity.
|
|
|
|
|
## Dataset License and Source |
|
|
|
The dataset used for training this model is licensed under the **[GNU General Public License v2](https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)**. |
|
|
|
### Important Notes: |
|
- Any use, distribution, or modification of this dataset must comply with the terms of the GPL v2 license. |
|
- The dataset source and its original license can be found in its [official GitHub repository](https://github.com/HYU-NLP/BlendX/). |
|
- **Dataset File:** [Download Here](https://huggingface.co/datasets/Meruem/BlendX_simplified/resolve/main/BlendX_simplified.json) |
|
|
|
|
|
### Dataset Format: |
|
- **File Type:** JSON |
|
- **Size:** 28,815 training samples, 1,513 validation samples |
|
- **Data Fields:** |
|
- `split` (string): Indicates if the sample belongs to the training or validation set. |
|
- `utterance` (string): The text input containing multiple intents. |
|
- `intent` (list of strings): The associated intents. |
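
For illustration, the records can be read with the standard `json` module. The sketch below assumes the file has been downloaded locally, that the top-level JSON object is a list of records, and that the `split` field uses the values `"train"` and `"validation"`; adjust these assumptions to match the actual file.

```python
import json

# Assumes BlendX_simplified.json was downloaded from the link above.
with open("BlendX_simplified.json", encoding="utf-8") as f:
    records = json.load(f)

# Assumed split values; check the file if they differ.
train = [r for r in records if r["split"] == "train"]
valid = [r for r in records if r["split"] == "validation"]

example = train[0]
print(example["utterance"])  # utterance text containing two intents
print(example["intent"])     # list with the two gold intent labels
```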
|
|
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 1e-05 |
|
- train_batch_size: 32 |
|
- eval_batch_size: 32 |
|
- seed: 42 |
|
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
|
- lr_scheduler_type: cosine_with_restarts |
|
- warmup_steps: 200 |
|
- num_epochs: 20 |
|
- save_total_limit: 3 |
|
- weight_decay: 0.01 |
|
- eval_strategy: epoch |
|
- save_strategy: epoch |
|
- metric_for_best_model: eval_exact_match_accuracy |
|
- load_best_model_at_end: True |
|
- dataloader_pin_memory: True |
|
- fp16: False |
|
- greater_is_better: True |
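
For reference, the configuration above corresponds roughly to the following `TrainingArguments` sketch (the output directory is a placeholder, and the full training script is not reproduced here):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="roberta-student-fine-tuned",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=200,
    num_train_epochs=20,
    save_total_limit=3,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="eval_exact_match_accuracy",
    load_best_model_at_end=True,
    dataloader_pin_memory=True,
    fp16=False,
    greater_is_better=True,
)
```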
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Exact Match Accuracy | |
|
|:-------------:|:-----:|:-----:|:---------------:|:--------------------:| |
|
| 0.0723 | 1.0 | 2297 | 0.0720 | 0.0 | |
|
| 0.0576 | 2.0 | 4594 | 0.0516 | 0.0 | |
|
| 0.0328 | 3.0 | 6891 | 0.0264 | 0.0839 | |
|
| 0.015 | 4.0 | 9188 | 0.0141 | 0.6907 | |
|
| 0.0086 | 5.0 | 11485 | 0.0092 | 0.8771 | |
|
| 0.0046 | 6.0 | 13782 | 0.0069 | 0.8929 | |
|
| 0.0027 | 7.0 | 16079 | 0.0061 | 0.9002 | |
|
| 0.0018 | 8.0 | 18376 | 0.0059 | 0.8936 | |
|
| 0.0012 | 9.0 | 20673 | 0.0056 | 0.8995 | |
|
| 0.0009 | 10.0 | 22970 | 0.0053 | 0.9075 | |
|
| 0.0007 | 11.0 | 25267 | 0.0055 | 0.9055 | |
|
| 0.0005 | 12.0 | 27564 | 0.0061 | 0.8976 | |
|
| 0.0004 | 13.0 | 29861 | 0.0057 | 0.9061 | |
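
Exact match accuracy counts a prediction as correct only when the full set of predicted intents matches the gold set for an utterance. The original metric implementation is not included in this card; a minimal standalone sketch, assuming sigmoid logits and multi-hot label vectors, could look like this:

```python
import numpy as np

def exact_match_accuracy(logits, labels, threshold=0.5):
    """Fraction of samples whose predicted intent set equals the gold set exactly."""
    probs = 1.0 / (1.0 + np.exp(-logits))    # sigmoid over each intent logit
    preds = (probs > threshold).astype(int)  # multi-hot predictions
    matches = (preds == labels).all(axis=1)  # True only if every label position agrees
    return float(matches.mean())
```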
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.47.0 |
|
- Pytorch 2.5.1+cu124 |
|
- Datasets 3.2.0 |
|
- Tokenizers 0.21.0 |
|
|
|
## Improvement Perspectives |
|
|
|
To achieve better results, several improvement strategies could be explored: |
|
|
|
- **Model Capacity Expansion:** Test larger models such as roberta-large or other higher-capacity architectures.
|
- **Batch Size Increase:** Use larger batches for more stable updates. |
|
- **Gradient Accumulation:** Tune the number of update steps over which gradients are accumulated before each backward/update pass (see the sketch after this list).
|
- **Learning Rate Management:** |
|
  - Experiment with other schedules, such as polynomial decay or schedules with dynamic adjustment.

  - Further reduce the learning rate.
|
- **Enhanced Preprocessing:** |
|
  - Test data augmentation techniques such as random masking or synonym replacement.

  - Reduce the imbalance between the different intent categories.

  - Weight the loss according to how well each category is represented.

  - Use another dataset.
|
- **Longer Training Duration:** Increase the number of epochs and refine stopping criteria for more precise convergence. |
|
- **Model Ensembling:** Use multiple models to improve prediction robustness. |
|
- **Advanced Attention Mechanisms:** Test models using hierarchical attention or enhanced multi-head architectures. |
|
- **Metric:** Choose the evaluation metric best suited to the problem.
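
As a sketch of the gradient accumulation idea mentioned above, larger effective batch sizes can be simulated by accumulating gradients over several steps; the step count of 4 below is illustrative, not a tuned recommendation.

```python
from transformers import TrainingArguments

# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps (32 * 4 = 128 here).
training_args = TrainingArguments(
    output_dir="roberta-student-fine-tuned",  # placeholder
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,
)
```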
|
|
|
These strategies require significant computational resources and extended training time but offer substantial potential for performance improvement. |