|
--- |
|
library_name: transformers |
|
license: mit |
|
base_model: roberta-base |
|
tags: |
|
- generated_from_trainer |
|
model-index: |
|
- name: roberta-student-fine-tuned |
|
results: [] |
|
language: |
|
- en |
|
metrics: |
|
- exact_match |
|
--- |
|
|
|
|
|
|
# roberta-student-fine-tuned
|
|
|
This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base), trained on a dataset provided by Kim Taeuk (김태욱), who teaches NLP at Hanyang University.
|
|
|
The model was trained for multi-intent detection using the BlendX dataset, focusing on complex utterances containing multiple intents. |
|
|
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.0053 |
|
- Exact Match Accuracy: 0.9075 |
|
|
|
|
|
## Model description |
|
|
|
The model is based on roberta-base, a robust transformer model pretrained on a large corpus of English text. |
|
|
|
Fine-tuning was conducted on a specialized dataset focusing on multi-intent detection in utterances with complex intent structures. |
|
|
|
|
|
### Model Architecture |
|
|
|
- **Base Model:** roberta-base |
|
- **Task:** Multi-Intent Detection |
|
- **Languages:** English |
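
### Example usage

The snippet below is a minimal inference sketch, assuming the checkpoint is published as a multi-label sequence classifier (one sigmoid output per intent label); the repository id, the 0.5 threshold, and the example utterance are placeholders and should be adapted to your setup.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder repository id; replace with the actual checkpoint location.
model_id = "Meruem/roberta-student-fine-tuned"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

utterance = "Play some jazz and set an alarm for 7 am."
inputs = tokenizer(utterance, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label setup: one sigmoid probability per intent, thresholded at 0.5.
probs = torch.sigmoid(logits)[0]
predicted_intents = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted_intents)
```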
|
|
|
|
|
### Strengths |
|
|
|
- High exact-match accuracy (0.9075) on the evaluation set.

- Capable of detecting multiple intents within a single utterance.
|
|
|
|
|
### Limitations |
|
|
|
- Fine-tuned on a specific dataset; performance may vary on other tasks.

- Limited to English text.
|
|
|
|
|
## Intended uses & limitations |
|
|
|
### Use Cases |
|
|
|
- Multi-intent detection tasks such as customer service queries, virtual assistants, and dialogue systems.

- Academic research and educational projects.
|
|
|
|
|
### Limitations |
|
|
|
- May require additional fine-tuning for domain-specific applications.

- Not designed for multilingual tasks.
|
|
|
|
|
## Training and evaluation data |
|
|
|
The model was trained on the BlendX dataset, a multi-intent detection benchmark focusing on realistic combinations of user intents in task-oriented dialogues. |
|
|
|
|
|
### Data Details: |
|
|
|
The training data is a simplified version of BlendX.

While the full BlendX dataset contains instances with anywhere from 1 to 3 intents,

the version used for this assignment only includes instances with exactly 2 intents, for simplicity.
|
|
|
|
|
## Dataset License and Source |
|
|
|
The dataset used for training this model is licensed under the **[GNU General Public License v2](https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)**. |
|
|
|
### Important Notes: |
|
- Any use, distribution, or modification of this dataset must comply with the terms of the GPL v2 license. |
|
- The dataset source and its original license can be found in its [official GitHub repository](https://github.com/HYU-NLP/BlendX/). |
|
- **Dataset File:** [Download Here](https://huggingface.co/datasets/Meruem/BlendX_simplified/resolve/main/BlendX_simplified.json) |
|
|
|
|
|
### Dataset Format: |
|
- **File Type:** JSON |
|
- **Size:** 28,815 training samples, 1,513 validation samples |
|
- **Data Fields:** |
|
- `split` (string): Indicates if the sample belongs to the training or validation set. |
|
- `utterance` (string): The text input containing multiple intents. |
|
- `intent` (list of strings): The associated intents. |
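
For illustration, the records can be read with the standard `json` module. The sketch below assumes the file has been downloaded locally, that the top-level JSON object is a list of records, and that the `split` field uses the values `"train"` and `"validation"`; adjust these assumptions to match the actual file.

```python
import json

# Assumes BlendX_simplified.json was downloaded from the link above.
with open("BlendX_simplified.json", encoding="utf-8") as f:
    records = json.load(f)

# Assumed split values; check the file if they differ.
train = [r for r in records if r["split"] == "train"]
valid = [r for r in records if r["split"] == "validation"]

example = train[0]
print(example["utterance"])  # utterance text containing two intents
print(example["intent"])     # list with the two gold intent labels
```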
|
|
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 1e-05 |
|
- train_batch_size: 32 |
|
- eval_batch_size: 32 |
|
- seed: 42 |
|
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
|
- lr_scheduler_type: cosine_with_restarts |
|
- warmup_steps: 200 |
|
- num_epochs: 20 |
|
- save_total_limit: 3 |
|
- weight_decay: 0.01 |
|
- eval_strategy: epoch |
|
- save_strategy: epoch |
|
- metric_for_best_model: eval_exact_match_accuracy |
|
- load_best_model_at_end: True |
|
- dataloader_pin_memory: True |
|
- fp16: False |
|
- greater_is_better: True |
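
For reference, the configuration above corresponds roughly to the following `TrainingArguments` sketch (the output directory is a placeholder, and the full training script is not reproduced here):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="roberta-student-fine-tuned",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="cosine_with_restarts",
    warmup_steps=200,
    num_train_epochs=20,
    save_total_limit=3,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    metric_for_best_model="eval_exact_match_accuracy",
    load_best_model_at_end=True,
    dataloader_pin_memory=True,
    fp16=False,
    greater_is_better=True,
)
```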
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Exact Match Accuracy | |
|
|:-------------:|:-----:|:-----:|:---------------:|:--------------------:| |
|
| 0.0723 | 1.0 | 2297 | 0.0720 | 0.0 | |
|
| 0.0576 | 2.0 | 4594 | 0.0516 | 0.0 | |
|
| 0.0328 | 3.0 | 6891 | 0.0264 | 0.0839 | |
|
| 0.015 | 4.0 | 9188 | 0.0141 | 0.6907 | |
|
| 0.0086 | 5.0 | 11485 | 0.0092 | 0.8771 | |
|
| 0.0046 | 6.0 | 13782 | 0.0069 | 0.8929 | |
|
| 0.0027 | 7.0 | 16079 | 0.0061 | 0.9002 | |
|
| 0.0018 | 8.0 | 18376 | 0.0059 | 0.8936 | |
|
| 0.0012 | 9.0 | 20673 | 0.0056 | 0.8995 | |
|
| 0.0009 | 10.0 | 22970 | 0.0053 | 0.9075 | |
|
| 0.0007 | 11.0 | 25267 | 0.0055 | 0.9055 | |
|
| 0.0005 | 12.0 | 27564 | 0.0061 | 0.8976 | |
|
| 0.0004 | 13.0 | 29861 | 0.0057 | 0.9061 | |
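
Exact match accuracy counts a prediction as correct only when the full set of predicted intents matches the gold set for an utterance. The original metric implementation is not included in this card; a minimal standalone sketch, assuming sigmoid logits and multi-hot label vectors, could look like this:

```python
import numpy as np

def exact_match_accuracy(logits, labels, threshold=0.5):
    """Fraction of samples whose predicted intent set equals the gold set exactly."""
    probs = 1.0 / (1.0 + np.exp(-logits))    # sigmoid over each intent logit
    preds = (probs > threshold).astype(int)  # multi-hot predictions
    matches = (preds == labels).all(axis=1)  # True only if every label position agrees
    return float(matches.mean())
```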
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.47.0 |
|
- Pytorch 2.5.1+cu124 |
|
- Datasets 3.2.0 |
|
- Tokenizers 0.21.0 |
|
|
|
## Improvement Perspectives |
|
|
|
To achieve better results, several improvement strategies could be explored: |
|
|
|
- **Model Capacity Expansion:** Test larger models such as roberta-large or other higher-capacity architectures.
|
- **Batch Size Increase:** Use larger batches for more stable updates. |
|
- **Gradient Accumulation:** Tune the number of update steps over which gradients are accumulated before each backward/update pass (see the sketch after this list).
|
- **Learning Rate Management:** |
|
  - Experiment with other schedules, such as polynomial decay or schedules with dynamic adjustment.

  - Further reduce the learning rate.
|
- **Enhanced Preprocessing:** |
|
  - Test data augmentation techniques such as random masking or synonym replacement.

  - Reduce the imbalance between the different intent categories.

  - Weight the loss according to how well each category is represented.

  - Use another dataset.
|
- **Longer Training Duration:** Increase the number of epochs and refine stopping criteria for more precise convergence. |
|
- **Model Ensembling:** Use multiple models to improve prediction robustness. |
|
- **Advanced Attention Mechanisms:** Test models using hierarchical attention or enhanced multi-head architectures. |
|
- **Metric:** Choose the evaluation metric best suited to the problem.
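
As a sketch of the gradient accumulation idea mentioned above, larger effective batch sizes can be simulated by accumulating gradients over several steps; the step count of 4 below is illustrative, not a tuned recommendation.

```python
from transformers import TrainingArguments

# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps (32 * 4 = 128 here).
training_args = TrainingArguments(
    output_dir="roberta-student-fine-tuned",  # placeholder
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,
)
```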
|
|
|
These strategies require significant computational resources and extended training time but offer substantial potential for performance improvement. |