---
license: cc-by-nc-4.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- generated_from_trainer
- classification
- Transformer-heads
- finetune
- chatml
- gpt4
- synthetic data
- distillation
model-index:
- name: Mistral_classification_head_qlora
  results: []
datasets:
- dair-ai/emotion
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# Mistral_classification_head_qlora

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e09e72e43b9464c835735f/qna1wMB7CLTe7lfpRy5x3.png)

**Mistral_classification_head_qlora** has a new transformer head attached for a sequence classification task; the resulting model was then finetuned on the [dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion) dataset using QLoRA. The model was trained for 1 epoch on a single A40 GPU. The evaluation loss for the attached **emotion-head-3** was **1.313**.

The base model used was

* **[mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)**

This experiment was performed using the **[Transformer-heads library](https://github.com/center-for-humans-and-machines/transformer-heads/tree/main)**.

### Training Script

The training script for attaching a new transformer head for the classification task and finetuning it with QLoRA is available here: [Training Script Colab](https://colab.research.google.com/drive/1rPaG-Q6d_CutPOlKzjsfmPvwebNg_X6i?usp=sharing). Condensed sketches of the head attachment and dataset preprocessing are also included at the end of this card.

### Evaluating the Emotion-Head-3

To evaluate the transformer head attached to the base model, refer to the following Colab notebook: [Colab Notebook for Evaluation](https://colab.research.google.com/drive/15UpNnoKJIWjG3G_WJFOQebjpUWyNoPKT?usp=sharing)

### Training hyperparameters

The following hyperparameters were used during training (reproduced as a `TrainingArguments` sketch at the end of this card):

* train_epochs = 1
* eval_epochs = 1
* logging_steps = 1
* train_batch_size = 4
* eval_batch_size = 4
* output_dir="emotion_linear_probe"
* learning_rate=0.00002
* num_train_epochs=train_epochs
* logging_steps=logging_steps
* do_eval=False
* remove_unused_columns=False
* optim="paged_adamw_32bit"
* gradient_checkpointing=True
* lr_scheduler_type="constant"
* ddp_find_unused_parameters=False
* per_device_train_batch_size=train_batch_size
* per_device_eval_batch_size=eval_batch_size
* report_to=["wandb"]

### Framework versions

- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.17.0
- Tokenizers 0.15.0
- Transformer-heads
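
### Head attachment sketch

The block below is a minimal sketch of how a classification head can be attached to the quantized base model with the Transformer-heads library, assuming the pattern shown in the library's README: a `HeadConfig` describing the head and `create_headed_qlora` wrapping the base model with 4-bit quantization and LoRA adapters. Exact class and argument names may differ between library versions, and the LoRA/quantization values are illustrative assumptions rather than the settings used for this model; refer to the linked training notebook for the exact configuration.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# Transformer-heads API as documented in the library README;
# names may differ slightly between versions.
from transformer_heads import create_headed_qlora
from transformer_heads.config import HeadConfig
from transformer_heads.util.helpers import get_model_params

model_path = "mistralai/Mistral-7B-Instruct-v0.2"
model_params = get_model_params(model_path)

# One linear classification head over the final hidden state,
# predicting the six emotion classes of dair-ai/emotion.
head_config = HeadConfig(
    name="emotion-head-3",
    layer_hook=-1,                        # hook the last transformer layer
    in_size=model_params["hidden_size"],
    output_activation="linear",
    pred_for_sequence=True,               # one prediction per sequence
    loss_fct="cross_entropy",
    num_outputs=6,
)

# Illustrative 4-bit QLoRA settings (assumed, not taken from the notebook).
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, bias="none")

model = create_headed_qlora(
    base_model_class=model_params["model_class"],
    model_name=model_path,
    quantization_config=quantization_config,
    lora_config=lora_config,
    head_configs=[head_config],
)
```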
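
### Dataset preprocessing sketch

[dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion) contains short English texts labeled with one of six emotions. A minimal preprocessing step using standard `datasets`/`transformers` calls (not taken verbatim from the training notebook) could look like this:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.pad_token = tokenizer.eos_token  # Mistral defines no pad token

dataset = load_dataset("dair-ai/emotion")  # train / validation / test splits

def tokenize(batch):
    # The integer "label" column is kept as the classification target.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)
print(tokenized["train"].features["label"].names)
# ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']
```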
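
### Training arguments as code

The hyperparameters listed above map directly onto a standard `transformers.TrainingArguments` object, as sketched below. The `Trainer` subclass and data collator used for training the headed model are specific to the Transformer-heads library and are shown in the linked training notebook, so only the arguments are reproduced here.

```python
from transformers import TrainingArguments

train_epochs = 1
logging_steps = 1
train_batch_size = 4
eval_batch_size = 4

training_args = TrainingArguments(
    output_dir="emotion_linear_probe",
    learning_rate=0.00002,
    num_train_epochs=train_epochs,
    logging_steps=logging_steps,
    do_eval=False,
    remove_unused_columns=False,
    optim="paged_adamw_32bit",
    gradient_checkpointing=True,
    lr_scheduler_type="constant",
    ddp_find_unused_parameters=False,
    per_device_train_batch_size=train_batch_size,
    per_device_eval_batch_size=eval_batch_size,
    report_to=["wandb"],
)
```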