---
license: cc-by-nc-4.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
  - generated_from_trainer
  - classification
  - Transformer-heads
  - finetune
  - chatml
  - gpt4
  - synthetic data
  - distillation
model-index:
  - name: Mistral_classification_head_qlora
    results: []
datasets:
  - dair-ai/emotion
language:
  - en
library_name: transformers
pipeline_tag: text-generation
---

# Mistral_classification_head_qlora


Mistral_classification_head_qlora attaches a new transformer head to the base model for a sequence classification task; the resulting model was then fine-tuned on the dair-ai/emotion dataset using QLoRA. The model was trained for 1 epoch on a single A40 GPU. The evaluation loss for the attached emotion-head-3 was 1.313. The base model used was mistralai/Mistral-7B-Instruct-v0.2.
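The head performs six-way classification over the dair-ai/emotion label set (sadness, joy, love, anger, fear, surprise). As a minimal sketch, this is how the head's logits would be decoded into an emotion label; the logit values below are made up for illustration:

```python
import math

# Standard label order of the dair-ai/emotion dataset
ID2LABEL = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}

def softmax(logits):
    """Convert raw head logits into class probabilities."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the predicted emotion label for one sequence's head logits."""
    probs = softmax(logits)
    return ID2LABEL[probs.index(max(probs))]

# Hypothetical logits from the classification head for one input sequence
print(predict([0.1, 2.5, 0.3, -1.0, 0.2, 0.0]))  # highest logit is index 1 -> "joy"
```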

This experiment was performed using the Transformer-heads library.

## Training Script

The training script for attaching a new transformer head for the classification task and fine-tuning it with QLoRA is the following:

Training Script Colab
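The card does not spell out the quantization and adapter settings used. A typical QLoRA configuration for this kind of run might look like the sketch below; all values (rank, alpha, dropout, target modules) are illustrative assumptions, not taken from the training script:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters on the attention projections; r/alpha/dropout are assumed values
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="SEQ_CLS",
)
```

With these two configs, the base model is loaded in 4-bit precision and only the LoRA adapters plus the new classification head receive gradient updates.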

## Evaluating the Emotion-Head-3

To evaluate the transformer head attached to the base model, refer to the following Colab notebook: Colab Notebook for Evaluation
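The reported evaluation loss (1.313) is the mean cross-entropy over the six emotion classes; for reference, a uniform random guess would score ln(6) ≈ 1.792. A minimal sketch of the computation, with made-up probabilities:

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])

def mean_eval_loss(batch_probs, labels):
    """Average cross-entropy over a batch of (probabilities, true-label) pairs."""
    return sum(cross_entropy(p, y) for p, y in zip(batch_probs, labels)) / len(labels)

# A uniform guess over 6 classes always incurs a loss of ln(6) ~= 1.792,
# so an eval loss of 1.313 means the head is doing better than chance.
uniform = [1.0 / 6.0] * 6
print(cross_entropy(uniform, 0))  # 1.7917...
```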

## Training hyperparameters

The following hyperparameters were used during training:

- output_dir: emotion_linear_probe
- learning_rate: 2e-05
- num_train_epochs: 1
- eval_epochs: 1
- logging_steps: 1
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- do_eval: False
- remove_unused_columns: False
- optim: paged_adamw_32bit
- gradient_checkpointing: True
- lr_scheduler_type: constant
- ddp_find_unused_parameters: False
- report_to: wandb
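Put together, the hyperparameters above correspond to a `transformers.TrainingArguments` call along these lines (a sketch reconstructed from the list, not copied from the training script):

```python
from transformers import TrainingArguments

train_epochs = 1
logging_steps = 1
train_batch_size = 4
eval_batch_size = 4

training_args = TrainingArguments(
    output_dir="emotion_linear_probe",
    learning_rate=2e-5,
    num_train_epochs=train_epochs,
    logging_steps=logging_steps,
    do_eval=False,
    remove_unused_columns=False,        # keep extra columns the head may need
    optim="paged_adamw_32bit",          # paged optimizer pairs with 4-bit QLoRA
    gradient_checkpointing=True,        # trade compute for memory on the A40
    lr_scheduler_type="constant",
    ddp_find_unused_parameters=False,
    per_device_train_batch_size=train_batch_size,
    per_device_eval_batch_size=eval_batch_size,
    report_to=["wandb"],
)
```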

## Framework versions

- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu118
- Datasets 2.17.0
- Tokenizers 0.15.0
- Transformer-heads