---
license: apache-2.0
datasets:
  - pankajmathur/WizardLM_Orca
  - teknium/trismegistus-project
  - unalignment/toxic-dpo-v0.1
  - Intel/orca_dpo_pairs
language:
  - en
pipeline_tag: text-generation
---

# Mistral 7b Reverse Instruct

This model is SFT (LoRA) fine-tuned to reverse engineer the original prompt of a given LLM output/response.

Use case: generating synthetic instruct datasets for chatbot development and domain-specific fine-tuning (e.g. summarization and roleplay). It is also useful for labelling unlabeled datasets.

- base_model: mistralai/Mistral-7B-v0.1 (checkpoint-v1)
- base_model: mistralai/Mistral-7B-v0.2 (checkpoint-v2 and later)

For convenience, the latest model export is provided under `/latest_model_export`, and GGUF-quantized versions under `/latest_ggml_models`.

## Response Format

```
"[INST]\n### System:\n{system}\n### Instruction:\n{instruction}\n[/INST]\n"
```
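The model's response can be split back into its system and instruction parts with a small parser. This is a minimal sketch (the helper name and regex are illustrative, not part of this repository):

```python
import re

# Extract the reverse-engineered system prompt and instruction from the
# "[INST] ... [/INST]" response format shown above.
RESPONSE_RE = re.compile(
    r"\[INST\]\n### System:\n(?P<system>.*?)"
    r"\n### Instruction:\n(?P<instruction>.*?)\n\[/INST\]",
    re.DOTALL,
)

def parse_response(text: str):
    """Return (system, instruction) or None if the format is not matched."""
    m = RESPONSE_RE.search(text)
    if m is None:
        return None
    return m.group("system"), m.group("instruction")

example = (
    "[INST]\n### System:\nYou are a helpful assistant.\n"
    "### Instruction:\nSummarize the article in two sentences.\n[/INST]\n"
)
print(parse_response(example))
```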

## Prompt Template

```
"\n### System:\nYou craft instructions for generating the given output through reverse engineering.\n### Instruction:\nDecipher the steps used to produce the given output and articulate a refined set of instructions (System & Instruction).\n### OUTPUT:\n {output}"
```

(use the template without the surrounding quotation marks)
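Filling the template is a plain string substitution. A minimal sketch (the function name is illustrative):

```python
# The prompt template from above, as a Python format string.
PROMPT_TEMPLATE = (
    "\n### System:\nYou craft instructions for generating the given output "
    "through reverse engineering.\n### Instruction:\nDecipher the steps used "
    "to produce the given output and articulate a refined set of instructions "
    "(System & Instruction).\n### OUTPUT:\n {output}"
)

def build_prompt(output: str) -> str:
    """Insert the LLM output to be reverse engineered into the template."""
    return PROMPT_TEMPLATE.format(output=output)

prompt = build_prompt("Paris is the capital of France.")
print(prompt)
```

The resulting string is what you feed to the model; it then responds in the `[INST] ... [/INST]` format described above.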

## Training Dataset

The reverse-instruct dataset was compiled from about 21k items of the datasets listed in the metadata above; most coding-like tasks were removed.
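The exact filtering heuristics are not published; the sketch below only illustrates the kind of keyword-based filter that could drop coding-like entries (the marker list and field names are assumptions):

```python
# Assumed heuristic: an item counts as "coding-like" if its output contains
# typical code markers. This is illustrative, not the actual filter used.
CODE_MARKERS = ("```", "def ", "public class", "#include", "SELECT ")

def looks_like_code(text: str) -> bool:
    return any(marker in text for marker in CODE_MARKERS)

items = [
    {"output": "The French Revolution began in 1789 with the storming of the Bastille."},
    {"output": "def add(a, b):\n    return a + b"},
]
filtered = [item for item in items if not looks_like_code(item["output"])]
print(len(filtered))  # → 1
```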

## Training Procedure

```shell
cd LLaMA-Factory && WANDB_DISABLED=True PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256 accelerate launch \
    --multi_gpu \
    --mixed_precision fp16 \
    --num_processes 2 \
    --num_machines 1 \
    --rdzv_backend static \
    --same_network \
    --gpu_ids all \
    --machine_rank 0 \
    --main_training_function main \
    -- src/train_bash.py \
    --stage sft \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 \
    --adapter_name_or_path path_to_checkpoint \
    --flash_attn \
    --neftune_noise_alpha 5 \
    --do_train \
    --dataset default \
    --template vanilla \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir path_to_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 10 \
    --save_total_limit 3 \
    --learning_rate 5e-5 \
    --num_train_epochs 9.0 \
    --plot_loss \
    --fp16 \
    --overwrite_output_dir \
    --cutoff_len 4096 \
    --quantization_bit 4
```
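The flags above imply a small effective batch size, which drives the long training times listed below. A back-of-envelope step count, assuming the full ~21k-item dataset and one optimizer step per example pair of GPUs (the dataset size is approximate):

```python
# Effective batch = per-device batch * gradient accumulation * GPU count.
dataset_size = 21_000   # approximate, from the Training Dataset section
per_device_batch = 1    # --per_device_train_batch_size
grad_accum = 1          # --gradient_accumulation_steps
num_gpus = 2            # --num_processes 2 (Kaggle T4 x2)
epochs = 9.0            # --num_train_epochs

effective_batch = per_device_batch * grad_accum * num_gpus  # 2
steps_per_epoch = dataset_size // effective_batch           # 10500
total_steps = int(steps_per_epoch * epochs)
print(total_steps)  # → 94500
```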

## Training Time

- v1: ~12h on Kaggle's P100 GPU
- v2: >30h on Kaggle's T4 x2
- v3: >40h on Kaggle's T4 x2

## Loss

- v3:

```
{'loss': 0.4424, 'learning_rate': 4.8398000023144565e-05, 'epoch': 1.03}
```
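The logged learning rate is consistent with the `--lr_scheduler_type cosine` setting. A quick check, under the simplifying assumption that schedule progress can be approximated by epoch fraction (in practice it is tracked in optimizer steps) and that there is no warmup:

```python
import math

# Cosine decay: lr(t) = 0.5 * base_lr * (1 + cos(pi * t / T)), no warmup.
base_lr = 5e-5      # --learning_rate
total_epochs = 9.0  # --num_train_epochs
epoch = 1.03        # from the log line above

lr = 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))
print(lr)  # close to the logged 4.8398e-05
```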

## Framework versions

- LLaMA-Factory