README.md · Philipp-Sc/mistral-7b-reverse-instruct at 346e4be5066a0d7731272b2287c7291274d41641

license: apache-2.0
datasets:
  - pankajmathur/WizardLM_Orca
language:
  - en
pipeline_tag: text-generation

base_model: mistralai/Mistral-7B-v0.1 (for checkpoint-v1)

base_model: mistralai/Mistral-7B-v0.2 (for checkpoint-v2 and later)

Reverse Instruct LoRA Adapter

This LoRA adapter is fine-tuned to reverse-engineer the original prompt from a given LLM output/response.

Response Format

"[INST]\n### System:\n{system}\n### Instruction:\n{instruction}\n[/INST]\n"

(without the surrounding quotation marks)
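
Because the response follows a fixed layout, the recovered system message and instruction can be pulled out with a regular expression. The following is a minimal sketch, not part of the original project; the function name `parse_reverse_instruct_response` is hypothetical, and the pattern assumes the model reproduces the format above exactly.

```python
import re

# Pattern mirroring the documented response format:
# [INST]\n### System:\n{system}\n### Instruction:\n{instruction}\n[/INST]\n
_RESPONSE_RE = re.compile(
    r"\[INST\]\s*### System:\s*(?P<system>.*?)\s*"
    r"### Instruction:\s*(?P<instruction>.*?)\s*\[/INST\]",
    re.DOTALL,
)


def parse_reverse_instruct_response(text):
    """Return {'system': ..., 'instruction': ...} or None if the format is not found.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    match = _RESPONSE_RE.search(text)
    if match is None:
        return None
    return {
        "system": match.group("system").strip(),
        "instruction": match.group("instruction").strip(),
    }


if __name__ == "__main__":
    sample = "[INST]\n### System:\nYou are helpful.\n### Instruction:\nWrite a haiku.\n[/INST]\n"
    print(parse_reverse_instruct_response(sample))
```

Returning `None` on a mismatch lets the caller decide whether to retry generation or fall back to the raw text.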

Prompt Template

"\n### System:\nYou craft instructions for generating the given output through reverse engineering.\n### Instruction:\nDecipher the steps used to produce the given output and articulate a refined set of instructions (System & Instruction).\n### OUTPUT:\n {output}"

(use the template without the surrounding quotation marks)
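
Filling the template is a plain string substitution. The sketch below copies the template text verbatim from above, including the single space before `{output}`; the names `PROMPT_TEMPLATE` and `build_reverse_instruct_prompt` are hypothetical and only serve the illustration.

```python
# Prompt template copied from the section above (without the quotation marks).
PROMPT_TEMPLATE = (
    "\n### System:\n"
    "You craft instructions for generating the given output through reverse engineering."
    "\n### Instruction:\n"
    "Decipher the steps used to produce the given output and articulate a refined "
    "set of instructions (System & Instruction)."
    "\n### OUTPUT:\n {output}"
)


def build_reverse_instruct_prompt(output):
    """Insert an LLM response into the template (hypothetical helper).

    Only the template is parsed by str.format, so braces inside `output`
    are passed through unchanged.
    """
    return PROMPT_TEMPLATE.format(output=output)


if __name__ == "__main__":
    print(build_reverse_instruct_prompt("Roses are red, violets are blue."))
```

The resulting string is what would be passed to the model; the adapter is then expected to answer in the response format described earlier.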

Training Dataset

About 21k items from the following datasets were used (the items removed were mostly coding-like tasks).

wget https://raw.githubusercontent.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/main/data/alpaca_gpt4_data.json
wget https://raw.githubusercontent.com/teknium1/GPTeacher/main/Roleplay%20Supplemental/roleplay-instruct-v2.1.json
wget https://huggingface.co/datasets/pankajmathur/WizardLM_Orca/resolve/main/wizardlm_orca.json
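
The source does not show how these files were converted into training examples. One plausible way, sketched below, is to swap the roles: the original output becomes part of the prompt and the original system message and instruction become the target. Everything here is an assumption, including the input keys (`system`, `instruction`, `output`), the Alpaca-style output keys, and the function name `to_reverse_example`; the actual field names differ between the three files and should be checked before use.

```python
import json

# Fixed wording taken from the Prompt Template section.
SYSTEM_MSG = "You craft instructions for generating the given output through reverse engineering."
INSTRUCTION_MSG = (
    "Decipher the steps used to produce the given output and articulate a refined "
    "set of instructions (System & Instruction)."
)


def to_reverse_example(record):
    """Turn one forward record into a reverse-instruct example (illustrative only).

    Assumes `record` has 'instruction' and 'output' keys and an optional
    'system' key; these key names are assumptions, not confirmed by the source.
    """
    prompt = (
        f"\n### System:\n{SYSTEM_MSG}"
        f"\n### Instruction:\n{INSTRUCTION_MSG}"
        f"\n### OUTPUT:\n {record['output']}"
    )
    response = (
        f"[INST]\n### System:\n{record.get('system', '')}"
        f"\n### Instruction:\n{record['instruction']}\n[/INST]\n"
    )
    # Alpaca-style layout (instruction / input / output), assumed for the training set.
    return {"instruction": prompt, "input": "", "output": response}


if __name__ == "__main__":
    example = to_reverse_example(
        {"system": "Be brief.", "instruction": "Say hi.", "output": "Hi!"}
    )
    print(json.dumps(example, indent=2))
```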

Training Procedure

CUDA_VISIBLE_DEVICES=0 WANDB_DISABLED=True python LLaMA-Factory/src/train_bash.py \
    --stage sft \
    --model_name_or_path model_name_or_path \
    --checkpoint_dir checkpoint_dir \
    --do_train \
    --dataset default \
    --template vanilla \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir path_to_sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16 \
    --overwrite_output_dir \
    --cutoff_len 2048 \
    --quantization_bit 4
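
For a rough sense of scale, the batch settings above can be turned into a step count. This is a back-of-the-envelope sketch: it treats the "about 21k items" figure as exactly 21,000, assumes every item is used for training, and assumes a single visible GPU (as `CUDA_VISIBLE_DEVICES=0` suggests). The function name `training_schedule` is hypothetical.

```python
import math


def training_schedule(num_examples, epochs, per_device_batch_size,
                      grad_accum_steps, num_devices=1,
                      save_steps=100, logging_steps=10):
    """Estimate optimizer steps, checkpoints, and log events for a run."""
    # Examples consumed per optimizer step.
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = math.ceil(steps_per_epoch * epochs)
    return {
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "checkpoints": total_steps // save_steps,
        "log_events": total_steps // logging_steps,
    }


if __name__ == "__main__":
    # Values from the command above; 21_000 is an approximation of "about 21k".
    print(training_schedule(21_000, 3.0, 1, 1))
```

With a batch size of 1 and no gradient accumulation, each example is its own optimizer step, which is consistent with the long training times reported below.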

Training Time

v1: ~12h on Kaggle's P100 GPU
v2: >30h on Kaggle's T4 x2

Framework versions

LLaMA-Factory