Edit model card

Credits: Maxime Labonne https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac

(With minor alterations)

NeuralHermes 2.5 - Mistral 7B

NeuralHermes is based on the teknium/OpenHermes-2.5-Mistral-7B model that has been further fine-tuned with Direct Preference Optimization (DPO) using the Intel/orca_dpo_pairs dataset. .

Usage

You can run this model using the following code:

import transformers
from transformers import AutoTokenizer

# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])

Training hyperparameters

LoRA:

  • r=16
  • lora_alpha=16
  • lora_dropout=0.05
  • bias="none"
  • task_type="CAUSAL_LM"
  • target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']

Training arguments:

  • per_device_train_batch_size=2 # Changed from 4
  • gradient_accumulation_steps=4
  • gradient_checkpointing=True
  • learning_rate=2e-5 # Changed from 5e-5
  • lr_scheduler_type="cosine"
  • max_steps=250 # Changed from 200
  • optim="paged_adamw_32bit"
  • warmup_steps=100

DPOTrainer:

  • beta=0.1
  • max_prompt_length=1024
  • max_length=1536
Downloads last month
7
Safetensors
Model size
7.24B params
Tensor type
FP16
·
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Finetuned from

Dataset used to train delayedkarma/NeuralHermes-2.5-Mistral-7B