Edit model card

image/png

Mahou-1.1-llama3-8B

Mahou is our attempt to build a production-ready conversational/roleplay LLM.

Future versions will be released iteratively and finetuned from flammen.ai conversational data.

Chat Format

This model has been trained to use ChatML format.

<|im_start|>system
{{system}}<|im_end|>
<|im_start|>{{char}}
{{message}}<|im_end|>
<|im_start|>{{user}}
{{message}}<|im_end|>

ST Settings

  1. Use ChatML for the Context Template.
  2. Turn on Instruct Mode for ChatML.
  3. Use the following stopping strings: ["<", "|", "<|", "\n"]

License

This model is based on Meta Llama-3-8B and is governed by the META LLAMA 3 COMMUNITY LICENSE AGREEMENT.

Method

Finetuned using an A100 on Google Colab.

Fine-tune a Mistral-7b model with Direct Preference Optimization - Maxime Labonne

Configuration

LoRA, model, and training settings:

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)
# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False
# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    max_steps=420,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)
# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    force_use_ref_model=True
)
Downloads last month
389
Safetensors
Model size
8.03B params
Tensor type
BF16
·
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Finetuned from

Dataset used to train flammenai/Mahou-1.1-llama3-8B

Space using flammenai/Mahou-1.1-llama3-8B 1

Collection including flammenai/Mahou-1.1-llama3-8B