flammen16-chinese-DPO-7B
A Mistral 7B LLM built by merging pretrained models and then finetuning on Wenbo Pan's Chinese DPO Pairs. Flammen specializes in character roleplay, creative writing, and general intelligence. Please note that this is an experimental model and is not recommended for production use.
Method
Finetuned using an A100 on Google Colab. 🙏
Reference guide: "Fine-tune a Mistral-7b model with Direct Preference Optimization" by Maxime Labonne.
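For readers unfamiliar with Direct Preference Optimization, the sketch below shows the per-pair loss in plain Python. It assumes the standard sigmoid form of the DPO objective and uses beta=0.1, the value set in the configuration on this card. The function name `dpo_loss` and the example log-probabilities are hypothetical and chosen only for illustration; this is not the trainer's internal code.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy or the frozen reference model.
    """
    # How much more (or less) likely each response became relative to the reference
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # beta scales the implicit reward margin between chosen and rejected
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), written in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical numbers: the policy has shifted toward the chosen response
print(dpo_loss(-12.0, -20.0, -14.0, -15.0))
# Policy identical to the reference: the loss equals log(2)
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
```

A larger beta penalizes drift from the reference model more strongly; with beta=0.1 the policy needs a fairly large log-ratio gap before the loss falls well below log(2).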
Configuration
LoRA, model, and training settings:
# Imports required by the snippet below
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# Note: model_name, new_model, dataset, and tokenizer must be defined beforehand.

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    max_steps=1000,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
    force_use_ref_model=True
)

# Fine-tune model with DPO
dpo_trainer.train()
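To make the settings above easier to interpret, here is a small sketch that derives a few quantities from them: the effective batch size, the number of preference pairs seen over the run, the LoRA scaling factor, and the learning rate at a given step. It assumes a single GPU (the card mentions one A100) and assumes the "cosine" scheduler is linear warmup followed by a half-cosine decay to zero. The constant names and the `lr_at` helper are hypothetical, not part of any library.

```python
import math

# Values copied from the configuration above
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 2
MAX_STEPS = 1000
WARMUP_STEPS = 100
PEAK_LR = 2e-5
LORA_R = 16
LORA_ALPHA = 16

NUM_DEVICES = 1  # assumption: a single A100

# Preference pairs contributing to each optimizer step
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES

# Total preference pairs processed over the whole run
pairs_seen = effective_batch * MAX_STEPS

# LoRA updates are scaled by alpha / r
lora_scale = LORA_ALPHA / LORA_R


def lr_at(step):
    """Learning rate at an optimizer step (assumed warmup + cosine shape)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate
        return PEAK_LR * step / WARMUP_STEPS
    # Half-cosine decay from the peak down to 0 at MAX_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch)
print("preference pairs seen:", pairs_seen)
print("LoRA scale:", lora_scale)
for step in (50, 100, 550, 1000):
    print(step, lr_at(step))
```

Under these assumptions the run covers 4,000 preference pairs in batches of 4, and because `lora_alpha` equals `r` the adapter output is applied with a scale of 1.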
Model tree for nbeerbower/flammen16-chinese-dpo-mistral-7B
Base model: flammenai/flammen15-mistral-7B
Finetuned: flammenai/flammen15-gutenberg-DPO-v1-7B
Finetuned: flammenai/flammen15X-mistral-7B
Finetuned: flammenai/flammen16-mistral-7B