Lewdiculous committed
Commit 096da7f
1 Parent(s): f17b8c2

Update README.md

Files changed (1)
  1. README.md +82 -1
README.md CHANGED
@@ -4,4 +4,85 @@ tags:
  - mistral
  - roleplay
  ---
- GGUF-IQ-Imatrix quants for [flammenai/flammen18X-mistral-7B](https://huggingface.co/flammenai/flammen18X-mistral-7B).
+
+ Personal testing GGUF-IQ-Imatrix quants for [flammenai/flammen18X-mistral-7B](https://huggingface.co/flammenai/flammen18X-mistral-7B).
+
+ # Original model card information:
+
+ ![image/png](https://huggingface.co/nbeerbower/flammen13X-mistral-7B/resolve/main/flammen13x.png)
+
+ # flammen18X-mistral-7B
+
+ A Mistral 7B LLM built from merging pretrained models and finetuning on [ResplendentAI/NSFW_RP_Format_DPO](https://huggingface.co/datasets/ResplendentAI/NSFW_RP_Format_DPO).
+ Flammen specializes in exceptional character roleplay, creative writing, and general intelligence.
+
+ ### Method
+
+ Finetuned using an A100 on Google Colab.
+
+ [Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) - [Maxime Labonne](https://huggingface.co/mlabonne)
+
+ ### Configuration
+
+ LoRA, model, and training settings:
+
+ ```python
+ # LoRA configuration
+ peft_config = LoraConfig(
+     r=16,
+     lora_alpha=16,
+     lora_dropout=0.05,
+     bias="none",
+     task_type="CAUSAL_LM",
+     target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
+ )
+
+ # Model to fine-tune
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.bfloat16,
+     load_in_4bit=True
+ )
+ model.config.use_cache = False
+
+ # Reference model
+ ref_model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.bfloat16,
+     load_in_4bit=True
+ )
+
+ # Training arguments
+ training_args = TrainingArguments(
+     per_device_train_batch_size=2,
+     gradient_accumulation_steps=8,
+     gradient_checkpointing=True,
+     learning_rate=5e-5,
+     lr_scheduler_type="cosine",
+     max_steps=420,
+     save_strategy="no",
+     logging_steps=1,
+     output_dir=new_model,
+     optim="paged_adamw_32bit",
+     warmup_steps=100,
+     bf16=True,
+     report_to="wandb",
+ )
+
+ # Create DPO trainer
+ dpo_trainer = DPOTrainer(
+     model,
+     ref_model,
+     args=training_args,
+     train_dataset=dataset,
+     tokenizer=tokenizer,
+     peft_config=peft_config,
+     beta=0.1,
+     max_prompt_length=1024,
+     max_length=1536,
+     force_use_ref_model=True
+ )
+
+ # Fine-tune model with DPO
+ dpo_trainer.train()
+ ```
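
Note: the configuration snippet above references several names it never defines (`model_name`, `new_model`, `tokenizer`, `dataset`) and omits its imports. Below is a minimal setup sketch that would make it runnable, following the TRL/PEFT/Transformers APIs used by the linked tutorial; the placeholder base-model path and the dataset split are assumptions, not details taken from the original card.

```python
# Hypothetical setup for the quoted snippet -- not part of the original model card.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "path/to/base-model"    # assumption: the base model is not named in the snippet
new_model = "flammen18X-mistral-7B"  # used as output_dir in TrainingArguments

# Mistral's tokenizer ships without a pad token, which padded DPO batches need.
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# DPOTrainer expects 'prompt', 'chosen', and 'rejected' columns;
# the split name here is an assumption about the dataset's layout.
dataset = load_dataset("ResplendentAI/NSFW_RP_Format_DPO", split="train")
```

With these defined, the quoted cells run top to bottom; `report_to="wandb"` additionally assumes a logged-in Weights & Biases session, or it can be switched to `"none"`.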
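As for the quants this repository actually ships, here is a minimal sketch of loading one of the GGUF-IQ-Imatrix files with llama-cpp-python. The filename, context size, and prompt are placeholders (assumptions): pick whichever quant level you downloaded and format prompts to the model's expected template.

```python
# Hypothetical usage of a downloaded quant -- filename and settings are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="flammen18X-mistral-7B-Q4_K_M-imat.gguf",  # whichever quant you grabbed
    n_ctx=8192,        # context window; lower it if memory is tight
    n_gpu_layers=-1,   # offload all layers when a GPU-enabled build is installed
)

out = llm("Write a short in-character greeting from a tavern keeper.", max_tokens=128)
print(out["choices"][0]["text"])
```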