---
library_name: transformers
license: apache-2.0
base_model:
- microsoft/Phi-3-medium-128k-instruct
datasets:
- flammenai/FlameMix-DPO-v1
- flammenai/Grill-preprod-v1_chatML
- flammenai/Grill-preprod-v2_chatML
---
**ExLlamaV2** quant (**exl2** / **3.0 bpw**) made with ExLlamaV2 v0.0.21

Other EXL2 quants:
| **Quant** | **Model Size** | **lm_head** |
| ----- | ---------- | ------- |
|<center>**[2.2](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-2_2bpw_exl2)**</center> | <center>4032 MB</center> | <center>6</center> |
|<center>**[2.5](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-2_5bpw_exl2)**</center> | <center>4485 MB</center> | <center>6</center> |
|<center>**[3.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-3_0bpw_exl2)**</center> | <center>5312 MB</center> | <center>6</center> |
|<center>**[3.5](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-3_5bpw_exl2)**</center> | <center>6116 MB</center> | <center>6</center> |
|<center>**[3.75](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-3_75bpw_exl2)**</center> | <center>6526 MB</center> | <center>6</center> |
|<center>**[4.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-4_0bpw_exl2)**</center> | <center>6936 MB</center> | <center>6</center> |
|<center>**[4.25](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-4_25bpw_exl2)**</center> | <center>7312 MB</center> | <center>6</center> |
|<center>**[5.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-5_0bpw_exl2)**</center> | <center>8559 MB</center> | <center>6</center> |
|<center>**[6.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-6_0bpw_exl2)**</center> | <center>10220 MB</center> | <center>8</center> |
|<center>**[6.5](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-6_5bpw_exl2)**</center> | <center>11013 MB</center> | <center>8</center> |
|<center>**[8.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-8_0bpw_exl2)**</center> | <center>12726 MB</center> | <center>8</center> |

![image/png](https://huggingface.co/flammenai/Mahou-1.0-mistral-7B/resolve/main/mahou1.png)

# Mahou-1.2-phi-14B

Please note: this is an untested, experimental release.

Mahou is our attempt to build a production-ready conversational/roleplay LLM.

Future versions will be released iteratively and finetuned from flammen.ai conversational data.

### Chat Format

This model has been trained to use ChatML format.

```
<|im_start|>system
{{system}}<|im_end|>
<|im_start|>{{char}}
{{message}}<|im_end|>
<|im_start|>{{user}}
{{message}}<|im_end|>
```
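As an illustration of the template above, here is a minimal sketch of assembling a ChatML prompt in Python. The helper name `build_chatml_prompt`, the speaker names, and the system prompt are hypothetical and not part of the original release; the sketch also assumes the final turn is left open so the model writes the next message.

```python
def build_chatml_prompt(system, turns, next_speaker):
    """Render a system prompt and (speaker, message) turns as ChatML text."""
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for speaker, message in turns:
        parts.append(f"<|im_start|>{speaker}\n{message}<|im_end|>")
    # Leave the last turn open so the model continues as `next_speaker`.
    parts.append(f"<|im_start|>{next_speaker}\n")
    return "\n".join(parts)


prompt = build_chatml_prompt(
    system="You are Mahou, a cheerful apprentice mage.",  # hypothetical system prompt
    turns=[
        ("Mahou", "hey! you made it."),           # {{char}} turn
        ("User", "what did you learn today?"),    # {{user}} turn
    ],
    next_speaker="Mahou",
)
print(prompt)
```

Most frontends build this string for you; the sketch is only meant to show where the `<|im_start|>` and `<|im_end|>` markers fall.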

### Roleplay Format

- Speech without quotes.
- Actions in `*asterisks*`

```
*leans against wall cooly* so like, i just casted a super strong spell at magician academy today, not gonna lie, felt badass.
```
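If you need to post-process replies in this format (for example, to render actions differently from speech), a small parser is enough. The following is a rough sketch under the assumption that actions never contain nested asterisks; the function name `split_roleplay` is hypothetical.

```python
import re


def split_roleplay(text):
    """Split a message into ("action", ...) and ("speech", ...) segments."""
    segments = []
    # The capturing group makes re.split keep the *action* chunks in the result.
    for chunk in re.split(r"(\*[^*]+\*)", text):
        chunk = chunk.strip()
        if not chunk:
            continue
        if len(chunk) > 2 and chunk.startswith("*") and chunk.endswith("*"):
            segments.append(("action", chunk[1:-1]))
        else:
            segments.append(("speech", chunk))
    return segments


example = "*leans against wall cooly* so like, i just casted a super strong spell."
for kind, content in split_roleplay(example):
    print(kind, "->", content)
```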

### ST Settings

1. Use ChatML for the Context Template.
2. Turn on Instruct Mode for ChatML.
3. Use the following stopping strings: `["<", "|", "<|", "\n"]`
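Outside SillyTavern, the same stopping behaviour can be approximated by trimming generated text at the first stop string. This is a minimal sketch, assuming you receive the raw completion as a Python string; `truncate_at_stop` is a hypothetical helper, not a SillyTavern API.

```python
STOP_STRINGS = ["<", "|", "<|", "\n"]


def truncate_at_stop(text, stop_strings=STOP_STRINGS):
    """Cut `text` at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


raw = "*waves* hi there!<|im_end|>\n<|im_start|>User"
print(truncate_at_stop(raw))
```

Because `"\n"` is a stop string, each reply is limited to a single line, which suits short chat-style turns.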

### Method

Finetuned using an A100 on Google Colab.

[Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) - [Maxime Labonne](https://huggingface.co/mlabonne)
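For readers unfamiliar with Direct Preference Optimization, the sketch below computes the standard DPO loss for a single preference pair in plain Python. It illustrates the objective as it is usually stated, not the trainer's actual implementation; the log-probability values are made up, and `beta=0.1` matches the value in this card's configuration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) rewritten as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# Made-up log-probabilities: the policy prefers the chosen answer more than
# the reference model does, so the loss falls below log(2).
loss = dpo_loss(-10.0, -20.0, -12.0, -15.0)
print(round(loss, 4))
```

When the policy and the reference model agree exactly, the margin is zero and the loss equals `log(2)`; training pushes it lower by widening the gap between chosen and rejected responses relative to the reference.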

### Configuration

LoRA, model, and training settings:

```python
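# Imports for this snippet. `model_name`, `new_model`, `dataset`, and
# `tokenizer` are assumed to be defined earlier in the notebook.
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# LoRA configuration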
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=2000,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    force_use_ref_model=True
)

# Fine-tune model with DPO
dpo_trainer.train()
```
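A few quantities implied by the settings above can be worked out directly. The sketch below is plain arithmetic on the listed values; `num_devices = 1` is an assumption based on the single A100 mentioned in the Method section.

```python
# Values copied from the configuration above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 2000
warmup_steps = 100
lora_r = 16
lora_alpha = 16

num_devices = 1  # assumption: a single A100

# Preference pairs consumed per optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
# Upper bound on preference pairs seen over the whole run.
pairs_seen = effective_batch_size * max_steps
# Share of the run spent in learning-rate warmup.
warmup_fraction = warmup_steps / max_steps
# LoRA updates are scaled by alpha / r.
lora_scaling = lora_alpha / lora_r

print(effective_batch_size)  # 16
print(pairs_seen)            # 32000
print(warmup_fraction)       # 0.05
print(lora_scaling)          # 1.0
```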