File size: 4,508 Bytes
37356eb 41485a1 37356eb |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 |
---
library_name: transformers
license: apache-2.0
base_model:
- microsoft/Phi-3-medium-128k-instruct
datasets:
- flammenai/FlameMix-DPO-v1
- flammenai/Grill-preprod-v1_chatML
- flammenai/Grill-preprod-v2_chatML
---
**Exllamav2** quant (**exl2** / **3.75 bpw**) made with ExLlamaV2 v0.0.21
Other EXL2 quants:
| **Quant** | **Model Size** | **lm_head** |
| ----- | ---------- | ------- |
|<center>**[2.2](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-2_2bpw_exl2)**</center> | <center>4032 MB</center> | <center>6</center> |
|<center>**[2.5](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-2_5bpw_exl2)**</center> | <center>4485 MB</center> | <center>6</center> |
|<center>**[3.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-3_0bpw_exl2)**</center> | <center>5312 MB</center> | <center>6</center> |
|<center>**[3.5](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-3_5bpw_exl2)**</center> | <center>6116 MB</center> | <center>6</center> |
|<center>**[3.75](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-3_75bpw_exl2)**</center> | <center>6526 MB</center> | <center>6</center> |
|<center>**[4.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-4_0bpw_exl2)**</center> | <center>6936 MB</center> | <center>6</center> |
|<center>**[4.25](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-4_25bpw_exl2)**</center> | <center>7312 MB</center> | <center>6</center> |
|<center>**[5.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-5_0bpw_exl2)**</center> | <center>8559 MB</center> | <center>6</center> |
|<center>**[6.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-6_0bpw_exl2)**</center> | <center>10220 MB</center> | <center>8</center> |
|<center>**[6.5](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-6_5bpw_exl2)**</center> | <center>11013 MB</center> | <center>8</center> |
|<center>**[8.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2-phi-14B-8_0bpw_exl2)**</center> | <center>12726 MB</center> | <center>8</center> |
![image/png](https://huggingface.co/flammenai/Mahou-1.0-mistral-7B/resolve/main/mahou1.png)
# Mahou-1.2-phi-14B
Please note: this is an untested, experimental release.
Mahou is our attempt to build a production-ready conversational/roleplay LLM.
Future versions will be released iteratively and finetuned from flammen.ai conversational data.
### Chat Format
This model has been trained to use ChatML format.
```
<|im_start|>system
{{system}}<|im_end|>
<|im_start|>{{char}}
{{message}}<|im_end|>
<|im_start|>{{user}}
{{message}}<|im_end|>
```
# Roleplay Format
- Speech without quotes.
- Actions in `*asterisks*`
```
*leans against wall cooly* so like, i just casted a super strong spell at magician academy today, not gonna lie, felt badass.
```
### ST Settings
1. Use ChatML for the Context Template.
2. Turn on Instruct Mode for ChatML.
3. Use the following stopping strings: `["<", "|", "<|", "\n"]`
### Method
Finetuned using an A100 on Google Colab.
[Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) - [Maxime Labonne](https://huggingface.co/mlabonne)
### Configuration
LoRA, model, and training settings:
```python
# LoRA configuration
peft_config = LoraConfig(
r=16,
lora_alpha=16,
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM",
target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)
# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
load_in_4bit=True
)
model.config.use_cache = False
# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
load_in_4bit=True
)
# Training arguments
training_args = TrainingArguments(
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
gradient_checkpointing=True,
learning_rate=5e-5,
lr_scheduler_type="cosine",
max_steps=2000,
save_strategy="no",
logging_steps=1,
output_dir=new_model,
optim="paged_adamw_32bit",
warmup_steps=100,
bf16=True,
report_to="wandb",
)
# Create DPO trainer
dpo_trainer = DPOTrainer(
model,
ref_model,
args=training_args,
train_dataset=dataset,
tokenizer=tokenizer,
peft_config=peft_config,
beta=0.1,
force_use_ref_model=True
)
# Fine-tune model with DPO
dpo_trainer.train()
``` |