---
library_name: transformers
license: llama3
base_model:
- nbeerbower/llama-3-Stheno-Mahou-8B
datasets:
- flammenai/FlameMix-DPO-v1
- flammenai/Grill-preprod-v1_chatML
- flammenai/Grill-preprod-v2_chatML
---
**ExLlamaV2** quant (**exl2** / **2.5 bpw**) made with ExLlamaV2 v0.0.21.

Other EXL2 quants:
| **Quant (bpw)** | **Model Size** | **lm_head (bits)** |
| ----- | ---------- | ------- |
|<center>**[2.2](https://huggingface.co/Zoyd/flammenai_Mahou-1.2a-llama3-8B-2_2bpw_exl2)**</center> | <center>3250 MB</center> | <center>6</center> |
|<center>**[2.5](https://huggingface.co/Zoyd/flammenai_Mahou-1.2a-llama3-8B-2_5bpw_exl2)**</center> | <center>3479 MB</center> | <center>6</center> |
|<center>**[3.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2a-llama3-8B-3_0bpw_exl2)**</center> | <center>3893 MB</center> | <center>6</center> |
|<center>**[3.5](https://huggingface.co/Zoyd/flammenai_Mahou-1.2a-llama3-8B-3_5bpw_exl2)**</center> | <center>4311 MB</center> | <center>6</center> |
|<center>**[3.75](https://huggingface.co/Zoyd/flammenai_Mahou-1.2a-llama3-8B-3_75bpw_exl2)**</center> | <center>4518 MB</center> | <center>6</center> |
|<center>**[4.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2a-llama3-8B-4_0bpw_exl2)**</center> | <center>4727 MB</center> | <center>6</center> |
|<center>**[4.25](https://huggingface.co/Zoyd/flammenai_Mahou-1.2a-llama3-8B-4_25bpw_exl2)**</center> | <center>4935 MB</center> | <center>6</center> |
|<center>**[5.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2a-llama3-8B-5_0bpw_exl2)**</center> | <center>5557 MB</center> | <center>6</center> |
|<center>**[6.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2a-llama3-8B-6_0bpw_exl2)**</center> | <center>6496 MB</center> | <center>8</center> |
|<center>**[6.5](https://huggingface.co/Zoyd/flammenai_Mahou-1.2a-llama3-8B-6_5bpw_exl2)**</center> | <center>6902 MB</center> | <center>8</center> |
|<center>**[8.0](https://huggingface.co/Zoyd/flammenai_Mahou-1.2a-llama3-8B-8_0bpw_exl2)**</center> | <center>8131 MB</center> | <center>8</center> |
![image/png](https://huggingface.co/flammenai/Mahou-1.0-mistral-7B/resolve/main/mahou1.png)
# Mahou-1.2a-llama3-8B
Mahou is our attempt to build a production-ready conversational/roleplay LLM.
Future versions will be released iteratively and fine-tuned on flammen.ai conversational data.
### Chat Format
This model has been trained to use the ChatML format, with the character and user names in place of the usual role labels.
```
<|im_start|>system
{{system}}<|im_end|>
<|im_start|>{{char}}
{{message}}<|im_end|>
<|im_start|>{{user}}
{{message}}<|im_end|>
```
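
As an illustration of the template above, here is a minimal sketch that assembles a ChatML prompt in plain Python. The function name `build_chatml_prompt`, the `next_speaker` parameter, and the sample names and messages are hypothetical. Joining turns with a single newline is an assumption read off the layout of the template.

```python
def build_chatml_prompt(system, turns, next_speaker=None):
    """Render a system prompt and (speaker, message) turns as ChatML text."""
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for speaker, message in turns:
        parts.append(f"<|im_start|>{speaker}\n{message}<|im_end|>")
    prompt = "\n".join(parts)
    if next_speaker is not None:
        # Open the next turn so the model continues as that speaker.
        prompt += f"\n<|im_start|>{next_speaker}\n"
    return prompt


# Hypothetical character and user names, for illustration only.
prompt = build_chatml_prompt(
    "You are Mahou, a cheerful mage.",
    [("Mahou", "*waves* hey there"), ("User", "How was class today?")],
    next_speaker="Mahou",
)
print(prompt)
```

Leaving the final turn open (no `<|im_end|>`) is a common way to ask the model to reply as a given speaker.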
### Roleplay Format
- Speech without quotes.
- Actions in `*asterisks*`.
```
*leans against wall cooly* so like, i just casted a super strong spell at magician academy today, not gonna lie, felt badass.
```
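
If a front end needs to style actions differently from speech, the convention above can be parsed with a regular expression. This is a rough sketch, not part of the model or of any client. The helper name `split_roleplay` is hypothetical, and it assumes actions never contain a literal asterisk.

```python
import re


def split_roleplay(message):
    """Split a message into ("action", text) and ("speech", text) segments."""
    segments = []
    # The capturing group makes re.split keep the *action* pieces in the result.
    for piece in re.split(r"(\*[^*]+\*)", message):
        text = piece.strip()
        if not text:
            continue
        if len(text) > 2 and text.startswith("*") and text.endswith("*"):
            segments.append(("action", text[1:-1]))
        else:
            segments.append(("speech", text))
    return segments


segments = split_roleplay("*leans against wall cooly* so like, felt badass.")
print(segments)
```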
### ST Settings
1. Use ChatML for the Context Template.
2. Turn on Instruct Mode for ChatML.
3. Use the following stopping strings: `["<", "|", "<|", "\n"]`
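
To show what these stopping strings do to a reply, the sketch below cuts generated text at the earliest match. It illustrates the general idea only and is not SillyTavern's implementation. The function name `truncate_at_stop` and the sample text are hypothetical.

```python
STOP_STRINGS = ["<", "|", "<|", "\n"]


def truncate_at_stop(text, stop_strings=STOP_STRINGS):
    """Cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


reply = truncate_at_stop("*grins* sure thing!<|im_end|>\n<|im_start|>User")
print(reply)
```

Because `"\n"` is a stop string, replies end at the first line break, which keeps the model to a single-line turn.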
### Method
Fine-tuned with DPO on an A100 in Google Colab, following [Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) by [Maxime Labonne](https://huggingface.co/mlabonne).
### Configuration
LoRA, model, and training settings:
```python
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# model_name, new_model, dataset, and tokenizer are assumed to be defined
# earlier in the notebook; they are not shown in this card.

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
# Disable the KV cache during training (it conflicts with gradient checkpointing)
model.config.use_cache = False

# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=2000,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    force_use_ref_model=True
)

# Fine-tune model with DPO
dpo_trainer.train()
```
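
For a sense of scale, the arithmetic implied by these settings can be worked out directly. This is a back-of-the-envelope sketch: `num_devices = 1` is an assumption based on the single A100 mentioned under Method, and the LoRA scale uses the standard `lora_alpha / r` definition (without rank-stabilized scaling).

```python
# Values copied from the configuration above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 2000
lora_r = 16
lora_alpha = 16

num_devices = 1  # assumption: a single A100

# Preference pairs contributing to each optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Pairs processed over the whole run (repeats count if the dataset is smaller).
pairs_seen = effective_batch_size * max_steps

# Factor applied to the low-rank update before it is added to the frozen weights.
lora_scale = lora_alpha / lora_r

print(effective_batch_size, pairs_seen, lora_scale)
```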