Commit 8209f30 by nbeerbower · verified · 1 Parent(s): ea9342c

Add model card with training configuration

Files changed (1)
  1. README.md +48 -0
README.md ADDED
@@ -0,0 +1,48 @@
+ ---
+ library_name: transformers
+ pipeline_tag: text-generation
+ tags:
+ - merlina
+ - grimoire
+ - text-generation
+ - orpo
+ datasets:
+ - hemlang/Hemlock2-DPO
+ base_model:
+ - Qwen/Qwen2.5-Coder-7B-Instruct
+ ---
+
+ # Hemlock2-Coder-7B
+
+ ## Training Configuration
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Training Mode | ORPO |
+ | Base Model | `Qwen/Qwen2.5-Coder-7B-Instruct` |
+ | Learning Rate | 9e-05 |
+ | Epochs | 2 |
+ | Batch Size | 2 |
+ | Gradient Accumulation | 8 |
+ | Effective Batch Size | 16 |
+ | Max Sequence Length | 2048 |
+ | Optimizer | paged_adamw_8bit |
+ | LR Scheduler | cosine |
+ | Warmup Ratio | 0.05 |
+ | Weight Decay | 0.01 |
+ | Max Grad Norm | 0.3 |
+ | Seed | 42 |
+ | Beta | 0.1 |
+ | Max Prompt Length | 1024 |
+ | LoRA Rank (r) | 128 |
+ | LoRA Alpha | 64 |
+ | LoRA Dropout | 0.05 |
+ | Target Modules | up_proj, down_proj, gate_proj, k_proj, q_proj, v_proj, o_proj |
+ | Quantization | 4-bit (NF4) |
+ | GPU | NVIDIA RTX A6000 |
+
+ ---
+
+ ![Trained with Merlina](https://raw.githubusercontent.com/Schneewolf-Labs/Merlina/refs/heads/main/frontend/madewithmerlina_smol.png)
+
+ [Merlina on GitHub](https://github.com/Schneewolf-Labs/Merlina)
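
Several rows of the Training Configuration table are related by simple arithmetic, and a short sketch may make those relationships easier to check. The Python below is not part of the commit and is not Merlina's code. It copies the table values into a hypothetical `TrainingConfig` dataclass and derives the effective batch size, the LoRA scaling factor, and a warmup-plus-cosine learning rate curve. The assumptions are: a single GPU (the table lists one device but does not state the count), the standard LoRA scaling of alpha / r, a max sequence length that covers prompt and completion together, and a linear warmup followed by cosine decay to zero. The helper names and the `total_steps` value are illustrative only, since the card does not give the dataset size or step count.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container; values are copied from the table in the model card."""
    learning_rate: float = 9e-5
    batch_size: int = 2
    gradient_accumulation: int = 8
    max_sequence_length: int = 2048
    max_prompt_length: int = 1024
    warmup_ratio: float = 0.05
    lora_rank: int = 128
    lora_alpha: int = 64


def effective_batch_size(cfg: TrainingConfig, num_devices: int = 1) -> int:
    # Assumption: one GPU; with more devices the product would scale accordingly.
    return cfg.batch_size * cfg.gradient_accumulation * num_devices


def lora_scaling(cfg: TrainingConfig) -> float:
    # Assumption: standard LoRA, where the adapter update is scaled by alpha / r.
    return cfg.lora_alpha / cfg.lora_rank


def completion_budget(cfg: TrainingConfig) -> int:
    # Assumption: max sequence length bounds prompt and completion combined,
    # so this is what remains when a prompt uses its full allowance.
    return cfg.max_sequence_length - cfg.max_prompt_length


def cosine_lr(cfg: TrainingConfig, step: int, total_steps: int) -> float:
    # Assumption: linear warmup from 0 to the peak rate, then cosine decay to 0.
    warmup_steps = math.ceil(cfg.warmup_ratio * total_steps)
    if step < warmup_steps:
        return cfg.learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return cfg.learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


cfg = TrainingConfig()
print(effective_batch_size(cfg))  # 16, matching the "Effective Batch Size" row
print(lora_scaling(cfg))          # 0.5
print(completion_budget(cfg))     # 1024
```

Under these assumptions, 2 × 8 reproduces the effective batch size of 16 listed in the table, and a rank of 128 with an alpha of 64 gives an adapter scaling of 0.5. Calling `cosine_lr(cfg, step, total_steps)` with an illustrative `total_steps` shows the rate climbing to 9e-05 over the first 5% of steps and then decaying.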