---
license: mit
datasets:
- GAIR/LIMO
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- R1
- DeepSeek
- Distill
- Qwen
- 7B
- LIMO
---
# LIMO-R1-Distill-Qwen-7B
Uses [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) as the base model.

Fine-tuned on [GAIR/LIMO](https://huggingface.co/GAIR/LIMO).

Trained with LLaMA-Factory using the following config:
```
max_seq_length = 6*1024

lora_rank = 128
lora_alpha = lora_rank
lora_target = "all"

args = dict(
  stage="sft",
  do_train=True,
  model_name_or_path="unsloth/DeepSeek-R1-Distill-Qwen-7B-bnb-4bit",
  dataset="limo_restructured",
  template="custom_template",
  finetuning_type="lora",
  lora_target=lora_target,
  output_dir="qwen_distill_7b_lora",
  per_device_train_batch_size=1,
  gradient_accumulation_steps=4,
  lr_scheduler_type="cosine",
  logging_steps=1,
  warmup_ratio=0.05,
  learning_rate=1e-4,
  num_train_epochs=1.0,
  max_grad_norm=0.25,
  loraplus_lr_ratio=16.0,
  fp16=True,
  report_to="none",
  preprocessing_num_workers=16,
  cutoff_len=max_seq_length,
  optim="paged_adamw_8bit"
)

```
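The `args` dict above can be handed to LLaMA-Factory in the usual way, for example by dumping it to a config file and passing that file to the CLI. A minimal sketch, assuming the dict is defined exactly as above:
```
import json

# Write the training arguments to a file that LLaMA-Factory can read.
with open("qwen_distill_7b_lora.json", "w") as f:
    json.dump(args, f, indent=2)

# Then, from a shell:
#   llamafactory-cli train qwen_distill_7b_lora.json
```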

System prompt used:
```
'Please reason step by step inside the <think> and </think> tags, and put your final answer within \\boxed{}.'
```
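At inference time the same system prompt should be passed as the system message. A minimal sketch with Transformers (the repo id below is a placeholder for wherever this model, or the base model plus the LoRA adapter, is hosted):
```
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with the actual Hub id or a local path to this model.
model_id = "path/to/LIMO-R1-Distill-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

system = ("Please reason step by step inside the <think> and </think> tags, "
          "and put your final answer within \\boxed{}.")
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "What is 17 * 24?"},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```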

Custom template used in training:
```
register_template(
    name="custom_template",
    format_user=StringFormatter(
        slots=["<|User|>{{content}}<|Assistant|>"]
    ),
    format_assistant=StringFormatter(
        slots=["{{content}}<|end▁of▁sentence|>"]
    ),
    format_system=StringFormatter(
        slots=["<|begin▁of▁sentence|>{{content}}"]
    ),
    format_function=FunctionFormatter(
        slots=[
            "<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>{{type}}<|tool▁sep|>{{name}}\n```json\n{{arguments}}\n```<|tool▁call▁end|><|tool▁calls▁end|><|end▁of▁sentence|>"
        ],
        tool_format="qwen"
    ),
    format_observation=StringFormatter(
        slots=[
            "<|tool▁outputs▁begin|><|tool▁output_begin|>{{content}}<|tool▁output▁end|><|tool▁outputs▁end|>"
        ]
    ),
    format_tools=ToolFormatter(tool_format="qwen"),
    default_system="Please reason step by step inside the tags <think> and </think>, and put your final answer within \\boxed{}.",
    stop_words=["<|end▁of▁sentence|>"]
)
```
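To make the template concrete, here is a rough sketch of how a single training example gets serialized (the question and response contents are illustrative, not actual LIMO data):
```
# Sketch of one serialized training example under the template above.
system = ("Please reason step by step inside the tags <think> and </think>, "
          "and put your final answer within \\boxed{}.")
question = "What is 2 + 2?"
response = "<think>\nAlright, adding 2 and 2 gives 4.\n</think>\n\n\\boxed{4}"

rendered = (
    "<|begin▁of▁sentence|>" + system             # format_system
    + "<|User|>" + question + "<|Assistant|>"     # format_user
    + response + "<|end▁of▁sentence|>"            # format_assistant
)
print(rendered)
```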

Every entry in the dataset starts with `<think>` and ends its reasoning with `</think>`.

For variation, I randomly replaced the leading "Okay," at the start of each reasoning trace with one of the following (a sketch of the replacement step follows the list):
```
starts = [
    "Alright,",
    "Well,",
    "So,",
    "Hmm,",
    "Okay then,",
    "Right,",
    "Let's see,",
    "Now,",
    "Alrighty,",
    "Thinking about it,",
    "You know,",
    "Well then,",
    "Come to think of it,",
    "Actually,",
    "Now that I think about it,",
    "Good question,",
    "Let me think,",
    "Let's see now,",
    "Interesting,",
    "Now then,"
]
```
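A minimal sketch of that replacement step, assuming `starts` is the list above and that each restructured trace begins with `<think>\n` followed by "Okay," (the exact layout is an assumption):
```
import random

def vary_start(text: str) -> str:
    # Swap the stock "Okay," opener for a random alternative, keeping the
    # "<think>\n" prefix intact; leave traces that start differently untouched.
    prefix = "<think>\n"
    if text.startswith(prefix + "Okay,"):
        return prefix + random.choice(starts) + text[len(prefix + "Okay,"):]
    return text

example = "<think>\nOkay, let's work through the problem step by step.\n</think>\n\n\\boxed{42}"
print(vary_start(example))
```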