---
license: mit
datasets:
- GAIR/LIMO
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- R1
- DeepSeek
- Distill
- Qwen
- 7B
- LIMO
---
# LIMO-R1-Distill-Qwen-7B
Uses [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) as the base model.
Fine-tuned on [GAIR/LIMO](https://huggingface.co/GAIR/LIMO).
Trained with LLaMA-Factory using the following configuration:
```
max_seq_length = 6 * 1024
lora_rank = 128
lora_alpha = lora_rank
lora_target = "all"

args = dict(
    stage="sft",
    do_train=True,
    model_name_or_path="unsloth/DeepSeek-R1-Distill-Qwen-7B-bnb-4bit",
    dataset="limo_restructured",
    template="custom_template",
    finetuning_type="lora",
    lora_target=lora_target,
    output_dir="qwen_distill_7b_lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    logging_steps=1,
    warmup_ratio=0.05,
    learning_rate=1e-4,
    num_train_epochs=1.0,
    max_grad_norm=0.25,
    loraplus_lr_ratio=16.0,
    fp16=True,
    report_to="none",
    preprocessing_num_workers=16,
    cutoff_len=max_seq_length,
    optim="paged_adamw_8bit",
)
```
System prompt used:
```
'Please reason step by step inside the <think> and </think> tags, and put your final answer within \\boxed{}.'
```
Custom template used in training:
```
register_template(
    name="custom_template",
    format_user=StringFormatter(
        slots=["<|User|>{{content}}<|Assistant|>"]
    ),
    format_assistant=StringFormatter(
        slots=["{{content}}<|end▁of▁sentence|>"]
    ),
    format_system=StringFormatter(
        slots=["<|begin▁of▁sentence|>{{content}}"]
    ),
    format_function=FunctionFormatter(
        slots=[
            "<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>{{type}}<|tool▁sep|>{{name}}\n```json\n{{arguments}}\n```<|tool▁call▁end|><|tool▁calls▁end|><|end▁of▁sentence|>"
        ],
        tool_format="qwen"
    ),
    format_observation=StringFormatter(
        slots=[
            "<|tool▁outputs▁begin|><|tool▁output▁begin|>{{content}}<|tool▁output▁end|><|tool▁outputs▁end|>"
        ]
    ),
    format_tools=ToolFormatter(tool_format="qwen"),
    default_system="Please reason step by step inside the tags <think> and </think>, and put your final answer within \\boxed{}.",
    stop_words=["<|end▁of▁sentence|>"]
)
```
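As a rough illustration of what this template produces (this is a hand-written sketch, not LLaMA-Factory's actual rendering code), a single system + user + assistant turn is flattened into one training string by concatenating the formatter slots above:

```python
# Sketch only: mimics the custom template's slot concatenation for one turn.
# Token strings are copied from the formatters above.
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"

def render(system: str, user: str, assistant: str) -> str:
    out = BOS + system                            # format_system
    out += "<|User|>" + user + "<|Assistant|>"    # format_user
    out += assistant + EOS                        # format_assistant
    return out

prompt = render(
    "Please reason step by step inside the tags <think> and </think>, "
    "and put your final answer within \\boxed{}.",
    "What is 2 + 2?",
    "<think>Simple addition.</think>\n\\boxed{4}",
)
```

Note that the assistant content carries the `<think>…</think>` span itself, which is why every dataset entry must begin with `<think>`.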
Every entry in the dataset starts with `<think>` and ends its reasoning with `</think>`.
For variation, I randomly replaced the opening "Okay," at the start of each entry's reasoning with one of the following:
```
starts = [
"Alright,",
"Well,",
"So,",
"Hmm,",
"Okay then,",
"Right,",
"Let's see,",
"Now,",
"Alrighty,",
"Thinking about it,",
"You know,",
"Well then,",
"Come to think of it,",
"Actually,",
"Now that I think about it,",
"Good question,",
"Let me think,",
"Let's see now,",
"Interesting,",
"Now then,"
]
```
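A minimal sketch of that substitution (the function name and the trimmed list are illustrative, not the exact preprocessing script):

```python
import random

# Trimmed for brevity; the full list of openers is shown above.
starts = ["Alright,", "Well,", "So,", "Hmm,", "Let's see,"]

def vary_start(reasoning: str, rng: random.Random) -> str:
    """Swap a leading "Okay," for a randomly chosen alternative opener."""
    if reasoning.startswith("Okay,"):
        return rng.choice(starts) + reasoning[len("Okay,"):]
    return reasoning  # entries not starting with "Okay," are left unchanged
```

Applying this per entry keeps the reasoning intact while avoiding every sample teaching the model the same first token.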
|