Update README.md
README.md CHANGED
@@ -10,8 +10,6 @@ base_model:
 
 MicroThinker-1B-Preview, a new model fine-tuned from the [huihui-ai/Llama-3.2-3B-Instruct-abliterated](https://huggingface.co/huihui-ai/Llama-3.2-3B-Instruct-abliterated) model.
 
-The model is still being fine-tuned, and it will be ready very soon.
-
 ## Training Details
 
 This is just a test, but the performance is quite good. Now, I'll introduce the test environment.
@@ -47,7 +45,7 @@ huggingface-cli download --repo-type dataset huihui-ai/LONGCOT-Refine-500K --lo
 
 3. Used only the huihui-ai/QWQ-LONGCOT-500K dataset (#20000), Trained for 1 epoch:
 
 ```
-swift sft --model huihui-ai/Llama-3.2-1B-Instruct-abliterated --model_type llama3_2 --train_type lora --dataset "data/qwq_500k.jsonl#20000" --torch_dtype bfloat16 --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --learning_rate 1e-4 --lora_rank 8 --lora_alpha 32 --target_modules all-linear --gradient_accumulation_steps 16 --eval_steps 50 --save_steps 50 --save_total_limit 2 --logging_steps 5 --max_length 16384 --output_dir output/Llama-3.2-1B-Instruct-abliterated/lora/sft --system "You are a helpful assistant. You should think step-by-step." --warmup_ratio 0.05 --dataloader_num_workers 4 --model_author "huihui-ai" --model_name "huihui-ai-robot"
+swift sft --model huihui-ai/Llama-3.2-1B-Instruct-abliterated --model_type llama3_2 --train_type lora --dataset "data/QWQ-LONGCOT-500K/qwq_500k.jsonl#20000" --torch_dtype bfloat16 --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --learning_rate 1e-4 --lora_rank 8 --lora_alpha 32 --target_modules all-linear --gradient_accumulation_steps 16 --eval_steps 50 --save_steps 50 --save_total_limit 2 --logging_steps 5 --max_length 16384 --output_dir output/Llama-3.2-1B-Instruct-abliterated/lora/sft --system "You are a helpful assistant. You should think step-by-step." --warmup_ratio 0.05 --dataloader_num_workers 4 --model_author "huihui-ai" --model_name "huihui-ai-robot"
 ```
 
@@ -71,7 +69,7 @@ swift infer --model huihui/checkpoint-1237-merged --stream true --infer_backend
 
 6. Combined training with huihui-ai/QWQ-LONGCOT-500K (#20000) and huihui-ai/LONGCOT-Refine datasets (#20000), Trained for 1 epoch:
 
 ```
-swift sft --model huihui-ai/checkpoint-1237-merged --model_type llama3_2 --train_type lora --dataset "data/qwq_500k.jsonl#20000" "data/refine_from_qwen2_5.jsonl#20000" --torch_dtype bfloat16 --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --learning_rate 1e-4 --lora_rank 8 --lora_alpha 32 --target_modules all-linear --gradient_accumulation_steps 16 --eval_steps 50 --save_steps 50 --save_total_limit 2 --logging_steps 5 --max_length 16384 --output_dir output/Llama-3.2-1B-Instruct-abliterated/lora/sft2 --system "You are a helpful assistant. You should think step-by-step." --warmup_ratio 0.05 --dataloader_num_workers 4 --model_author "huihui-ai" --model_name "huihui-ai-robot"
+swift sft --model huihui-ai/checkpoint-1237-merged --model_type llama3_2 --train_type lora --dataset "data/QWQ-LONGCOT-500K/qwq_500k.jsonl#20000" "data/LONGCOT-Refine-500K/refine_from_qwen2_5.jsonl#20000" --torch_dtype bfloat16 --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --learning_rate 1e-4 --lora_rank 8 --lora_alpha 32 --target_modules all-linear --gradient_accumulation_steps 16 --eval_steps 50 --save_steps 50 --save_total_limit 2 --logging_steps 5 --max_length 16384 --output_dir output/Llama-3.2-1B-Instruct-abliterated/lora/sft2 --system "You are a helpful assistant. You should think step-by-step." --warmup_ratio 0.05 --dataloader_num_workers 4 --model_author "huihui-ai" --model_name "huihui-ai-robot"
 ```
 
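For context on the path change in the two `swift sft` hunks: the updated `--dataset` arguments point at local copies of the two datasets under `data/`. Below is a minimal sketch of how those directories could be populated with `huggingface-cli download --local-dir`; only the `huihui-ai/LONGCOT-Refine-500K` download is visible (truncated) in the hunk context above, and the exact target directories are an assumption inferred from the new paths.

```
# Assumed layout: one sub-directory per dataset under data/, so that
# "data/QWQ-LONGCOT-500K/qwq_500k.jsonl" and
# "data/LONGCOT-Refine-500K/refine_from_qwen2_5.jsonl" resolve locally.
huggingface-cli download --repo-type dataset huihui-ai/QWQ-LONGCOT-500K --local-dir data/QWQ-LONGCOT-500K
huggingface-cli download --repo-type dataset huihui-ai/LONGCOT-Refine-500K --local-dir data/LONGCOT-Refine-500K
```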
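The last hunk header also shows a truncated `swift infer` call against the merged checkpoint used as the starting point for step 6. A hypothetical completion is sketched below purely for illustration: the checkpoint path and `--stream true` are taken verbatim from the header, while the `--infer_backend` value is an assumption (ms-swift accepts `pt` among other backends).

```
# Hypothetical completion of the truncated command in the hunk header;
# "--infer_backend pt" (plain PyTorch backend) is an assumed value.
swift infer --model huihui/checkpoint-1237-merged --stream true --infer_backend pt
```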