huihui-ai committed · verified
Commit 3fb8109 · 1 Parent(s): ac772ef

Update README.md

Files changed (1):
  1. README.md +2 -4
README.md CHANGED
@@ -10,8 +10,6 @@ base_model:
 
 MicroThinker-1B-Preview, a new model fine-tuned from the [huihui-ai/Llama-3.2-3B-Instruct-abliterated](https://huggingface.co/huihui-ai/Llama-3.2-3B-Instruct-abliterated) model.
 
-The model is still being fine-tuned, and it will be ready very soon.
-
 ## Training Details
 
 This is just a test, but the performance is quite good. Now, I'll introduce the test environment.
@@ -47,7 +45,7 @@ huggingface-cli download --repo-type dataset huihui-ai/LONGCOT-Refine-500K --lo
 3. Used only the huihui-ai/QWQ-LONGCOT-500K dataset (#20000), Trained for 1 epoch:
 
 ```
-swift sft --model huihui-ai/Llama-3.2-1B-Instruct-abliterated --model_type llama3_2 --train_type lora --dataset "data/qwq_500k.jsonl#20000" --torch_dtype bfloat16 --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --learning_rate 1e-4 --lora_rank 8 --lora_alpha 32 --target_modules all-linear --gradient_accumulation_steps 16 --eval_steps 50 --save_steps 50 --save_total_limit 2 --logging_steps 5 --max_length 16384 --output_dir output/Llama-3.2-1B-Instruct-abliterated/lora/sft --system "You are a helpful assistant. You should think step-by-step." --warmup_ratio 0.05 --dataloader_num_workers 4 --model_author "huihui-ai" --model_name "huihui-ai-robot"
+swift sft --model huihui-ai/Llama-3.2-1B-Instruct-abliterated --model_type llama3_2 --train_type lora --dataset "data/QWQ-LONGCOT-500K/qwq_500k.jsonl#20000" --torch_dtype bfloat16 --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --learning_rate 1e-4 --lora_rank 8 --lora_alpha 32 --target_modules all-linear --gradient_accumulation_steps 16 --eval_steps 50 --save_steps 50 --save_total_limit 2 --logging_steps 5 --max_length 16384 --output_dir output/Llama-3.2-1B-Instruct-abliterated/lora/sft --system "You are a helpful assistant. You should think step-by-step." --warmup_ratio 0.05 --dataloader_num_workers 4 --model_author "huihui-ai" --model_name "huihui-ai-robot"
 ```
 
@@ -71,7 +69,7 @@ swift infer --model huihui/checkpoint-1237-merged --stream true --infer_backend
 6. Combined training with huihui-ai/QWQ-LONGCOT-500K (#20000) and huihui-ai/LONGCOT-Refine datasets (#20000), Trained for 1 epoch:
 
 ```
-swift sft --model huihui-ai/checkpoint-1237-merged --model_type llama3_2 --train_type lora --dataset "data/qwq_500k.jsonl#20000" "data/refine_from_qwen2_5.jsonl#20000" --torch_dtype bfloat16 --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --learning_rate 1e-4 --lora_rank 8 --lora_alpha 32 --target_modules all-linear --gradient_accumulation_steps 16 --eval_steps 50 --save_steps 50 --save_total_limit 2 --logging_steps 5 --max_length 16384 --output_dir output/Llama-3.2-1B-Instruct-abliterated/lora/sft2 --system "You are a helpful assistant. You should think step-by-step." --warmup_ratio 0.05 --dataloader_num_workers 4 --model_author "huihui-ai" --model_name "huihui-ai-robot"
+swift sft --model huihui-ai/checkpoint-1237-merged --model_type llama3_2 --train_type lora --dataset "data/QWQ-LONGCOT-500K/qwq_500k.jsonl#20000" "data/LONGCOT-Refine-500K/refine_from_qwen2_5.jsonl#20000" --torch_dtype bfloat16 --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --learning_rate 1e-4 --lora_rank 8 --lora_alpha 32 --target_modules all-linear --gradient_accumulation_steps 16 --eval_steps 50 --save_steps 50 --save_total_limit 2 --logging_steps 5 --max_length 16384 --output_dir output/Llama-3.2-1B-Instruct-abliterated/lora/sft2 --system "You are a helpful assistant. You should think step-by-step." --warmup_ratio 0.05 --dataloader_num_workers 4 --model_author "huihui-ai" --model_name "huihui-ai-robot"
 ```
 
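The edit only re-points both `--dataset` arguments at per-dataset subdirectories under `data/`, matching the README's `huggingface-cli download` setup (visible, truncated, in the hunk context above). A minimal sketch of the download step that would produce those paths, assuming `--local-dir` is set to a folder named after each dataset (these exact directory values are an assumption, not part of the commit):

```
# Assumed layout (hypothetical --local-dir values): each dataset lands in its own
# folder, giving the data/<dataset>/<file>.jsonl paths passed to swift sft above.
huggingface-cli download --repo-type dataset huihui-ai/QWQ-LONGCOT-500K --local-dir data/QWQ-LONGCOT-500K
huggingface-cli download --repo-type dataset huihui-ai/LONGCOT-Refine-500K --local-dir data/LONGCOT-Refine-500K
```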