Kendamarron/llm-jp-3-3.7b-o1-v0.1

Text Generation · Generated from Trainer · text-generation-inference

Kendamarron committed on Dec 7, 2024

Commit dc39238 (verified) · 1 parent: 5803481

Update README.md

Files changed (1)

README.md +49 -1

@@ -1,6 +1,6 @@
 ---
 library_name: transformers
-license: other
+license: apache-2.0
 base_model: llm-jp/llm-jp-3-3.7b-instruct
 tags:
 - llama-factory
@@ -9,6 +9,8 @@ tags:
 model-index:
 - name: sft
   results: []
+language:
+- ja
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -61,3 +63,49 @@ The following hyperparameters were used during training:
 - Pytorch 2.4.1+cu121
 - Datasets 3.1.0
 - Tokenizers 0.20.3
+### LLaMA-Factory yaml
+```
+### model
+model_name_or_path: llm-jp/llm-jp-3-3.7b-instruct
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+deepspeed: examples/deepspeed/ds_z3_config.json
+### dataset
+dataset: cot_normal, cot_math
+template: alpaca_ja
+cutoff_len: 8192
+overwrite_cache: true
+preprocessing_num_workers: 16
+### output
+output_dir: saves/llm_jp/full/sft
+logging_steps: 10
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+### train
+per_device_train_batch_size: 8
+gradient_accumulation_steps: 4
+learning_rate: 1.0e-5
+num_train_epochs: 2.0
+lr_scheduler_type: cosine
+optim: adamw_bnb_8bit
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+### eval
+val_size: 0.01
+per_device_eval_batch_size: 1
+eval_strategy: steps
+eval_steps: 500
+### logging
+report_to: wandb
+```
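The `### train` and `### eval` settings in the LLaMA-Factory YAML together determine how many optimizer steps a run takes. The following is a rough sketch, not LLaMA-Factory's own code: the dataset size and the number of GPUs are hypothetical inputs (this commit states neither), `plan_steps` is an illustrative helper, and the rounding only approximates what the trainer actually does when it shards and accumulates batches.

```python
import math

# Values copied from the "### train", "### eval" and "### output" sections of the YAML.
PER_DEVICE_TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 4
NUM_TRAIN_EPOCHS = 2.0
WARMUP_RATIO = 0.1
VAL_SIZE = 0.01
SAVE_STEPS = 500


def plan_steps(num_examples: int, num_devices: int) -> dict:
    """Approximate step counts for one run of the config above.

    Hypothetical helper: num_examples and num_devices are assumptions,
    since the commit does not give the size of cot_normal + cot_math
    or the number of GPUs used.
    """
    # Hold out val_size of the examples for evaluation (rounded up).
    num_eval = math.ceil(num_examples * VAL_SIZE)
    num_train = num_examples - num_eval

    # Examples consumed by one optimizer step across all devices.
    effective_batch = (
        PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices
    )

    # Approximation: count only full accumulated batches per epoch (at least one).
    steps_per_epoch = max(num_train // effective_batch, 1)
    total_steps = math.ceil(steps_per_epoch * NUM_TRAIN_EPOCHS)
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    # Steps at which a periodic checkpoint would be written every save_steps.
    checkpoints = list(range(SAVE_STEPS, total_steps + 1, SAVE_STEPS))

    return {
        "num_eval": num_eval,
        "num_train": num_train,
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
        "checkpoints": checkpoints,
    }


if __name__ == "__main__":
    # Hypothetical example: 100,000 examples on 8 GPUs.
    for key, value in plan_steps(100_000, num_devices=8).items():
        print(f"{key}: {value}")
```

With those hypothetical inputs the sketch reports an effective batch of 256 and 772 optimizer steps, so only one periodic checkpoint (step 500) would be written. Because `eval_steps` equals `save_steps`, evaluation would run at the same steps.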
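`lr_scheduler_type: cosine`, `warmup_ratio: 0.1` and `learning_rate: 1.0e-5` describe a linear warmup followed by a cosine decay. A minimal sketch, assuming the `cosine` type behaves like the Hugging Face `transformers` cosine-with-warmup schedule (one half-cosine that decays to 0); `cosine_with_warmup` and the step counts in the usage example are hypothetical.

```python
import math

PEAK_LR = 1.0e-5  # learning_rate from the YAML


def cosine_with_warmup(
    step: int, total_steps: int, warmup_steps: int, peak_lr: float = PEAK_LR
) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # cos goes from 1 to -1 as progress goes from 0 to 1, so the factor goes 1 -> 0.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    total, warmup = 772, 78  # hypothetical step counts
    for s in (0, 39, 78, 425, 772):
        print(s, f"{cosine_with_warmup(s, total, warmup):.3e}")
```

Under this assumption the rate peaks at `1.0e-5` exactly when warmup ends and falls to half of that midway through the decay phase.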