
Built with Axolotl

See axolotl config

axolotl version: 0.4.0

```yaml
base_model: T3Q-LLM/T3Q-LLM-sft1.0-dpo1.0
base_model_config: T3Q-LLM/T3Q-LLM-sft1.0-dpo1.0
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_llama_derived_model: true
hub_model_id: T3Q-LLM-sft1.0-dpo1.0_4300QA

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  # - path: admin_data.csv
    - path: superiort/multiplechoice-4300
      type: alpaca
      # The below are defaults. only set what's needed if you use a different column name.
      # system_prompt: ""
      # system_format: "{system}"
      # field_system: system
      # field_instruction: instruction
      # field_input: input
      # field_output: output

      # format: |-
      #   Human: {instruction} {input}
      #   Assistant:

      # no_input_format: "{instruction} "

# dataset_prepared_path: yanolja_preprocessed_data
dataset_prepared_path: last_run_prepared
val_set_size: 0.2
output_dir: ./T3Q-LLM-sft1.0-dpo1.0_4300QA

adapter: qlora
lora_model_dir: 

# device_map: [0,1,3]

sequence_len: 4096
sample_packing: false

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out: 

wandb_project: axolotl_T3Q_4300
wandb_entity: 
wandb_watch: 
wandb_run_id: T3Q_mod_4300
wandb_log_model: 

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 10
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience: 
resume_from_checkpoint: 
local_rank: 
logging_steps: 1
xformers_attention: 
flash_attention: true

warmup_steps: 100
eval_steps: 0.01
save_strategy: epoch
save_steps: 
debug: 
deepspeed: 
weight_decay: 0.0
fsdp: 
fsdp_config: 
special_tokens:
    bos_token: "<s>"
    eos_token: "<|im_end|>"
    unk_token: "<unk>"
    pad_token: "</s>"  # EOS and PAD are the same
```
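As a minimal usage sketch of what this config produces (the adapter id below is an assumption; substitute the actual Hub repo or the local `output_dir`), the base model can be loaded in 4-bit, mirroring `load_in_4bit: true`, and the QLoRA adapter attached with PEFT:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "T3Q-LLM/T3Q-LLM-sft1.0-dpo1.0"
adapter_id = "T3Q-LLM-sft1.0-dpo1.0_4300QA"  # hypothetical: point at the pushed adapter repo or local output_dir

# 4-bit loading, mirroring load_in_4bit: true in the config above.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the QLoRA adapter

# Alpaca-style prompt, matching the dataset `type: alpaca` used for training.
prompt = "### Instruction:\nAnswer the following question.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```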

T3Q-LLM-sft1.0-dpo1.0_4300QA

This model is a QLoRA fine-tuned version of T3Q-LLM/T3Q-LLM-sft1.0-dpo1.0 on the superiort/multiplechoice-4300 dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2288
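If this loss is the mean per-token cross-entropy (the usual convention for causal-LM evaluation), it corresponds to a perplexity of roughly exp(1.2288) ≈ 3.42:

```python
import math

eval_loss = 1.2288
perplexity = math.exp(eval_loss)  # perplexity = exp(mean cross-entropy)
print(round(perplexity, 2))       # -> 3.42
```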

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned on superiort/multiplechoice-4300 in Alpaca format, with 20% of the examples held out as the evaluation set (val_set_size: 0.2 in the config above); the split can be reproduced as sketched below.
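A minimal sketch of the 80/20 split (the split name "train" and the use of seed 42 are assumptions, not taken from the config):

```python
from datasets import load_dataset

# Dataset used for fine-tuning (Alpaca-format multiple-choice QA).
dataset = load_dataset("superiort/multiplechoice-4300", split="train")

# Hold out 20% for evaluation, mirroring val_set_size: 0.2 above.
split = dataset.train_test_split(test_size=0.2, seed=42)
train_data, eval_data = split["train"], split["test"]
print(len(train_data), len(eval_data))
```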

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32 (see the calculation after this list)
  • total_eval_batch_size: 8
  • optimizer: paged AdamW (paged_adamw_32bit) with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 10
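The total train batch size above follows from the per-device batch size, the gradient accumulation steps, and the number of GPUs:

```python
micro_batch_size = 2             # per-device train batch size
gradient_accumulation_steps = 4
num_devices = 4

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)    # 32, matching the value reported above
```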

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.2424 | 0.0093 | 1 | 1.0432 |
| 1.0333 | 0.1023 | 11 | 0.9004 |
| 0.8715 | 0.2047 | 22 | 0.7157 |
| 0.7053 | 0.3070 | 33 | 0.6548 |
| 0.6688 | 0.4093 | 44 | 0.6449 |
| 0.6823 | 0.5116 | 55 | 0.6282 |
| 0.5876 | 0.6140 | 66 | 0.6251 |
| 0.6994 | 0.7163 | 77 | 0.6290 |
| 0.6662 | 0.8186 | 88 | 0.6311 |
| 0.6239 | 0.9209 | 99 | 0.6338 |
| 0.5959 | 1.0233 | 110 | 0.6319 |
| 0.6408 | 1.1256 | 121 | 0.6668 |
| 0.595 | 1.2279 | 132 | 0.6221 |
| 0.5476 | 1.3302 | 143 | 0.6295 |
| 0.587 | 1.4326 | 154 | 0.6569 |
| 0.5867 | 1.5349 | 165 | 0.6208 |
| 0.5895 | 1.6372 | 176 | 0.6264 |
| 0.6581 | 1.7395 | 187 | 0.6208 |
| 0.5872 | 1.8419 | 198 | 0.6290 |
| 0.6314 | 1.9442 | 209 | 0.6243 |
| 0.4397 | 2.0465 | 220 | 0.6591 |
| 0.4568 | 2.1488 | 231 | 0.7095 |
| 0.422 | 2.2512 | 242 | 0.6914 |
| 0.453 | 2.3535 | 253 | 0.7001 |
| 0.4678 | 2.4558 | 264 | 0.6896 |
| 0.4335 | 2.5581 | 275 | 0.6776 |
| 0.4796 | 2.6605 | 286 | 0.6829 |
| 0.4637 | 2.7628 | 297 | 0.6742 |
| 0.4532 | 2.8651 | 308 | 0.6828 |
| 0.4348 | 2.9674 | 319 | 0.6836 |
| 0.2787 | 3.0698 | 330 | 0.8085 |
| 0.2336 | 3.1721 | 341 | 0.8380 |
| 0.2341 | 3.2744 | 352 | 0.7998 |
| 0.2393 | 3.3767 | 363 | 0.8041 |
| 0.2826 | 3.4791 | 374 | 0.8040 |
| 0.2505 | 3.5814 | 385 | 0.8099 |
| 0.3057 | 3.6837 | 396 | 0.8103 |
| 0.2789 | 3.7860 | 407 | 0.7964 |
| 0.269 | 3.8884 | 418 | 0.7891 |
| 0.2493 | 3.9907 | 429 | 0.7958 |
| 0.1193 | 4.0930 | 440 | 0.9242 |
| 0.1143 | 4.1953 | 451 | 0.9331 |
| 0.1147 | 4.2977 | 462 | 0.9112 |
| 0.1351 | 4.4 | 473 | 0.9290 |
| 0.0982 | 4.5023 | 484 | 0.9358 |
| 0.1011 | 4.6047 | 495 | 0.9279 |
| 0.09 | 4.7070 | 506 | 0.9289 |
| 0.1063 | 4.8093 | 517 | 0.9392 |
| 0.1038 | 4.9116 | 528 | 0.9267 |
| 0.0361 | 5.0140 | 539 | 0.9412 |
| 0.0371 | 5.1163 | 550 | 1.0589 |
| 0.033 | 5.2186 | 561 | 1.0253 |
| 0.0426 | 5.3209 | 572 | 1.0482 |
| 0.0357 | 5.4233 | 583 | 1.0388 |
| 0.0355 | 5.5256 | 594 | 1.0566 |
| 0.0373 | 5.6279 | 605 | 1.0470 |
| 0.0395 | 5.7302 | 616 | 1.0581 |
| 0.0366 | 5.8326 | 627 | 1.0696 |
| 0.0387 | 5.9349 | 638 | 1.0641 |
| 0.0127 | 6.0372 | 649 | 1.0692 |
| 0.0114 | 6.1395 | 660 | 1.1612 |
| 0.0105 | 6.2419 | 671 | 1.1575 |
| 0.0121 | 6.3442 | 682 | 1.1479 |
| 0.0082 | 6.4465 | 693 | 1.1591 |
| 0.011 | 6.5488 | 704 | 1.1669 |
| 0.0112 | 6.6512 | 715 | 1.1645 |
| 0.0109 | 6.7535 | 726 | 1.1628 |
| 0.0102 | 6.8558 | 737 | 1.1705 |
| 0.0098 | 6.9581 | 748 | 1.1769 |
| 0.006 | 7.0605 | 759 | 1.1840 |
| 0.0064 | 7.1628 | 770 | 1.2016 |
| 0.0063 | 7.2651 | 781 | 1.2133 |
| 0.0058 | 7.3674 | 792 | 1.2182 |
| 0.0056 | 7.4698 | 803 | 1.2218 |
| 0.0057 | 7.5721 | 814 | 1.2234 |
| 0.0059 | 7.6744 | 825 | 1.2245 |
| 0.0057 | 7.7767 | 836 | 1.2247 |
| 0.0048 | 7.8791 | 847 | 1.2247 |
| 0.0054 | 7.9814 | 858 | 1.2246 |
| 0.0051 | 8.0837 | 869 | 1.2252 |
| 0.0059 | 8.1860 | 880 | 1.2261 |
| 0.0053 | 8.2884 | 891 | 1.2272 |
| 0.0057 | 8.3907 | 902 | 1.2275 |
| 0.0056 | 8.4930 | 913 | 1.2280 |
| 0.0052 | 8.5953 | 924 | 1.2283 |
| 0.007 | 8.6977 | 935 | 1.2287 |
| 0.0052 | 8.8 | 946 | 1.2285 |
| 0.005 | 8.9023 | 957 | 1.2289 |
| 0.0056 | 9.0047 | 968 | 1.2288 |
| 0.005 | 9.1070 | 979 | 1.2289 |
| 0.0054 | 9.2093 | 990 | 1.2290 |
| 0.0053 | 9.3116 | 1001 | 1.2288 |
| 0.0049 | 9.4140 | 1012 | 1.2290 |
| 0.0052 | 9.5163 | 1023 | 1.2290 |
| 0.0058 | 9.6186 | 1034 | 1.2291 |
| 0.0059 | 9.7209 | 1045 | 1.2289 |
| 0.0055 | 9.8233 | 1056 | 1.2289 |
| 0.0054 | 9.9256 | 1067 | 1.2288 |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.15.0
  • Tokenizers 0.19.1
