See axolotl config

axolotl version: 0.4.1

adapter: lora
base_model: EleutherAI/gpt-neo-125m
bf16: true
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - 9400c082b072ce22_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/9400c082b072ce22_train_data.json
  type:
    field_instruction: ja
    field_output: en
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 4
eval_max_new_tokens: 128
eval_steps: 150
eval_table_size: null
flash_attention: false
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 4
gradient_checkpointing: true
group_by_length: false
hub_model_id: Romain-XV/19783dba-2611-430a-89e2-4d277105a2fb
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_best_model_at_end: true
load_in_4bit: true
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 128
lora_dropout: 0.3
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 64
lora_target_linear: true
lora_target_modules:
- q_proj
- k_proj
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: 9660
micro_batch_size: 2
mlflow_experiment_name: /tmp/9400c082b072ce22_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 3
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 150
sequence_len: 1024
special_tokens:
  pad_token: <|endoftext|>
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.03388084783433621
wandb_entity: null
wandb_mode: online
wandb_name: 97f965f0-6c2c-4001-93f0-b3bd5a572767
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 97f965f0-6c2c-4001-93f0-b3bd5a572767
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null

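The `datasets` entry above maps the `ja` field of each record to the instruction and the `en` field to the output, using the bare template `'{instruction}'` and an empty system prompt. The sketch below illustrates that mapping in plain Python. It is not axolotl's own implementation; the `build_example` helper and the sample record are hypothetical, introduced only for illustration.

```python
from typing import Dict, Tuple

# Values copied from the config above.
INSTRUCTION_FIELD = "ja"            # field_instruction
OUTPUT_FIELD = "en"                 # field_output
NO_INPUT_FORMAT = "{instruction}"   # no_input_format (no input field is configured)


def build_example(record: Dict[str, str]) -> Tuple[str, str]:
    """Return (prompt, completion) for one training record.

    Hypothetical helper: the prompt is the Japanese text passed through the
    template, and the completion is the English text.
    """
    prompt = NO_INPUT_FORMAT.format(instruction=record[INSTRUCTION_FIELD])
    completion = record[OUTPUT_FIELD]
    return prompt, completion


# Hypothetical record, not taken from the actual training file.
record = {"ja": "こんにちは", "en": "Hello"}
prompt, completion = build_example(record)
```

Because `train_on_inputs` is `false`, the loss would be expected to cover only the completion tokens, with the prompt tokens masked out.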
19783dba-2611-430a-89e2-4d277105a2fb

This model is a LoRA fine-tune of EleutherAI/gpt-neo-125m on a custom JSON dataset (9400c082b072ce22_train_data.json; no dataset name was recorded). It achieves the following results on the evaluation set:

Loss: 1.9652

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: adamw_bnb_8bit (OptimizerNames.ADAMW_BNB) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
training_steps: 9660
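
The hyperparameters above can be related to each other with a little arithmetic. The following is a minimal sketch, assuming a single training device (consistent with `total_train_batch_size: 8`) and a standard linear-warmup, half-cosine decay schedule. The function names are hypothetical and the schedule is a plain-Python approximation, not the trainer's own code.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-4
MICRO_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 10
TRAINING_STEPS = 9660
NUM_DEVICES = 1  # assumption: one device, which matches a total batch size of 8


def effective_batch_size(micro: int, accum: int, devices: int) -> int:
    """Number of examples contributing to each optimizer step."""
    return micro * accum * devices


def cosine_lr(step: int,
              peak_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS,
              total: int = TRAINING_STEPS) -> float:
    """Learning rate at a given optimizer step.

    Linear warmup from 0 to peak_lr over `warmup` steps, then a half-cosine
    decay from peak_lr to 0 at `total` steps.
    """
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total_batch = effective_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
lr_midway = cosine_lr(TRAINING_STEPS // 2)
```

Under these assumptions the learning rate peaks at step 10 and decays toward zero by step 9660, which fits the flattening of the validation loss in the last rows of the results table.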

Training results

Training Loss	Epoch	Step	Validation Loss
22.3081	0.0001	1	5.3931
12.1969	0.0084	150	3.0791
12.8922	0.0168	300	2.9534
12.5402	0.0252	450	2.8506
10.4376	0.0337	600	2.7774
9.6342	0.0421	750	2.7315
11.2411	0.0505	900	2.6845
12.9895	0.0589	1050	2.6467
12.0386	0.0673	1200	2.6053
11.1212	0.0757	1350	2.5762
9.9151	0.0842	1500	2.5369
11.2688	0.0926	1650	2.5001
10.0221	0.1010	1800	2.4761
10.3173	0.1094	1950	2.4375
10.428	0.1178	2100	2.4349
7.0369	0.1262	2250	2.3860
10.4659	0.1347	2400	2.3725
10.4187	0.1431	2550	2.3579
6.8085	0.1515	2700	2.3285
12.6518	0.1599	2850	2.3110
9.9324	0.1683	3000	2.2927
7.8111	0.1767	3150	2.2739
9.2326	0.1852	3300	2.2593
8.6382	0.1936	3450	2.2342
8.518	0.2020	3600	2.2290
6.4198	0.2104	3750	2.2118
9.0537	0.2188	3900	2.2064
6.6054	0.2272	4050	2.1808
8.1502	0.2357	4200	2.1758
7.229	0.2441	4350	2.1579
7.0952	0.2525	4500	2.1411
7.7773	0.2609	4650	2.1294
9.354	0.2693	4800	2.1157
9.6896	0.2777	4950	2.1120
9.817	0.2862	5100	2.0999
9.7308	0.2946	5250	2.0837
7.0272	0.3030	5400	2.0796
9.446	0.3114	5550	2.0694
9.1402	0.3198	5700	2.0556
7.8589	0.3282	5850	2.0542
8.3354	0.3367	6000	2.0445
8.081	0.3451	6150	2.0343
6.7192	0.3535	6300	2.0259
10.2732	0.3619	6450	2.0235
9.3245	0.3703	6600	2.0137
8.6904	0.3787	6750	2.0092
6.4253	0.3872	6900	2.0042
8.0254	0.3956	7050	1.9975
10.3048	0.4040	7200	1.9963
9.2663	0.4124	7350	1.9909
8.596	0.4208	7500	1.9860
9.4026	0.4292	7650	1.9820
7.5361	0.4377	7800	1.9791
10.1732	0.4461	7950	1.9773
9.5052	0.4545	8100	1.9737
9.1775	0.4629	8250	1.9720
5.179	0.4713	8400	1.9702
6.0604	0.4797	8550	1.9688
7.6645	0.4882	8700	1.9676
6.7768	0.4966	8850	1.9666
8.6168	0.5050	9000	1.9657
9.4105	0.5134	9150	1.9658
8.4106	0.5218	9300	1.9655
5.5724	0.5302	9450	1.9654
5.8533	0.5387	9600	1.9652
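
Two quantities can be read off the table with simple arithmetic: the perplexity implied by the final validation loss, and the approximate size of one epoch. This sketch assumes the validation loss is a mean per-token cross-entropy in nats and that each step consumes 8 examples (the `total_train_batch_size` reported above); both figures are estimates, not reported results.

```python
import math

# Last row of the table above.
FINAL_STEP = 9600
FINAL_EPOCH = 0.5387
FINAL_VAL_LOSS = 1.9652
TOTAL_TRAIN_BATCH_SIZE = 8  # from the hyperparameter list

# Perplexity is exp(loss) when the loss is mean cross-entropy in nats.
perplexity = math.exp(FINAL_VAL_LOSS)

# The Epoch column gives the fraction of one pass completed at each step,
# so steps-per-epoch can be estimated from any row.
steps_per_epoch = FINAL_STEP / FINAL_EPOCH
approx_train_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print(f"perplexity ~ {perplexity:.2f}")
print(f"steps per epoch ~ {steps_per_epoch:.0f}")
print(f"training examples ~ {approx_train_examples:.0f}")
```

The estimate also shows that the run stopped at `max_steps: 9660` after roughly half of one epoch, well short of the configured `num_epochs: 3`.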

Framework versions

PEFT 0.13.2
Transformers 4.46.0
Pytorch 2.5.0+cu124
Datasets 3.0.1
Tokenizers 0.20.1
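
With the libraries listed above, the adapter could be loaded on top of the base model roughly as follows. This is a sketch under assumptions, not a tested recipe from the author: it assumes the adapter weights sit at the root of the hub repository (the config uses `hub_strategy: checkpoint`, so the layout may differ), and the `load_model_and_tokenizer` and `translate` helpers are hypothetical. The base model is loaded in full precision here rather than in 4-bit as during training.

```python
BASE_MODEL = "EleutherAI/gpt-neo-125m"
ADAPTER_REPO = "Romain-XV/19783dba-2611-430a-89e2-4d277105a2fb"


def load_model_and_tokenizer():
    """Load the base model and attach the LoRA adapter from the hub."""
    # Imported lazily so this file can be read without the libraries installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    model = PeftModel.from_pretrained(base, ADAPTER_REPO)
    model.eval()
    return model, tokenizer


def translate(model, tokenizer, japanese_text: str, max_new_tokens: int = 128) -> str:
    """Generate a continuation for a Japanese prompt.

    The prompt is the raw text, matching the '{instruction}' training format.
    """
    inputs = tokenizer(japanese_text, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,  # pad token was set to <|endoftext|>
    )
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call would look like `model, tokenizer = load_model_and_tokenizer()` followed by `translate(model, tokenizer, "こんにちは")`. Output quality from a 125M-parameter base model at a validation loss of 1.9652 should not be assumed to be high.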
