metadata

license: other
library_name: peft
tags:
  - generated_from_trainer
base_model: deepseek-ai/deepseek-coder-33b-instruct
model-index:
  - name: lora-logo_fix_epoch3_deepseek33b_gpt35i_lr_0.0002_alpha_1024_r_1024
    results: []

See axolotl config

axolotl version: 0.4.0

adapter: lora
base_model: deepseek-ai/deepseek-coder-33b-instruct
bf16: auto
dataset_prepared_path: ./logo_ds_preprocess_list_gpt35
datasets:
- path: ../logo/fix_synthetic_int_images_data.jsonl
  type:
    field_instruction: input
    field_output: output
    format: '### Instruction:

      {input}

      ### Response:

      '
    no_input_format: '{instruction}'
debug: null
deepspeed: ./deepspeed_configs/zero2.json
early_stopping_patience: null
eval_sample_packing: true
evals_per_epoch: 4
flash_attention: true
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 2
gradient_checkpointing: true
group_by_length: false
is_llama_derived_model: true
learning_rate: 0.0002
load_in_4bit: false
load_in_8bit: true
local_rank: null
logging_steps: 1
lora_alpha: 1024
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 1024
lora_target_linear: true
lr_scheduler: cosine
micro_batch_size: 4
model_type: AutoModelForCausalLM
num_epochs: 3
optimizer: adamw_bnb_8bit
output_dir: ./lora-logo_fix_epoch3_deepseek33b_gpt35i_lr_0.0002_alpha_1024_r_1024
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: true
saves_per_epoch: 1
sequence_len: 1800
special_tokens:
  bos_token: "<\uFF5Cbegin\u2581of\u2581sentence\uFF5C>"
  eos_token: <|EOT|>
strict: true
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
val_set_size: 0.05
wandb_entity: null
wandb_log_model: null
wandb_name: logo_fix_epoch3_deepseek33b_gpt35i_lr_0.0002_alpha_1024_r_1024
wandb_project: pbe-axo
wandb_watch: null
warmup_steps: 20
weight_decay: 0.0
xformers_attention: null
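
The config above sets lora_r and lora_alpha to the same value. As a minimal sketch, assuming the standard LoRA scaling rule (the low-rank update is multiplied by lora_alpha / lora_r, which is the PEFT default when rank-stabilized scaling is not enabled), the effective scale here is 1. The helper names and the 4096 x 4096 layer shape below are hypothetical and only illustrate the arithmetic; they are not taken from the base model.

```python
def lora_scaling(lora_alpha: float, lora_r: int) -> float:
    """Standard LoRA scale factor applied to the low-rank update B @ A."""
    return lora_alpha / lora_r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


# Values from the axolotl config above.
print(lora_scaling(1024, 1024))  # 1.0

# Hypothetical layer shape, chosen only for illustration.
added = lora_param_count(4096, 4096, 1024)
full = 4096 * 4096
print(added, added / full)
```

With a rank this high, the adapter on a single layer can be a large fraction of that layer's own weight count, so this run is closer to a heavily parameterized adapter than to a typical low-rank one.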

lora-logo_fix_epoch3_deepseek33b_gpt35i_lr_0.0002_alpha_1024_r_1024

This model is a LoRA fine-tune of deepseek-ai/deepseek-coder-33b-instruct on a local dataset (../logo/fix_synthetic_int_images_data.jsonl, per the axolotl config above). It achieves the following results on the evaluation set:

Loss: 0.2293

Model description

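LoRA adapter (rank 1024, alpha 1024, dropout 0.05, all linear layers targeted) for deepseek-ai/deepseek-coder-33b-instruct, trained with axolotl 0.4.0 with the base model loaded in 8-bit. More information needed.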

Intended uses & limitations

More information needed

Training and evaluation data

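Per the axolotl config, training used the local file ../logo/fix_synthetic_int_images_data.jsonl, reading the instruction from the input field and the response from the output field, with 5% of the data held out for evaluation (val_set_size: 0.05). More information needed.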

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 20
num_epochs: 3
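
The batch totals and the learning-rate schedule listed above follow from simple arithmetic. The sketch below reproduces them under two assumptions: the schedule is linear warmup followed by a single half-cosine decay to zero (the usual shape of a cosine scheduler with warmup), and the run has 852 optimizer steps in total (3 epochs at the 284 steps per epoch shown in the training results). The function names are illustrative, not part of any library.

```python
import math


def effective_batch_size(micro_batch_size: int, num_devices: int, grad_accum_steps: int) -> int:
    """Samples contributing to one optimizer step across all devices."""
    return micro_batch_size * num_devices * grad_accum_steps


def cosine_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero (assumed schedule shape)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Training: 4 per device x 8 devices x 2 accumulation steps.
print(effective_batch_size(4, 8, 2))  # 64
# Evaluation uses no gradient accumulation.
print(effective_batch_size(4, 8, 1))  # 32

TOTAL_STEPS = 3 * 284  # assumption: 284 steps per epoch, 3 epochs
for step in (0, 10, 20, 426, 852):
    print(step, cosine_lr(step, 0.0002, 20, TOTAL_STEPS))
```

Under these assumptions the learning rate peaks at 0.0002 on step 20 and reaches zero on the final step.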

Training results

Training Loss	Epoch	Step	Validation Loss
2.1791	0.0	1	2.1669
0.3312	0.25	71	0.3153
0.2906	0.5	142	0.2985
0.3079	0.75	213	0.2801
0.2859	1.0	284	0.2649
0.2544	1.23	355	0.2583
0.2291	1.49	426	0.2502
0.2632	1.74	497	0.2428
0.2500	1.99	568	0.2341
0.1832	2.22	639	0.2356
0.2051	2.47	710	0.2316
0.2041	2.72	781	0.2293
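
To read the table programmatically, a small sketch like the following can pick out the evaluation step with the lowest validation loss and flag any evaluation where the loss went up. The rows are copied from the table above; the variable names are illustrative.

```python
# (training_loss, epoch, step, validation_loss), copied from the table above.
rows = [
    (2.1791, 0.0, 1, 2.1669),
    (0.3312, 0.25, 71, 0.3153),
    (0.2906, 0.5, 142, 0.2985),
    (0.3079, 0.75, 213, 0.2801),
    (0.2859, 1.0, 284, 0.2649),
    (0.2544, 1.23, 355, 0.2583),
    (0.2291, 1.49, 426, 0.2502),
    (0.2632, 1.74, 497, 0.2428),
    (0.25, 1.99, 568, 0.2341),
    (0.1832, 2.22, 639, 0.2356),
    (0.2051, 2.47, 710, 0.2316),
    (0.2041, 2.72, 781, 0.2293),
]

# Evaluation point with the lowest validation loss.
best = min(rows, key=lambda r: r[3])
print(best[2], best[3])  # 781 0.2293

# Steps at which validation loss was higher than at the previous evaluation.
regressions = [cur[2] for prev, cur in zip(rows, rows[1:]) if cur[3] > prev[3]]
print(regressions)  # [639]
```

The lowest validation loss coincides with the last logged evaluation, which matches the 0.2293 reported at the top of the card; the only uptick is at step 639, early in the third epoch.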

Framework versions

PEFT 0.10.0
Transformers 4.40.0.dev0
Pytorch 2.1.2+cu121
Datasets 2.15.0
Tokenizers 0.15.0