
Training procedure

Trained for 1 epoch on 1,024 rows of Turkish Q&A data. The data contains 150 synthetic medical Q&A pairs; the rest is mundane, general-purpose Q&A.

LoRA attention dimension

lora_r = 16

Alpha parameter for LoRA scaling

lora_alpha = 16

Dropout probability for LoRA layers

lora_dropout = 0.1
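
Taken together, these three values would map onto PEFT's LoraConfig roughly as in the sketch below. The bias and task_type settings are assumptions, since the card does not state them, and target modules are likewise unspecified, so they are omitted here.

```python
from peft import LoraConfig

# Minimal sketch (assumed wiring): the LoRA values above as a PEFT config.
peft_config = LoraConfig(
    r=16,              # lora_r: LoRA attention dimension
    lora_alpha=16,     # alpha parameter for LoRA scaling
    lora_dropout=0.1,  # dropout probability for LoRA layers
    bias="none",            # assumption: not stated on the card
    task_type="CAUSAL_LM",  # assumption: causal LM fine-tuning
)
```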

Number of training epochs

num_train_epochs = 1

Enable fp16/bf16 training (set bf16 to True with an A100)

fp16 = False
bf16 = False

Batch size per GPU for training

per_device_train_batch_size = 2

Batch size per GPU for evaluation

per_device_eval_batch_size = 2

Number of update steps to accumulate the gradients for

gradient_accumulation_steps = 4

Enable gradient checkpointing

gradient_checkpointing = True

Maximum gradient norm (gradient clipping)

max_grad_norm = 0.3

Initial learning rate (AdamW optimizer)

learning_rate = 2e-4

Weight decay to apply to all layers except bias/LayerNorm weights

weight_decay = 0.001

Optimizer to use

optim = "paged_adamw_32bit"

Learning rate schedule

lr_scheduler_type = "cosine"

Ratio of steps for a linear warmup (from 0 to learning rate)

warmup_ratio = 0.03

Group sequences into batches with the same length

Saves memory and speeds up training considerably

group_by_length = True
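
Collected into a single transformers TrainingArguments object, the hyperparameters above would look roughly like this sketch; output_dir is a placeholder not given on the card, and logging_steps is taken from the SFT section below.

```python
from transformers import TrainingArguments

# Sketch of the training arguments listed above; output_dir is hypothetical.
training_arguments = TrainingArguments(
    output_dir="./results",          # placeholder: not stated on the card
    num_train_epochs=1,
    fp16=False,
    bf16=False,                      # set to True with an A100
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    group_by_length=True,
    logging_steps=2,
)
```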

SFT parameters

Maximum sequence length to use

max_seq_length = None

Pack multiple short examples in the same input sequence to increase efficiency

packing = False

Load the entire model on GPU 0

device_map = {"": 0}

Log every X update steps

logging_steps = 2

The following bitsandbytes quantization config was used during training:

  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: float16
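
In code, this corresponds to a transformers BitsAndBytesConfig roughly as sketched below; the llm_int8_* flags listed above are the library defaults and are omitted. The base-model id is an assumption inferred from the repository name, not stated on the card.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch of the 4-bit quantization config above; omitted llm_int8_* flags
# keep their transformers defaults.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # assumption: base model inferred from the repo name
    quantization_config=bnb_config,
    device_map={"": 0},  # load the entire model on GPU 0
)
```

Putting the pieces together, a hedged sketch of how the SFT parameters above would feed trl's SFTTrainer, assuming a trl version contemporary with PEFT 0.4.0 and reusing the peft_config and training_arguments objects from the earlier sketches; dataset, tokenizer, and the "text" field name are placeholders.

```python
from trl import SFTTrainer

# Hypothetical end-to-end wiring; `dataset` and `tokenizer` are assumed
# to exist, and "text" is a placeholder dataset field name.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=None,     # no explicit sequence-length cap
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,           # do not pack short examples together
)
trainer.train()
```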

Framework versions

  • PEFT 0.4.0
