Training procedure
Trained for 1 epoch on 1,024 rows of Turkish Q&A data. 150 of those rows are synthetic medical Q&A pairs; the rest are general-purpose Q&A.
LoRA attention dimension
lora_r = 16
Alpha parameter for LoRA scaling
lora_alpha = 16
Dropout probability for LoRA layers
lora_dropout = 0.1
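A minimal sketch of how these three values map onto a PEFT `LoraConfig`; the `bias` and `task_type` settings are assumptions, as they are not stated in this card:

```python
from peft import LoraConfig

# Only r, lora_alpha, and lora_dropout come from this card;
# bias and task_type are assumed defaults for causal-LM fine-tuning.
peft_config = LoraConfig(
    r=16,               # LoRA attention dimension
    lora_alpha=16,      # scaling parameter
    lora_dropout=0.1,   # dropout on the LoRA layers
    bias="none",
    task_type="CAUSAL_LM",
)
```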
Number of training epochs
num_train_epochs = 1
Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False
Batch size per GPU for training
per_device_train_batch_size = 2
Batch size per GPU for evaluation
per_device_eval_batch_size = 2
Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 4
Enable gradient checkpointing
gradient_checkpointing = True
Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3
Initial learning rate (AdamW optimizer)
learning_rate = 2e-4
Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001
Optimizer to use
optim = "paged_adamw_32bit"
Learning rate schedule
lr_scheduler_type = "cosine"
Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03
Group sequences into batches with the same length
(saves memory and speeds up training considerably)
group_by_length = True
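Put together, the values above correspond to a `transformers.TrainingArguments` object roughly like the sketch below; `output_dir` is a hypothetical path not given in this card, and `logging_steps` is taken from the SFT parameters section that follows:

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",          # hypothetical; not stated in this card
    num_train_epochs=1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    group_by_length=True,
    fp16=False,
    bf16=False,
    logging_steps=2,                 # from the SFT parameters below
)
```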
################################################################################
SFT parameters
################################################################################
Maximum sequence length to use
max_seq_length = None
Pack multiple short examples in the same input sequence to increase efficiency
packing = False
Load the entire model on GPU 0
device_map = {"": 0}
Log every X update steps
logging_steps = 2
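These SFT parameters feed into TRL's `SFTTrainer`. A sketch under the assumption that the training examples live in a `"text"` column and that `model` and `tokenizer` have already been loaded (the dataset field name and variable names are not given in this card):

```python
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,                  # the 4-bit base model (see quantization config below)
    train_dataset=dataset,        # the 1,024-row Turkish Q&A dataset
    peft_config=peft_config,
    dataset_text_field="text",    # assumed column name
    max_seq_length=None,          # fall back to the tokenizer/model default
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,
)
trainer.train()
```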
The following bitsandbytes quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float16
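In `transformers`, that config corresponds to a `BitsAndBytesConfig` like the sketch below; the model name is a placeholder, and `device_map={"": 0}` comes from the SFT parameters above. The `llm_int8_*` fields listed are the library defaults and do not need to be set explicitly:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "base-model-name",                # placeholder; base model not stated in this card
    quantization_config=bnb_config,
    device_map={"": 0},               # load the entire model on GPU 0
)
```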
Framework versions
- PEFT 0.4.0