metadata
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- gguf
Llama 3.1-8B Instruct African-Ultrachat Quantize
- Developed by: vutuka
- License: apache-2.0
- Finetuned from model : meta-llama/meta-llama-3.1-8b-instruct
- Max Content Length :
8192
- Max Steps :
800
- Training Time :
02h-22min-08s
- Setup :
1 x RTX A6000
16 vCPU
58 GB RAM
150 GB Storage
Tokenizer & Chat Format
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(
tokenizer,
chat_template = "llama-3", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
mapping={
"role": "role",
"content": "content",
"user": "",
"assistant": "",
}
)
def formatting_prompts_func(examples):
convos = examples["messages"]
texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
return { "text" : texts, }
pass
Trainer
trainer = SFTTrainer(
model = model,
tokenizer = tokenizer,
train_dataset = shuffled_dataset,
dataset_text_field = "text",
max_seq_length = max_seq_length,
dataset_num_proc = 2,
packing = False, # Can make training 5x faster for short sequences.
args = TrainingArguments(
per_device_train_batch_size = 2,
gradient_accumulation_steps = 4,
warmup_steps = 5,
max_steps = 800,
do_eval=True,
learning_rate = 3e-4,
log_level="debug",
#fp16 = not is_bfloat16_supported(),
bf16 = True,
logging_steps = 10,
optim = "adamw_8bit",
weight_decay = 0.01,
lr_scheduler_type = "linear",
seed = 3407,
output_dir = "outputs",
report_to='wandb',
warmup_ratio=0.3,
),
)
Inference with Llama CPP
This llama model was trained 2x faster with Unsloth and Huggingface's TRL library.