Swahili Llama 3 8B

  • Developed by: GodsonNtungi
  • License: apache-2.0
  • Finetuned from model: unsloth/llama-3-8b-bnb-4bit

An experimental model: results are still poor, but it is a promising start.

Training run: 1 epoch
Training time: 9 h 20 min 07 s
Training loss: 0.8683
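The run time and loss above can be turned into two easier-to-compare numbers. This sketch assumes the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language-model training with the Hugging Face Trainer), in which case exp(loss) is the training perplexity. That is an interpretation, not a figure reported for this model, and it says nothing about performance on held-out Swahili text.

```python
import math

# Figures copied from the training summary above
hours, minutes, seconds = 9, 20, 7
train_loss = 0.8683

total_seconds = hours * 3600 + minutes * 60 + seconds

# Assumption: the loss is mean token-level cross-entropy (natural log)
train_perplexity = math.exp(train_loss)

print(f"wall-clock time: {total_seconds} s")
print(f"training perplexity: {train_perplexity:.3f}")
```

This prints a wall-clock time of 33607 s and a training perplexity of roughly 2.383.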

PEFT parameters

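from unsloth import FastLanguageModel

# Load the 4-bit base model. max_seq_length = 2048 is an assumption;
# the card does not state the sequence length used for training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters to every attention and MLP projection
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,              # LoRA rank; Unsloth suggests 8, 16, 32, 64 or 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,    # any value works, but 0 is optimized
    bias = "none",       # any value works, but "none" is optimized
    # Per Unsloth, "unsloth" checkpointing uses ~30% less VRAM and fits 2x larger batches
    use_gradient_checkpointing = "unsloth",  # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # rank-stabilized LoRA is supported but disabled here
    loftq_config = None, # LoftQ initialization is supported but disabled here
)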
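To make the configuration above more concrete, the sketch below counts how many weights LoRA actually trains and what scaling factor is applied to the adapter update. It assumes the standard Llama 3 8B dimensions from Meta's published config (hidden size 4096, MLP size 14336, 32 layers, 32 query heads, 8 key/value heads); these are not stated in this card. Each adapted linear layer with input size d_in and output size d_out gets two matrices, A (r × d_in) and B (d_out × r), so it adds r · (d_in + d_out) trainable weights; with bias = "none", no bias terms are trained.

```python
import math

# Assumed Llama 3 8B dimensions (from the base model's config, not this card)
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = HIDDEN // NUM_HEADS        # 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM      # 1024, because of grouped-query attention

# (in_features, out_features) of each module listed in target_modules
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(r: int) -> int:
    """Trainable LoRA weights: r * (in + out) per adapted module, summed over all layers."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in MODULE_SHAPES.values())
    return per_layer * NUM_LAYERS


def lora_scale(r: int, alpha: float, use_rslora: bool = False) -> float:
    """Factor applied to the update B @ A: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


print(f"trainable LoRA params (r=16): {lora_params(16):,}")
print(f"update scale (alpha=16, r=16): {lora_scale(16, 16)}")
print(f"update scale with rsLoRA: {lora_scale(16, 16, use_rslora=True)}")
```

Under these assumptions the adapters hold 41,943,040 trainable weights, a small fraction of the roughly 8B frozen base weights. Because lora_alpha equals r and use_rslora = False, the update scale is exactly 1.0; switching on rsLoRA with the same values would raise it to 4.0.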

Weakness
The model was not fine-tuned to emit the end-of-text (EOS) token when a response is complete. Generations therefore often start well and then degrade into gibberish, continuing until the max-token limit set at inference time is reached.
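A common remedy for this weakness, and the one Unsloth's example notebooks use, is to append the tokenizer's EOS token to every formatted training example so the model learns where a response ends. The sketch below shows that idea plus an inference-time fallback that cuts generated text at a stop string. It is a hedged illustration, not the author's training code: the prompt template is hypothetical (the card does not say how the data was formatted), and the hard-coded "<|end_of_text|>" is assumed to be the Llama 3 base tokenizer's EOS string; in a real run, use tokenizer.eos_token instead.

```python
# Hypothetical prompt template; the card does not describe the real data format.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"

# Assumed EOS string of the Llama 3 base tokenizer; prefer tokenizer.eos_token in practice.
EOS_TOKEN = "<|end_of_text|>"


def format_example(instruction: str, response: str, eos_token: str = EOS_TOKEN) -> str:
    """Build one training string that ends with EOS so the model learns when to stop."""
    return PROMPT_TEMPLATE.format(instruction=instruction, response=response) + eos_token


def truncate_at_stop(generated: str, stop_strings: list[str]) -> str:
    """Inference-time fallback: cut the text at the earliest stop string that appears."""
    cut = len(generated)
    for stop in stop_strings:
        idx = generated.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return generated[:cut]


sample = format_example("Translate 'asante' to English.", "Thank you.")
print(sample.endswith(EOS_TOKEN))

raw_output = "Thank you.### Instruction:\nrandom continuation"
print(truncate_at_stop(raw_output, ["### Instruction:", EOS_TOKEN]))
```

The first print shows True and the second shows "Thank you.". Truncation only hides the symptom; retraining with EOS-terminated examples is what teaches the model to stop on its own.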

Dataset used to train GodsonNtungi/swahilillama3-8b