UlizaLlama_Q4_K_M-gguf: 4-bit Quantized Bilingual Language Model

Overview

UlizaLlama_Q4_K_M-gguf is a 4-bit quantized version of UlizaLlama, a 7B-parameter language model fine-tuned for Swahili and English. It retains the bilingual capabilities of the original model while significantly reducing model size and improving inference speed, which makes it well suited to deployment in resource-constrained environments.

Key Features

  • Bilingual Proficiency: Excels in both Swahili and English, with a focus on instructional tasks.
  • 4-bit Quantization: Uses the GGUF format with the Q4_K_M quantization scheme, reducing model size by roughly 70 to 75% relative to 16-bit weights.
  • Efficient Inference: Faster processing and lower memory footprint compared to the full-precision model.
  • Versatile Applications: Suitable for question-answering, chat assistants, and various domain-specific tasks.
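
The size reduction quoted above follows from simple arithmetic on bits per weight. The sketch below is a rough, illustrative estimate: it assumes the 7B parameter count, a 16-bit full-precision baseline, and a nominal 4 bits per weight, and the helper names are hypothetical. Real Q4_K_M files are somewhat larger than this estimate, because the format stores extra scaling data and keeps some tensors at higher precision.

```python
# Back-of-the-envelope estimate of weight storage (illustrative only).
PARAMS = 7e9  # approximate parameter count of UlizaLlama (7B)


def model_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def size_reduction(bits_from: float, bits_to: float) -> float:
    """Fraction of storage saved when moving from bits_from to bits_to."""
    return 1 - bits_to / bits_from


fp16_gb = model_size_gb(PARAMS, 16)  # assumed 16-bit baseline
q4_gb = model_size_gb(PARAMS, 4)     # nominal 4 bits per weight

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{q4_gb:.1f} GB")
print(f"Nominal reduction: {size_reduction(16, 4):.0%}")
```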

Model Details

  • Original Model: UlizaLlama (7B parameters)
  • Base Model: Jacaranda/kiswallama-pretrained (derived from Meta/Llama2)
  • Quantization Method: 4-bit GGUF (Q4_K_M)
  • Languages: Swahili and English
  • License: CC BY-NC-SA 4.0

Installation

To use UlizaLlama_Q4_K_M-gguf, you'll need a library that can load GGUF quantized models. We recommend the ctransformers library:

!pip install ctransformers

Usage

Here's a simple example of how to load and use de-coder/UlizaLlama_Q4_K_M-gguf:

from ctransformers import AutoModelForCausalLM

# Load the quantized model from the Hugging Face Hub
llm = AutoModelForCausalLM.from_pretrained(
    "de-coder/UlizaLlama_Q4_K_M-gguf",
    model_file="Q4_K_M.gguf",
    model_type="llama",  # architecture of the underlying model
    lib="avx2"  # or "basic" if avx2 isn't supported
)

# Generate text
prompt = "Niambie kuhusu historia ya Kilimanjaro."
print(llm(prompt))
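
Generation can also be tuned with sampling parameters and streamed piece by piece. The sketch below wraps this in a small hypothetical helper, `generate_text`. The keyword arguments `max_new_tokens`, `temperature`, `top_p`, and `stream` are accepted by ctransformers model objects; the default values shown are illustrative assumptions, not recommended settings for this model.

```python
def generate_text(llm, prompt, max_new_tokens=128, temperature=0.7, top_p=0.95):
    """Stream text from a ctransformers-style model and return the full output.

    `llm` is any callable that yields text chunks when called with stream=True,
    such as the object returned by AutoModelForCausalLM.from_pretrained.
    """
    chunks = []
    for chunk in llm(
        prompt,
        max_new_tokens=max_new_tokens,  # cap on the number of generated tokens
        temperature=temperature,        # lower values give more deterministic output
        top_p=top_p,                    # nucleus sampling threshold
        stream=True,                    # yield text piece by piece
    ):
        chunks.append(chunk)
    return "".join(chunks)
```

With the `llm` object loaded above, `print(generate_text(llm, prompt))` would print the completed response.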

Performance and Trade-offs

UlizaLlama_Q4_K_M-gguf is substantially smaller and faster at inference than the full-precision model. However, quantization may cause a slight degradation in output quality. We encourage users to benchmark the model on their specific tasks to understand these trade-offs.
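
As a starting point for such benchmarking, the sketch below times streamed generation and reports throughput. It is a minimal illustration rather than a rigorous benchmark: the helper `measure_throughput` is hypothetical, it counts streamed chunks as a proxy for tokens, and it measures speed only, so output quality on your task needs a separate evaluation.

```python
import time


def measure_throughput(llm, prompts, max_new_tokens=64):
    """Return streamed chunks per second, averaged over a list of prompts.

    `llm` is assumed to be a ctransformers-style callable that yields
    text chunks when called with stream=True.
    """
    total_chunks = 0
    start = time.perf_counter()
    for prompt in prompts:
        for _ in llm(prompt, max_new_tokens=max_new_tokens, stream=True):
            total_chunks += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very fast (or empty) runs.
    return total_chunks / elapsed if elapsed > 0 else float("inf")
```

Running the same prompts through both the quantized and the full-precision model on the same hardware gives a like-for-like speed comparison.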

Use Cases

  1. Chatbots for healthcare, agriculture, education, and more.
  2. Language learning applications.
  3. Information services in Swahili-speaking regions.
  4. Edge devices and mobile applications.

Citation and Acknowledgments

If you use UlizaLlama_Q4_K_M-gguf in your work, please cite:

@misc{UlizaLlama_Q4_K_M-gguf,
  title={UlizaLlama_Q4_K_M-gguf: A Bilingual Language Model for Swahili and English},
  author={Kelvin Githu (de-coder)},
  year={2024},
  publisher={Kelvin Githu},
  howpublished={\url{https://huggingface.co/de-coder/UlizaLlama_Q4_K_M-gguf}},
}