Kili-small-1.0

A powerful, lightweight Small Language Model built for general chat and code — fine-tunable on consumer hardware.

🤗 Model on Hugging Face · 📄 Apache 2.0 License · 💬 Discussions

Overview

Kili-small-1.0 is a 500M-parameter Small Language Model (SLM) developed by Kili Labs, designed to deliver strong general-purpose chat and code generation capabilities in an extremely efficient footprint.

Kili-small-1.0 uses a custom architecture designed from the ground up by Kili Labs, paired with the Qwen2 tokenizer, vocabulary, and chat template for broad ecosystem compatibility. You get a purpose-built model without giving up interoperability: Qwen2-compatible tokenizers, chat formatting tools, and the pipelines built on them work out of the box.

The model is purpose-engineered for developers and researchers who need a capable, adaptable model that runs — and fine-tunes — on consumer-grade hardware.

Key Features

500M Parameters — Compact by design. Maximum capability per parameter.
General Chat & Code — Strong performance on natural language conversation and code generation tasks.
Consumer Hardware Compatible — Fine-tunable with as little as 15 GB of RAM on low-end ("potato") GPUs with standard optimisation techniques.
Apache 2.0 Licensed — Fully open. Use, modify, and distribute freely for commercial and research purposes.
Safetensors Format — Efficient, safe model serialisation out of the box.

Model Details

Property	Value
Architecture	Custom (Kili Labs)
Tokenizer	Qwen2 (vocab, chat template & tools)
Parameter Count	500M (0.5B)
Tensor Type	F16
Language	English
License	Apache 2.0
Task	Text Generation
Tags	SLM, Chat, Coding, Qwen2-compatible

Note on Architecture: The model architecture is custom to Kili Labs, but the tokenizer, vocabulary, and chat template are Qwen2's. Inference pipelines, tokenization utilities, and chat formatting tools that depend only on those components can be reused unchanged.
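For orientation, Qwen2 chat templates follow a ChatML-style layout in which each turn is wrapped in `<|im_start|>` and `<|im_end|>` markers. The sketch below approximates that layout in plain Python. `render_chatml` is a hypothetical helper written for illustration, and the template actually shipped with Kili-small-1.0 may differ (for example by injecting a default system message), so `tokenizer.apply_chat_template` remains the authoritative way to build prompts.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate a ChatML-style prompt.

    Hypothetical helper for illustration only; it is not the model's
    real chat template and ignores system-message defaults and tools.
    """
    parts = []
    for message in messages:
        # Each turn: start marker, role, newline, content, end marker.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Hello!"}])
print(prompt)
```

Comparing this string with the output of `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to see what the real template adds.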

Training Datasets

Kili-small-1.0 was trained on a curated combination of high-quality instruction and planning datasets:

Dataset	Description
`vicgalle/alpaca-gpt4`	GPT-4 generated Alpaca-format instruction-following data (~52k samples)
`Qwen/DeepPlanning`	Deep planning and reasoning tasks (~2.1k samples)
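Alpaca-format records carry `instruction`, `input`, and `output` fields, so they have to be mapped to chat turns before a chat template can be applied. The following is a minimal sketch of one such mapping. How Kili Labs actually formatted the data is not documented here, and `alpaca_to_messages` is a hypothetical helper.

```python
def alpaca_to_messages(record):
    """Convert one Alpaca-format record into a list of chat messages.

    Assumption: the optional `input` field is appended to the instruction,
    separated by a blank line. Other conventions are equally plausible.
    """
    instruction = record["instruction"].strip()
    extra_input = (record.get("input") or "").strip()
    user_content = f"{instruction}\n\n{extra_input}" if extra_input else instruction
    return [
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": record["output"].strip()},
    ]


example = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}
messages = alpaca_to_messages(example)
print(messages)
```

The resulting list has the same shape as the `messages` argument accepted by `tokenizer.apply_chat_template`.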

Quickstart

Installation

pip install transformers torch

Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "kililabs/Kili-small-1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a Python function to reverse a linked list."}
]

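# Render the conversation with the chat template and tokenize it.
# return_dict=True returns input_ids and attention_mask together.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True))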

Fine-Tuning on Consumer Hardware

Kili-small-1.0 is designed to be accessible for fine-tuning. With 4-bit QLoRA, you can fine-tune on a single consumer GPU with roughly 6–8 GB of VRAM and about 15 GB of system RAM.

Recommended Setup

pip install transformers peft bitsandbytes accelerate datasets trl

Example: QLoRA Fine-Tuning

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import torch

# 4-bit quantisation for low VRAM usage
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "kililabs/Kili-small-1.0",
    quantization_config=bnb_config,
    device_map="auto"
)

# Prepare the quantised model for training: freezes the base weights,
# upcasts remaining half-precision parameters to fp32, and enables
# gradient checkpointing.
model = prepare_model_for_kbit_training(model)

# LoRA configuration: low-rank adapters on the attention query/value projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Wrap the model so that only the adapter weights are trainable
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: ~4.7M || all params: ~504M || trainable%: ~0.93%
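The trainable-parameter count follows from how LoRA works: each adapted linear layer of shape (d_out, d_in) gains two low-rank matrices, A of shape (r, d_in) and B of shape (d_out, r), for r × (d_in + d_out) extra parameters. The sketch below works through that arithmetic. The layer dimensions are invented for illustration, since the model card does not list the architecture's hidden size or depth, so the result is not expected to reproduce the ~4.7M figure above.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


# Hypothetical dimensions, NOT taken from Kili-small-1.0.
hidden_size = 1024
num_layers = 24
rank = 16

# Two adapted projections per layer (q_proj and v_proj), both assumed
# to map hidden_size -> hidden_size for simplicity.
per_layer = 2 * lora_param_count(hidden_size, hidden_size, rank)
total = per_layer * num_layers
print(f"LoRA parameters: {total:,}")
```

Doubling `rank` doubles the adapter size, which is the main lever for trading adapter capacity against memory.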

Hardware Requirements

Setup	VRAM	RAM	Notes
Full F16 Inference	~2 GB	4 GB	Very fast
QLoRA Fine-Tuning (4-bit)	~6–8 GB	15 GB	Consumer GPUs (GTX 1080, RTX 3060, etc.)
Full Fine-Tuning	~4 GB	12 GB	F16, gradient checkpointing recommended
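As a rough sanity check on these figures, the memory taken by the weights alone is the parameter count times the bytes per parameter. The sketch below is a back-of-the-envelope estimate under stated assumptions: it counts weights only, treats NF4 as exactly half a byte per parameter (ignoring quantisation constants), and leaves out activations, the KV cache, gradients, optimizer state, and framework overhead, all of which push real usage higher.

```python
# Approximate storage cost per parameter, in bytes.
BYTES_PER_PARAM = {"f32": 4, "f16": 2, "bf16": 2, "int8": 1, "nf4": 0.5}


def weight_memory_gib(num_params, dtype):
    """Memory used by the weights alone, in GiB (no activations or optimizer state)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


num_params = 500_000_000  # 500M parameters, as stated in the model card
f16_gib = weight_memory_gib(num_params, "f16")
nf4_gib = weight_memory_gib(num_params, "nf4")
print(f"F16 weights: {f16_gib:.2f} GiB")
print(f"NF4 weights: {nf4_gib:.2f} GiB")
```

The gap between these weight-only numbers and the table is the runtime overhead, which depends heavily on sequence length and batch size.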

Tip: Set gradient_checkpointing=True in your TrainingArguments and enable mixed precision to further reduce memory usage during fine-tuning: bf16=True on GPUs that support it (NVIDIA Ampere or newer), fp16=True otherwise.
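The tip above can be expressed as a small set of keyword arguments for `TrainingArguments`. This is a minimal sketch: `memory_saving_training_kwargs` is a hypothetical helper, and the batch size and accumulation values are illustrative assumptions rather than settings recommended by Kili Labs.

```python
def memory_saving_training_kwargs(supports_bf16):
    """Build memory-oriented keyword arguments for transformers.TrainingArguments.

    Hypothetical helper; batch size and accumulation steps are illustrative.
    """
    return {
        "gradient_checkpointing": True,      # recompute activations to save memory
        "bf16": supports_bf16,               # bf16 where the GPU supports it
        "fp16": not supports_bf16,           # otherwise fall back to fp16
        "per_device_train_batch_size": 1,    # small micro-batches...
        "gradient_accumulation_steps": 16,   # ...accumulated into a larger effective batch
    }


kwargs = memory_saving_training_kwargs(supports_bf16=False)
print(kwargs)
```

With `transformers` installed, the dictionary can be passed as `TrainingArguments(output_dir="out", **kwargs)`, where `"out"` is a placeholder path.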

Intended Use

Kili-small-1.0 is designed for:

Conversational AI — Instruction-following, Q&A, and general assistant tasks.
Code Generation — Writing, explaining, and debugging code across common languages.
Fine-Tuning Base — A lightweight starting point for domain-specific SLM development.
Edge & Resource-Constrained Deployments — Applications where model size and memory are critical constraints.

Limitations

As a 500M-parameter model, Kili-small-1.0 may underperform larger models on complex multi-step reasoning or highly specialised domain tasks.
Output quality is strongly influenced by prompt quality. Clear, well-structured prompts yield the best results.
The model has not been independently evaluated on standard safety benchmarks. Users are responsible for applying appropriate safety measures in production deployments.

License

This model is released under the Apache License 2.0. You are free to use, reproduce, modify, and distribute this model for commercial and non-commercial purposes. See the full license text: Apache 2.0

Citation

If you use Kili-small-1.0 in your research or projects, please cite it as:

@misc{kililabs2025kilismall,
  title        = {Kili-small-1.0: A 500M Parameter Small Language Model for Chat and Code},
  author       = {Kili Labs},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/kililabs/Kili-small-1.0}},
  note         = {Apache 2.0 License}
}

About Kili Labs

Kili Labs is committed to building powerful, accessible AI tools that work for everyone — not just those with access to enterprise infrastructure. Kili-small-1.0 is a step toward democratising capable language models for developers, researchers, and builders worldwide.

Made with ❤️ by Kili Labs
