
Bonsai: A Small Ternary-Weight Language Model

Model Details

Model Description

Bonsai is a small 500-million-parameter ternary-weight language model trained by deepgrove. Bonsai adopts the Llama architecture and the Mistral tokenizer, following Danube 3, with modified linear layers that support ternary weights. The model was trained primarily on DCLM-Pro and FineWeb-Edu. Bonsai marks a new paradigm of efficiency, having been trained on fewer than 5 billion tokens.
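For intuition, below is a minimal sketch of what a ternary-weight linear layer could look like in PyTorch. It uses a BitNet b1.58-style absmean quantizer with a straight-through estimator; the class name and the exact quantization scheme are illustrative assumptions, not a description of Bonsai's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Module):
    """Illustrative ternary linear layer (assumed scheme, not Bonsai's exact one).

    Weights are quantized to {-1, 0, +1} with a per-tensor scale in the forward
    pass, while full-precision "latent" weights are kept for training.
    """

    def __init__(self, in_features: int, out_features: int, bias: bool = False):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.normal_(self.weight, std=0.02)
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)                 # per-tensor absmean scale
        w_ternary = (w / scale).round().clamp(-1, 1) * scale   # values in {-scale, 0, +scale}
        # Straight-through estimator: quantized weights in the forward pass,
        # gradients flow to the full-precision weights in the backward pass.
        w_ste = w + (w_ternary - w).detach()
        return F.linear(x, w_ste, self.bias)
```

In a ternary model, layers of this kind stand in for the standard linear projections; dedicated kernels can then exploit the {-1, 0, +1} weights to avoid full-precision multiplies, which is the motivation for the mixed-precision kernel work mentioned below.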

Usage

Bonsai can be easily used through the Hugging Face Transformers library. However, we note that all operations are currently performed in 16-bit precision; we're currently working towards integrating our model design with custom mixed-precision kernels. A quick example follows:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("deepgrove/Bonsai", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepgrove/Bonsai", trust_remote_code=True)

# Tokenize a prompt and generate a completion (max_length counts prompt + generated tokens)
text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

We note that Bonsai is not instruction tuned; we highly recommend fine-tuning the model before using it in a downstream task.
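As a starting point, here is a minimal fine-tuning sketch using the Hugging Face Trainer. The dataset, hyperparameters, and sequence length are placeholders; substitute your own task data and settings.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("deepgrove/Bonsai", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepgrove/Bonsai", trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding during batching

# Placeholder dataset: replace with your own downstream-task corpus.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.filter(lambda x: len(x["text"]) > 0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bonsai-finetuned",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
```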

Evaluation

Bonsai achieves competitive performance among its peers, being one of the first ternary models to do so. Evaluation results are below; for more detailed results and comparisons to other ternary models, please see the accompanying paper linked above. We use lm-eval for all benchmarks except MMLU, for which we use lighteval's cloze formulation.

| Model | ARC-c | ARC-e | HellaSwag | OBQA | PiQA | WinoGrande | MMLU | Avg. |
|---|---|---|---|---|---|---|---|---|
| MobiLlama 0.5B | 26.62 | 46.68 | 51.66 | 30.00 | 71.65 | 54.50 | 28.61 | 44.25 |
| Qwen 2 0.5B | 28.84 | 50.29 | 49.12 | 33.00 | 69.26 | 56.99 | 31.78 | 45.61 |
| MobileLLM 600M | 29.01 | 56.65 | 55.35 | 34.00 | 71.65 | 59.75 | 31.40 | 48.13 |
| Qwen 2.5 0.5B | 32.25 | 58.29 | 52.18 | 35.40 | 69.91 | 56.12 | 33.40 | 48.22 |
| Bonsai | 33.36 | 57.95 | 48.04 | 34.00 | 70.24 | 54.85 | 30.28 | 46.96 |
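For reproduction, a hedged sketch of how the lm-eval harness could be invoked on the non-MMLU tasks is shown below. Task names and the simple_evaluate signature can vary between lm-eval versions, and the lighteval cloze MMLU evaluation is not included here.

```python
# Illustrative only: reruns the non-MMLU benchmarks with the lm-eval harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=deepgrove/Bonsai,trust_remote_code=True,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "openbookqa", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])
```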