Mistral-7B-v0.3-trit-uniform-d4
Balanced ternary quantization of mistralai/Mistral-7B-v0.3 at depth d=4 (3^4 = 81 levels per weight, 6.64 bits per weight).
Produced with the codec from "Balanced Ternary Post-Training Quantization for Large Language Models" (Stentzel, 2026). See Entrit/tritllm-codec for the codec source.
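To make the "depth d=4" terminology concrete, here is a minimal sketch of balanced ternary encoding, in which every digit (trit) is -1, 0, or +1. Four trits cover the 3^4 = 81 integers from -40 to 40. The function names are hypothetical and this is not the tritllm codec's implementation; how the codec maps weights to levels and packs trits is not shown here.

```python
def to_balanced_ternary(n: int, depth: int = 4) -> list[int]:
    """Encode n as `depth` balanced trits (-1, 0, +1), least significant first."""
    limit = (3 ** depth - 1) // 2  # 40 when depth is 4
    if not -limit <= n <= limit:
        raise ValueError(f"{n} is outside [-{limit}, {limit}] for depth {depth}")
    trits = []
    for _ in range(depth):
        r = n % 3           # Python's % yields 0, 1, or 2 here
        if r == 2:          # represent 2 as -1 and carry into the next trit
            r = -1
        trits.append(r)
        n = (n - r) // 3
    return trits


def from_balanced_ternary(trits: list[int]) -> int:
    """Decode a least-significant-first trit list back to an integer."""
    return sum(t * 3 ** i for i, t in enumerate(trits))


if __name__ == "__main__":
    for level in (-40, -1, 0, 5, 40):
        print(level, to_balanced_ternary(level))
```

Because the digit set is symmetric around zero, the 81 levels are symmetric too, and negating a value only requires flipping the sign of each trit.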
Quick load
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("Entrit/Mistral-7B-v0.3-trit-uniform-d4")
tokenizer = AutoTokenizer.from_pretrained("Entrit/Mistral-7B-v0.3-trit-uniform-d4")
The checkpoint stores the quantized weights dequantized back to FP16 so that it loads in stock transformers. Its on-disk size is therefore the same as that of the FP16 source model. The 6.64 bits-per-weight figure describes the information content of the quantized matrices, and it is the number that matters for inference on hardware that consumes the packed trit format directly (see Entrit/tritllm-kernel).
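As a rough illustration of that difference, the sketch below compares the storage for a block of weights held as FP16 with the same block at 6.64 bits per weight. The helper names and the one-billion-weight count are hypothetical, chosen only to keep the arithmetic readable; layers kept in FP16 would not shrink, so a real packed checkpoint would save somewhat less than this ratio suggests.

```python
def fp16_bytes(n_weights: int) -> float:
    """Storage for n_weights values at 16 bits each."""
    return n_weights * 16 / 8


def packed_bytes(n_weights: int, bits_per_weight: float = 6.64) -> float:
    """Idealized storage at a fractional bit rate (ignores container overhead)."""
    return n_weights * bits_per_weight / 8


if __name__ == "__main__":
    n = 1_000_000_000  # illustrative count, not the model's parameter count
    dense = fp16_bytes(n)
    packed = packed_bytes(n)
    print(f"FP16:   {dense / 1e9:.2f} GB")
    print(f"Packed: {packed / 1e9:.2f} GB")
    print(f"Ratio:  {dense / packed:.2f}x")
```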
Quantization details
| Field | Value |
|---|---|
| Source model | mistralai/Mistral-7B-v0.3 |
| Depth | d=4 (81 levels) |
| Bits per weight | 6.64 |
| Group size | 16 |
| Scale codebook | 27-entry log-spaced (scale_depth=3) |
| Method | Uniform PTQ |
| Quantized layers | all 2D linear matrices except those kept in FP16 |
| Kept FP16 | lm_head, token embeddings, all *_norm layers |
| Codec | tritllm v2 |
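The table does not spell out how the 6.64 figure is derived. One accounting that reproduces it, offered as an assumption rather than the codec's documented definition, charges each weight d trits and each group of 16 weights one shared scale index of scale_depth trits, at log2(3) bits per trit. The function name is hypothetical.

```python
import math


def bits_per_weight(depth: int, group_size: int, scale_depth: int) -> float:
    """Information content per weight under the assumed accounting."""
    bits_per_trit = math.log2(3)
    weight_bits = depth * bits_per_trit                    # 3**depth levels per weight
    scale_bits = scale_depth * bits_per_trit / group_size  # one scale index per group
    return weight_bits + scale_bits


if __name__ == "__main__":
    bpw = bits_per_weight(depth=4, group_size=16, scale_depth=3)
    print(f"{bpw:.2f} bits per weight")
```

Under this accounting the scale index contributes about 0.30 bits per weight on top of the roughly 6.34 bits needed for 81 levels.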
Citation
@article{stentzel2026ternaryptq,
title = {Balanced Ternary Post-Training Quantization for Large Language Models},
author = {Stentzel, Eric},
year = 2026,
note = {Entrit Systems}
}
Reproducibility
git clone https://huggingface.co/Entrit/tritllm-codec
cd tritllm-codec
python quantize_model_v2.py --model mistralai/Mistral-7B-v0.3 --configs uniform-d4 --out ./out