Model Summary
TaoNet-mini-A2 is a 0.5B-parameter, local-first language model intended for text-generation experiments, lightweight instruction following, and research on efficient custom architectures.
This release is packaged as a standard Hugging Face model repository. The TaoNet modeling code ships inside the repository alongside the weights, so loading and export stay transparent. This is also why loading requires `trust_remote_code=True`.
Model Details
Model Specifications
| Specification | Value |
|---|---|
| Model name | TaoNet-mini-A2 |
| Model type | Causal language model |
| Architecture | TaoNetForCausalLM |
| Vocabulary size | 8,192 |
| Hidden size | 1,024 |
| Number of layers | 16 |
| Number of attention heads | 8 |
| Head dimension | 128 |
| Latent KV dimension | 768 |
| Feed-forward dimension | 3,072 |
| Maximum sequence length | 1,024 tokens |
| Dropout | 0.02 |
| Embedding type | Factorized embedding |
| RoPE scale | 40.0 |
| Tokenizer | SentencePiece |
| Special tokens | `<UNK>`, `<BOS>`, `<EOS>`, `<PAD>` |
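The specification table is enough for rough memory estimates on local hardware. The sketch below computes three things: weight memory from the 0.5B figure in the summary, the per-sequence KV cache at the 1,024-token limit, and the input-embedding parameter count.

The card does not document two of the readings used here, so treat them as assumptions:

- "Latent KV dimension" is taken to mean that each layer caches a single 768-dimensional latent per token, in the style of multi-head latent attention, instead of full keys and values.
- "Factorized embedding" is taken to mean an ALBERT-style V x E plus E x H decomposition. The embedding width E = 256 is purely hypothetical.

```python
# Back-of-envelope sizing for TaoNet-mini-A2 using values from the
# specification table. The latent-KV and factorized-embedding layouts are
# ASSUMPTIONS (MLA-style cache, ALBERT-style factorization); the card does
# not document either in detail.

VOCAB_SIZE = 8_192
HIDDEN_SIZE = 1_024
NUM_LAYERS = 16
NUM_HEADS = 8
HEAD_DIM = 128
LATENT_KV_DIM = 768
MAX_SEQ_LEN = 1_024


def weight_bytes(num_params, bytes_per_param):
    """Memory for the weights alone (no activations, no KV cache)."""
    return int(num_params * bytes_per_param)


def kv_cache_bytes(per_token_per_layer, seq_len=MAX_SEQ_LEN,
                   layers=NUM_LAYERS, bytes_per_value=2):
    """Cache size for one sequence; bytes_per_value=2 corresponds to bf16."""
    return per_token_per_layer * seq_len * layers * bytes_per_value


# Standard attention caches a full key and a full value vector per head.
full_kv_values = 2 * NUM_HEADS * HEAD_DIM   # 2,048 values per token per layer
# Assumed latent scheme: one shared compressed vector per token per layer.
latent_kv_values = LATENT_KV_DIM            # 768 values per token per layer


def embedding_params(embed_dim=None):
    """Input-embedding parameters; factorized as V x E + E x H if embed_dim is given."""
    if embed_dim is None:
        return VOCAB_SIZE * HIDDEN_SIZE
    return VOCAB_SIZE * embed_dim + embed_dim * HIDDEN_SIZE


if __name__ == "__main__":
    mib = 1024 ** 2
    print("weights, 0.5B params in bf16:", weight_bytes(0.5e9, 2) / 1e9, "GB")
    print("full KV cache @1024 tokens:", kv_cache_bytes(full_kv_values) / mib, "MiB")
    print("latent KV cache @1024 tokens:", kv_cache_bytes(latent_kv_values) / mib, "MiB")
    # E = 256 is a hypothetical embedding width chosen for illustration only.
    print("embedding, unfactorized:", embedding_params())
    print("embedding, factorized (E=256):", embedding_params(256))
```

Under these assumptions the script reports:

- about 1 GB of bf16 weights;
- a 64 MiB full-size KV cache versus 24 MiB for the latent variant at 1,024 tokens;
- 8,388,608 versus 2,359,296 embedding parameters.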
Hardware
- GPU: 1 x RTX 5090
Software
- Training framework: TaoTrain
Quick Start
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "TaoTern/TaoNet-mini-A2"

# Use bfloat16 on GPU and fall back to float32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# trust_remote_code=True is required because TaoNetForCausalLM is defined
# by code shipped in the model repository, not by Transformers itself.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
    torch_dtype=dtype,
).to(device)

prompt = "Fruit is now expensive so we should"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Sample a continuation; pad/eos ids are passed explicitly so generate()
# does not have to infer them.
with torch.inference_mode():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        temperature=0.7,
        top_p=0.85,
        repetition_penalty=1.2,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the echoed prompt.
completion = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(completion)
```
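The `generate` call above samples with `temperature=0.7`, `top_p=0.85`, and `repetition_penalty=1.2`. The Limitations section notes that outputs are sensitive to these settings.

To show roughly what each setting does to a single next-token distribution, here is a simplified pure-Python sketch on made-up logits. It applies three steps in roughly the order Transformers documents them: a CTRL-style repetition penalty, then temperature scaling, then nucleus (top-p) filtering. It is not the library's code. It also ignores batching, tie handling, and the `top_k` filter that Transformers may additionally apply by default.

```python
import math
import random


def apply_repetition_penalty(logits, previous_ids, penalty):
    """CTRL-style penalty: make every already-seen token less likely."""
    out = list(logits)
    for tok in set(previous_ids):
        # Dividing a positive logit or multiplying a negative one both lower it.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


def sample_next(logits, previous_ids, temperature=0.7, top_p=0.85,
                repetition_penalty=1.2, rng=random):
    """Pick one token id and also return the filtered distribution for inspection."""
    logits = apply_repetition_penalty(logits, previous_ids, repetition_penalty)
    probs = softmax([x / temperature for x in logits])  # T < 1 sharpens the distribution
    filtered = top_p_filter(probs, top_p)
    ids = list(filtered)
    token = rng.choices(ids, weights=[filtered[i] for i in ids], k=1)[0]
    return token, filtered


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
    token, dist = sample_next(toy_logits, previous_ids=[0], rng=random.Random(0))
    print(token, {i: round(p, 3) for i, p in dist.items()})
```

With the toy logits, token 0 appears in `previous_ids`, so the penalty lowers its logit from 2.0 to about 1.67. Nucleus filtering at 0.85 then leaves only tokens 0 and 1 as candidates.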
Benchmarks
The following scores were reported for TaoNet-mini-A2:
| Benchmark | Score |
|---|---|
| MMLU | 0.2412 |
| HellaSwag | 0.3162 |
| ARC-Easy | 0.4331 |
| ARC-Challenge | 0.2560 |
| PIQA | 0.6137 |
| WinoGrande | 0.5083 |
These numbers should be treated as a snapshot of the current checkpoint, not as a universal capability guarantee.
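All six benchmarks are multiple-choice, so uniform random guessing already earns a nonzero score. The sketch below subtracts an assumed chance baseline from each reported score.

The baselines are not stated in this card. They reflect the usual number of answer options per question: 1/4 for MMLU, HellaSwag, and ARC, and 1/2 for PIQA and WinoGrande. ARC also contains a small number of questions with other option counts.

```python
# Scores copied from the benchmark table; chance levels are ASSUMPTIONS based
# on the typical number of answer options per question in each benchmark.
SCORES = {
    "MMLU": 0.2412,
    "HellaSwag": 0.3162,
    "ARC-Easy": 0.4331,
    "ARC-Challenge": 0.2560,
    "PIQA": 0.6137,
    "WinoGrande": 0.5083,
}
CHANCE = {
    "MMLU": 1 / 4,
    "HellaSwag": 1 / 4,
    "ARC-Easy": 1 / 4,
    "ARC-Challenge": 1 / 4,
    "PIQA": 1 / 2,
    "WinoGrande": 1 / 2,
}


def margin_over_chance(scores, chance):
    """Return score minus uniform-guessing accuracy for each benchmark."""
    return {name: scores[name] - chance[name] for name in scores}


if __name__ == "__main__":
    for name, margin in margin_over_chance(SCORES, CHANCE).items():
        print(f"{name:<14} {SCORES[name]:.4f}  chance {CHANCE[name]:.2f}  margin {margin:+.4f}")
```

Under these assumptions, MMLU, ARC-Challenge, and WinoGrande sit within about one percentage point of chance. ARC-Easy, PIQA, and HellaSwag are clearly above it.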
Limitations
- This is a relatively small model, so it will not match larger frontier models on broad reasoning or long-horizon planning
- It may hallucinate or produce incorrect answers, especially on ambiguous prompts or tasks that require deep domain knowledge
- Outputs can be sensitive to prompt wording and generation parameters
- The model is not intended for safety-critical, legal, medical, or high-stakes decision-making without human review
- The reported benchmark scores cover only the tasks listed above and do not capture overall real-world quality
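The specification table lists a maximum sequence length of 1,024 tokens. The card does not say what happens beyond that limit. Depending on the custom implementation, longer inputs may raise an error or degrade output quality.

Below is a minimal guard that trims the prompt so that prompt plus generated tokens fit the window. It assumes you are willing to drop the oldest prompt tokens. `fit_to_context` and its `keep_first` option are hypothetical helpers, not part of the TaoNet code. `keep_first` exists because a tokenizer may prepend `<BOS>`, which this card does not confirm either way.

```python
def fit_to_context(input_ids, max_new_tokens, max_seq_len=1024, keep_first=0):
    """Left-truncate a prompt so prompt + new tokens fit in max_seq_len.

    keep_first preserves that many leading ids (e.g. a <BOS> token, if the
    tokenizer adds one) and drops the oldest tokens after them instead.
    """
    if max_new_tokens >= max_seq_len:
        raise ValueError("max_new_tokens must be smaller than max_seq_len")
    budget = max_seq_len - max_new_tokens
    ids = list(input_ids)
    if len(ids) <= budget:
        return ids
    if keep_first >= budget:
        raise ValueError("keep_first leaves no room for the rest of the prompt")
    tail = budget - keep_first
    return ids[:keep_first] + ids[-tail:]


if __name__ == "__main__":
    prompt_ids = list(range(2000))  # stand-in for a long tokenized prompt
    trimmed = fit_to_context(prompt_ids, max_new_tokens=64, keep_first=1)
    print(len(trimmed), trimmed[:3])  # 960 [0, 1041, 1042]
```

With the Quick Start code, you would apply this to `inputs["input_ids"][0].tolist()` before calling `generate`. Rebuild the input tensor and attention mask from the trimmed ids.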
Citation
If you use TaoNet-mini-A2 in your research or product work, please cite:
```bibtex
@software{taonet_mini_a2_2026,
  title={TaoNet-mini-A2},
  author={Felix Thian},
  year={2026},
  url={https://huggingface.co/TaoTern/TaoNet-mini-A2}
}
```
License
This repository is released under the MIT License.
Acknowledgments
- Hugging Face Transformers for the model-loading interface
- SentencePiece for tokenizer support
- The TaoTrain export pipeline used to package the checkpoint