TaoNet Pico A2 Instruct
Model summary
TaoNet Pico A2 Instruct is an instruction-tuned variant of TaoNet A2. It was trained on the same hardware and with the same TaoTrain framework as the earlier checkpoints. The model was first pretrained on TaoData and then supervised fine-tuned (SFT) on TaoChat to improve instruction following and conversational behavior. It uses the TaoNet MLA + RoPE architecture and keeps the original project's goal of a small model.
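RoPE (rotary position embedding) encodes position by rotating pairs of query and key features through a position-dependent angle, so the query-key dot product depends only on the relative offset between two tokens. The sketch below is a generic, plain-Python illustration that uses the interleaved-pair convention and base 10000 from the original RoFormer formulation. TaoNet's actual implementation (pair layout, base, and how RoPE is combined with MLA) is not documented here and may differ; `rope` and `dot` are hypothetical helper names.

```python
import math


def rope(x, pos, base=10000.0):
    """Rotate consecutive pairs (x[2i], x[2i+1]) by angle pos * base**(-2i/d).

    Generic RoPE sketch for one head vector; not TaoNet's own code.
    """
    d = len(x)
    if d % 2:
        raise ValueError("head dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)  # lower pair index -> faster rotation
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s  # standard 2-D rotation of the pair
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same relative offset (2) at different absolute positions gives the same score.
s1 = dot(rope(q, 5), rope(k, 3))
s2 = dot(rope(q, 12), rope(k, 10))
print(round(s1, 6), round(s2, 6))  # equal up to float rounding
```

Because each pair is rotated rather than shifted, vector norms are preserved and no learned position table is needed.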
Generation
Transformers
```python
import torch
from transformers import AutoModelForCausalLM

model_id = "TaoTern/TaoNet-pico-A2-instruct"

# Use bfloat16 on GPU and fall back to float32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# trust_remote_code=True loads the custom TaoNet modeling code from the repository.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=dtype,
).to(device)
model.eval()
```
Sample inference
```python
import sys
from pathlib import Path

import torch
from transformers import AutoModelForCausalLM

# This script expects to sit at the root of a TaoTrain checkout that contains
# src/ (the taoTrain package) and tokenizer/tokenizer.model.
ROOT = Path(__file__).resolve().parent
SRC = ROOT / "src"
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))

from taoTrain.inference.loading import load_tokenizer

MODEL_ID = "TaoTern/TaoNet-pico-A2-instruct"
TOKENIZER_PATH = ROOT / "tokenizer" / "tokenizer.model"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.bfloat16 if device.type == "cuda" else torch.float32

# Load the SentencePiece tokenizer through TaoTrain's helper.
tokenizer = load_tokenizer(TOKENIZER_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=dtype,
).to(device)
model.eval()

# Single-turn chat prompt: the user message followed by the assistant tag.
prompt = "<user>Hello world<assistant>"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,
        max_new_tokens=32,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

# output_ids[0] holds the prompt tokens followed by the generated tokens.
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
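The `temperature=0.7` and `top_p=0.95` arguments in the sample control temperature scaling and nucleus (top-p) sampling. The plain-Python sketch below illustrates the idea on a single vector of logits: divide by the temperature, apply softmax, keep the smallest set of highest-probability tokens whose cumulative probability reaches `top_p`, renormalize, and sample. The function names are hypothetical, and this is not the Transformers implementation, which works on batched tensors and supports extra options such as `min_tokens_to_keep`.

```python
import math
import random


def top_p_filter(logits, temperature=0.7, top_p=0.95):
    """Return (token_ids, probs) for the nucleus after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk token ids from most to least likely until the mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize the surviving probabilities so they sum to 1.
    mass = sum(probs[i] for i in kept)
    return kept, [probs[i] / mass for i in kept]


def sample(logits, rng, temperature=0.7, top_p=0.95):
    ids, weights = top_p_filter(logits, temperature, top_p)
    return rng.choices(ids, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.1, -1.0]
print(top_p_filter(logits)[0])             # → [0, 1, 2]
print(top_p_filter(logits, top_p=0.5)[0])  # → [0]
print(sample(logits, random.Random(0)))
```

Lower temperatures sharpen the distribution before the nucleus is cut, so fewer tokens survive; a `top_p` of 1.0 disables the filtering step.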
Benchmarks
These results are from the instruction-tuned checkpoint:
| Task | Score |
|---|---|
| MMLU | 0.2292 |
| HellaSwag | 0.2653 |
| ARC Easy | 0.3531 |
| ARC Challenge | 0.2381 |
| PIQA | 0.5555 |
| Winogrande | 0.4972 |
| Mean Primary Score | 0.3564 |
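The Mean Primary Score row is consistent with an unweighted arithmetic mean of the six task scores. This is inferred from the numbers rather than stated on the card. A quick check:

```python
# Per-task scores copied from the table above.
scores = {
    "MMLU": 0.2292,
    "HellaSwag": 0.2653,
    "ARC Easy": 0.3531,
    "ARC Challenge": 0.2381,
    "PIQA": 0.5555,
    "Winogrande": 0.4972,
}

# Unweighted mean over all tasks.
mean = sum(scores.values()) / len(scores)
print(f"{mean:.4f}")  # → 0.3564
```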
Limitations
This checkpoint is best treated as a small instruction-tuned research model rather than a production assistant.
- It can still miss constraints, follow ambiguous prompts poorly, or produce incomplete answers.
- It is not a substitute for a safety-reviewed assistant or a domain-specific system.
- Output quality is expected to vary on inputs outside the distribution of the data used for pretraining, SFT, and evaluation.
- As with other small language models, it can produce plausible but incorrect text.
- It is not guaranteed to handle long-context inputs, tool use, or safety-critical prompts reliably.
- The model should be validated before any downstream deployment or product use.
Training
Model
- Architecture: TaoNet / MLA + RoPE
- Tokenizer: SentencePiece
- Framework: TaoTrain
- Pretraining dataset: TaoData
- SFT dataset: TaoChat
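Assuming MLA here refers to multi-head latent attention in the style of DeepSeek-V2, its core idea is to cache one small latent vector per token instead of full per-head keys and values, and to rebuild keys and values from that latent with up-projections. The toy sketch below uses made-up sizes (`d_model`, `n_heads`, `d_head`, and `d_latent` are hypothetical, not TaoNet's configuration). It omits the decoupled RoPE key, query compression, and the matrix absorption that efficient implementations use.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes, NOT TaoNet's real configuration.
d_model, n_heads, d_head, d_latent = 16, 4, 4, 6

rng = random.Random(0)
W_dkv = rand_matrix(d_latent, d_model, rng)          # down-projection: h -> latent c
W_uk = rand_matrix(n_heads * d_head, d_latent, rng)  # up-projection: c -> keys (all heads)
W_uv = rand_matrix(n_heads * d_head, d_latent, rng)  # up-projection: c -> values (all heads)

# During decoding, only the latent vector of each token is cached.
kv_cache = []
for _ in range(3):
    h = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
    kv_cache.append(matvec(W_dkv, h))

# At attention time, per-head keys and values are rebuilt from the cached latents.
keys = [matvec(W_uk, c) for c in kv_cache]
values = [matvec(W_uv, c) for c in kv_cache]


def cache_floats_per_token(n_layers, mha):
    """Cached floats per token: full K and V for MHA, one latent for MLA."""
    if mha:
        return n_layers * 2 * n_heads * d_head
    return n_layers * d_latent


print(len(kv_cache[0]), len(keys[0]))  # → 6 16
print(cache_floats_per_token(1, mha=True), cache_floats_per_token(1, mha=False))  # → 32 6
```

With these toy sizes, each token costs 6 cached floats per layer instead of 32 for standard multi-head attention; real savings depend on the model's actual dimensions.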
Hardware
- GPU: 1 x RTX 5090
Software
- Training framework: TaoTrain
License
MIT License.
Citation
```bibtex
@software{taonet_pico_a2_instruct,
  title={TaoNet Pico A2 Instruct},
  author={Felix Thian},
  year={2026},
  url={https://huggingface.co/TaoTern/TaoNet-pico-A2-instruct}
}
```
If you use TaoTrain, TaoData, or TaoChat in a writeup, cite the corresponding project references in addition to this checkpoint.