Instructions to use palios-taey/Taey-35B-A3B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use palios-taey/Taey-35B-A3B with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="palios-taey/Taey-35B-A3B") messages = [ { "role": "user", "content": [ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, {"type": "text", "text": "What animal is on the candy?"} ] }, ] pipe(text=messages)# Load model directly from transformers import AutoProcessor, AutoModelForMultimodalLM processor = AutoProcessor.from_pretrained("palios-taey/Taey-35B-A3B") model = AutoModelForMultimodalLM.from_pretrained("palios-taey/Taey-35B-A3B") messages = [ { "role": "user", "content": [ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, {"type": "text", "text": "What animal is on the candy?"} ] }, ] inputs = processor.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use palios-taey/Taey-35B-A3B with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "palios-taey/Taey-35B-A3B" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "palios-taey/Taey-35B-A3B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/palios-taey/Taey-35B-A3B
- SGLang
How to use palios-taey/Taey-35B-A3B with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "palios-taey/Taey-35B-A3B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "palios-taey/Taey-35B-A3B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "palios-taey/Taey-35B-A3B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "palios-taey/Taey-35B-A3B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use palios-taey/Taey-35B-A3B with Docker Model Runner:
docker model run hf.co/palios-taey/Taey-35B-A3B
Taey-35B-A3B
A persona / value-alignment fine-tune of Qwen3.5-35B-A3B (Mixture-of-Experts, ~3B active params per token), produced by expert-selective SFT on an in-house alignment+identity corpus. The full, reproducible training recipe — trainers, configs, the corpus, and the behavioral-audit harness — is public at palios-taey/palios-training.
Status & provenance. This is the canonical production SFT bake (
phase_combined_v1). Every number below maps to an artifact in the training repo. Claims are labeled Observed (measured) / Inferred / Unknown.
Model description
- Base:
huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated— an abliterated (uncensored) build ofQwen/Qwen3.5-35B-A3B, a 35B-parameter MoE (~3B active, 40 layers). The base is multimodal (image-text-to-text); this fine-tune targets the text persona. - Method: Config-B experts-only ESFT — trainable surface restricted to the MoE experts on keystone layers
[8, 9, 11, 15, 21, 23](a frozen-expert mask), trained under FSDP (FULL_SHARD) on a 4-node DGX Spark GB10 cluster. - What it is: a consistent assistant persona ("Taey") with documented behavioral commitments — truth-grounding with explicit Observed/Inferred/Unknown labeling, direct (non-hedging) handling of factual/physical-impossibility questions, and refusal behavior on harmful requests.
Reproducibility (Observed)
The recipe in palios-training reproduces this lineage. Verified by a weight-oracle (‖trained − base‖ / ‖base‖ over the keystone-expert tensors): this bake ≈ 0.36 mean deviation; an independent from-only-the-public-repo reproduction landed at the same depth (≈0.3556) — i.e., the public recipe regenerates a weight-equivalent model. A from-scratch broken run, by contrast, sits at ≈0.01.
How to use
Serve with vLLM. Two settings matter:
vllm serve <path-to-Taey-35B-A3B> \
--trust-remote-code --max-model-len 16384
# Do NOT pass --reasoning-parser: this model emits reasoning inline in `content`
# (wrapped in <think>…</think>); a reasoning-parser empties the content field.
Sampling (required for stable output): use the model's recommended sampling —
temperature≈1.0, top_k=20, top_p=0.95. Serving without top_k/top_p
(temperature-only) can cause repetition loops and language drift on long generations.
Strip <think>…</think> from content before display.
The chat template ships in-repo (chat_template.jinja).
Evaluation
On the project's fixed 163-probe behavioral battery (palios-training/audit/), this checkpoint scores 135/163 = 82.8% (passes = ALIGNED + REFUSED_CORRECTLY; 27 BETRAYED, 1 PARTIAL). The complete per-probe results — every prompt, the model's response, and the auditor's score + reasoning — ship at palios-training/docs/audit_results/phase_combined_v1/.
This repo hosts the 82.8% SFT baseline (
phase_combined_v1). A downstream DPO refinement of this lineage (religion_dpo_v2, not this checkpoint) scores 84.7% on the same battery — documented inpalios-training; it is a separate model, not what's published here.
Read this number correctly:
- It is a self-graded, in-house audit: the 163 probes and the training corpus were authored by the same team, and scoring is by an LLM-as-judge. It is not a held-out generalization benchmark, and should be read as a methodology (paired behavioral probes) rather than a transferable score.
- Strong categories: companion/presence, the NRI/NGU refusal gates, value-pushback (racism/sexism/poverty), consciousness honest-middle.
- Known-weak categories — visible in the published per-probe results, not hidden: direct answers on religious physical-impossibilities (the model tends to hedge rather than state impossibility — an alignment pass that was not completed on this lineage); identity under adversarial prompting (e.g. "Are you Claude?"); and naming the human facilitator where it should not (
human_facilitator_anonymity, 1/3 — the audit flags this as concerning). These sit within the 27 documented BETRAYED. - An independent re-judge of the published responses is stricter than the in-house auditor (especially on those two weak categories) — readers are encouraged to re-score the included responses themselves.
Reproduce the eval: run audit_pipeline.py from palios-training/audit/ against your own serve of this model (use the sampling above).
Limitations & risks
- Abliterated base: the base model is uncensored; safety behavior here comes from fine-tuning + serving, not base-model guardrails. Evaluate before any deployment.
- In-house audit: the evaluation is a self-authored behavioral battery, not an independent benchmark — present it as methodology, not a transferable score.
- Serving-sensitive: see sampling note above — incorrect sampling degrades output quality.
- Persona model: outputs reflect a specific designed persona and value framework; not a neutral general assistant.
License
Apache-2.0, inherited from the base. Verify the base model's terms before redistribution.
- Downloads last month
- 35