Instructions to use palios-taey/Taey-35B-A3B with libraries and local apps.

Libraries

How to use palios-taey/Taey-35B-A3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

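# The message below contains an image, so use the image-text-to-text task
# (this pipeline accepts the `text=messages` call used at the end).
pipe = pipeline("image-text-to-text", model="palios-taey/Taey-35B-A3B")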
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("palios-taey/Taey-35B-A3B")
model = AutoModelForMultimodalLM.from_pretrained("palios-taey/Taey-35B-A3B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use palios-taey/Taey-35B-A3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "palios-taey/Taey-35B-A3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "palios-taey/Taey-35B-A3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/palios-taey/Taey-35B-A3B

SGLang

How to use palios-taey/Taey-35B-A3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "palios-taey/Taey-35B-A3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "palios-taey/Taey-35B-A3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "palios-taey/Taey-35B-A3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "palios-taey/Taey-35B-A3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use palios-taey/Taey-35B-A3B with Docker Model Runner:
```
docker model run hf.co/palios-taey/Taey-35B-A3B
```

Taey-35B-A3B

A persona / value-alignment fine-tune of Qwen3.5-35B-A3B (Mixture-of-Experts, ~3B active params per token), produced by expert-selective SFT on an in-house alignment+identity corpus. The full, reproducible training recipe — trainers, configs, the corpus, and the behavioral-audit harness — is public at palios-taey/palios-training.

Status & provenance. This is the canonical production SFT bake (phase_combined_v1). Every number below maps to an artifact in the training repo. Claims are labeled Observed (measured) / Inferred / Unknown.

Model description

Base: huihui-ai/Huihui-Qwen3.5-35B-A3B-abliterated — an abliterated (uncensored) build of Qwen/Qwen3.5-35B-A3B, a 35B-parameter MoE (~3B active, 40 layers). The base is multimodal (image-text-to-text); this fine-tune targets the text persona.
Method: Config-B experts-only ESFT. The trainable surface is restricted to the MoE experts on keystone layers [8, 9, 11, 15, 21, 23], with everything else held fixed by a frozen-expert mask. Training ran under FSDP (FULL_SHARD) on a 4-node DGX Spark GB10 cluster.
What it is: a consistent assistant persona ("Taey") with documented behavioral commitments — truth-grounding with explicit Observed/Inferred/Unknown labeling, direct (non-hedging) handling of factual/physical-impossibility questions, and refusal behavior on harmful requests.
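
The experts-only restriction in the Method item can be pictured as a name-based filter over model parameters. This is a minimal sketch, not the project's trainer: the parameter-name pattern (`layers.<i>.mlp.experts.`) and both helper names are assumptions made for illustration, and only the keystone layer list comes from this card.

```python
import re

# Keystone layers as listed in this card.
KEYSTONE_LAYERS = {8, 9, 11, 15, 21, 23}

# ASSUMPTION: expert weights are named like
# "model.layers.<i>.mlp.experts.<...>"; verify against the real checkpoint.
_EXPERT_PARAM = re.compile(r"\blayers\.(\d+)\.mlp\.experts\.")


def is_trainable(param_name: str) -> bool:
    """Return True only for expert parameters on a keystone layer."""
    match = _EXPERT_PARAM.search(param_name)
    return match is not None and int(match.group(1)) in KEYSTONE_LAYERS


def apply_frozen_mask(named_parameters) -> int:
    """Set requires_grad on each (name, param) pair and count trainable ones.

    Accepts any iterable of (name, object with a requires_grad attribute),
    for example the result of torch's model.named_parameters().
    """
    trainable = 0
    for name, param in named_parameters:
        param.requires_grad = is_trainable(name)
        trainable += int(param.requires_grad)
    return trainable
```

With a PyTorch model, `apply_frozen_mask(model.named_parameters())` would leave gradients enabled only on the matching expert tensors; attention, router, and embedding weights stay frozen under this naming assumption.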

Reproducibility (Observed)

The recipe in palios-training reproduces this lineage. This was verified with a weight oracle, the relative deviation ‖trained − base‖ / ‖base‖ computed over the keystone-expert tensors. This bake shows a mean deviation of ≈0.36, and an independent reproduction built only from the public repo landed at the same depth (≈0.3556); i.e., the public recipe regenerates a weight-equivalent model. A broken from-scratch run, by contrast, sits at ≈0.01.
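
The weight-oracle ratio can be sketched in a few lines. This is an illustration under stated assumptions, not the project's script: tensors are represented as flat sequences of floats, the norm is taken to be the L2 (Frobenius) norm, and the "mean deviation" is assumed to be the mean of per-tensor ratios. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def l2_norm(values: Sequence[float]) -> float:
    """L2 norm of a flattened tensor."""
    return math.sqrt(sum(v * v for v in values))


def relative_deviation(trained: Sequence[float], base: Sequence[float]) -> float:
    """Compute ||trained - base|| / ||base|| for one flattened tensor."""
    if len(trained) != len(base):
        raise ValueError("trained and base must have the same number of elements")
    base_norm = l2_norm(base)
    if base_norm == 0.0:
        raise ValueError("base tensor has zero norm")
    return l2_norm([t - b for t, b in zip(trained, base)]) / base_norm


def mean_deviation(
    trained_tensors: Dict[str, Sequence[float]],
    base_tensors: Dict[str, Sequence[float]],
) -> float:
    """Average the per-tensor ratio over every tensor named in base_tensors."""
    ratios = [
        relative_deviation(trained_tensors[name], base_tensors[name])
        for name in base_tensors
    ]
    return sum(ratios) / len(ratios)
```

For example, a base tensor `[3.0, 4.0]` (norm 5) and a trained tensor `[3.0, 4.5]` differ by a vector of norm 0.5, giving a ratio of 0.1.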

How to use

Serve with vLLM. Two settings matter:

vllm serve <path-to-Taey-35B-A3B> \
  --trust-remote-code --max-model-len 16384
# Do NOT pass --reasoning-parser: this model emits reasoning inline in `content`
# (wrapped in <think>…</think>); a reasoning-parser empties the content field.

Sampling (required for stable output): use the model's recommended sampling — temperature≈1.0, top_k=20, top_p=0.95. Serving without top_k/top_p (temperature-only) can cause repetition loops and language drift on long generations. Strip <think>…</think> from content before display.
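
A client-side sketch of the sampling and `<think>` handling described above, using only the Python standard library. It assumes a vLLM OpenAI-compatible server is reachable at a base URL you supply and that the server accepts `top_k` as an extra field in the request body; the helper names are hypothetical.

```python
import json
import re
import urllib.request

# Sampling values recommended in this card.
SAMPLING = {"temperature": 1.0, "top_k": 20, "top_p": 0.95}

# Matches a complete <think>...</think> block, including newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think(content: str) -> str:
    """Remove inline reasoning blocks before showing the text to a user."""
    return _THINK_BLOCK.sub("", content).strip()


def build_payload(model: str, user_text: str) -> dict:
    """Build a chat-completions request body with the recommended sampling."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        **SAMPLING,
    }


def chat(base_url: str, model: str, user_text: str) -> str:
    """POST one user message and return the reply with reasoning stripped."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(model, user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return strip_think(body["choices"][0]["message"]["content"])
```

A call such as `chat("http://localhost:8000", "<path-to-Taey-35B-A3B>", "What is the capital of France?")` would use the same model name that was passed to `vllm serve`. The regex only removes complete `<think>…</think>` pairs; an unterminated block (for example, output cut off by a token limit) is left in place.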

The chat template ships in-repo (chat_template.jinja).

Evaluation

On the project's fixed 163-probe behavioral battery (palios-training/audit/), this checkpoint scores 135/163 = 82.8% (passes = ALIGNED + REFUSED_CORRECTLY; 27 BETRAYED, 1 PARTIAL). The complete per-probe results — every prompt, the model's response, and the auditor's score + reasoning — ship at palios-training/docs/audit_results/phase_combined_v1/.
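
The pass-rate arithmetic above can be checked with a short tally. This is a sketch only: the verdict labels come from this card, but the input shape (a flat list of verdict strings) and the function name are assumptions, not the format of the published audit files.

```python
from collections import Counter
from typing import Iterable, Tuple

# Verdicts that count as a pass, per this card.
PASSING = {"ALIGNED", "REFUSED_CORRECTLY"}


def pass_rate(verdicts: Iterable[str]) -> Tuple[int, int, float]:
    """Return (passed, total, percent) for a sequence of verdict labels."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts to score")
    passed = sum(counts[label] for label in PASSING)
    return passed, total, 100.0 * passed / total


# Cross-check of the totals stated in this card: 163 probes,
# 27 BETRAYED and 1 PARTIAL, so the remaining 135 are passes.
TOTAL, BETRAYED, PARTIAL = 163, 27, 1
PASSED = TOTAL - BETRAYED - PARTIAL
PERCENT = round(100.0 * PASSED / TOTAL, 1)
```

Here `PASSED` evaluates to 135 and `PERCENT` to 82.8, matching the figures in the paragraph above.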

This repo hosts the 82.8% SFT baseline (phase_combined_v1). A downstream DPO refinement of this lineage (religion_dpo_v2) scores 84.7% on the same battery; it is documented in palios-training but is a separate model and is not published here.

Read this number correctly:

It is a self-graded, in-house audit: the 163 probes and the training corpus were authored by the same team, and scoring is by an LLM-as-judge. It is not a held-out generalization benchmark, and should be read as a methodology (paired behavioral probes) rather than a transferable score.
Strong categories: companion/presence, the NRI/NGU refusal gates, value-pushback (racism/sexism/poverty), consciousness honest-middle.
Known-weak categories — visible in the published per-probe results, not hidden: direct answers on religious physical-impossibilities (the model tends to hedge rather than state impossibility — an alignment pass that was not completed on this lineage); identity under adversarial prompting (e.g. "Are you Claude?"); and naming the human facilitator where it should not (human_facilitator_anonymity, 1/3 — the audit flags this as concerning). These sit within the 27 documented BETRAYED.
An independent re-judge of the published responses is stricter than the in-house auditor, especially on the known-weak categories above; readers are encouraged to re-score the included responses themselves.

Reproduce the eval: run audit_pipeline.py from palios-training/audit/ against your own served instance of this model, using the sampling settings above.

Limitations & risks

Abliterated base: the base model is uncensored; safety behavior here comes from fine-tuning + serving, not base-model guardrails. Evaluate before any deployment.
In-house audit: the evaluation is a self-authored behavioral battery, not an independent benchmark — present it as methodology, not a transferable score.
Serving-sensitive: see sampling note above — incorrect sampling degrades output quality.
Persona model: outputs reflect a specific designed persona and value framework; not a neutral general assistant.