Support our open-source dataset and model releases!


Esper 4 is an agentic coding, architecture, DevOps, and MLOps specialist built on Qwen 3.6 27B!

Prompting Guide

Esper 4 uses the Qwen3.6-27B prompt format.

Use Esper 4 with your agentic framework of choice or as a stand-alone chat and code assistant.

Example inference script to get started:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ValiantLabs/Qwen3.6-27B-Esper4"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Implement CQRS for network appliance config management.\n\nRequirements:\n- Write side: 200 commands/sec, 4 command handlers, SQLite with custom journaling\n- Read side: 1000 queries/sec, 3 read projections in shared memory segments\n- Eventual consistency window: 100ms max\n- Handle atomic swap of projection memory for rebuilds\n- Binary configuration format versioning for schema evolution\n- Framework: libevent with custom protocol parser\n\nConstraints:\n- Manual memory management only, no garbage collection\n- Lock-free data structures where possible\n- Shared memory projections must survive process restarts\n- Command handlers must be thread-safe with 4 worker threads\n- Projection rebuild must not block queries\n- Binary format must support forward/backward compatibility\n- Error handling for corrupted journal recovery\n- Memory-mapped I/O for shared segments\n- Zero-copy where possible for performance\n\nDeliverables:\n1. Command processing pipeline with journaling\n2. Projection engine with shared memory management\n3. Query dispatcher with read-your-writes consistency\n4. Schema evolution system with versioned binary format\n5. Integration with libevent for network I/O\n6. Stress test showing 200 cmd/s + 1000 q/s sustained\n\nAssume x86_64 Linux, pthreads, atomic operations. No high-level frameworks."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=100000
)
# keep only the newly generated tokens (drop the prompt tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# split the thinking content from the final answer
try:
    # find the position just past the last occurrence of token 248069 (</think>)
    index = len(output_ids) - output_ids[::-1].index(248069)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
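
The thinking/answer split in the script above can be checked without loading the model. The following is a minimal sketch of the same slicing logic as a standalone function; the name `split_thinking` and the toy token IDs are assumptions made for illustration and are not part of the original script.

```python
def split_thinking(output_ids, end_think_id):
    """Split generated token IDs at the last end-of-thinking marker.

    Returns (thinking_ids, answer_ids). The marker itself stays at the end of
    thinking_ids, matching the slicing in the script above. If the marker is
    absent, everything is treated as the answer.
    """
    try:
        # Position just past the last occurrence of the marker.
        index = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]


# Toy IDs for illustration only; 99 stands in for the </think> token ID.
END_THINK = 99
thinking_ids, answer_ids = split_thinking([5, 6, END_THINK, 7, 8], END_THINK)
print(thinking_ids)  # [5, 6, 99]
print(answer_ids)    # [7, 8]
```

With a real tokenizer, one option is to pass `tokenizer.convert_tokens_to_ids("</think>")` as `end_think_id` instead of a hardcoded number, assuming `</think>` is a single token in the vocabulary (the hardcoded 248069 in the script suggests it is).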


Esper 4 is created by Valiant Labs.

Check out our HuggingFace page to see all of our models!

We care about open source. For everyone to use.

Model size: 28B params (Safetensors, tensor type BF16)

Base model: Qwen/Qwen3.6-27B
