Syntiox-1.0-Flash (3.7B)

Hugging Face | GitHub | Launch Blog | Documentation
License: Apache 2.0 | Authors: Syntiox Research Team & Developer Community

Syntiox-1.0-Flash is a 3.7B-parameter, open-weights dense language model built from scratch by the Syntiox open-source organization and developer community. It is designed for reasoning, coding assistance, and agentic workflows, and brings step-by-step "thinking" capabilities to consumer-grade hardware and on-device environments.

This release includes both a pre-trained base model and an instruction-tuned variant (syntiox-1.0-flash-it), optimized for fast inference while retaining multi-step reasoning ability.

Key Capabilities & Architectural Advancements

Native "Flash-Thinking" Mode: Features a built-in step-by-step internal reasoning mechanism, allowing the model to decompose complex math, logic, and coding problems before generating the final answer.
Extended Context Window: Supports up to 128K tokens, enabling long-document analysis, extensive codebase parsing, and multi-turn conversational memory.
Optimized for Consumer Hardware & On-Device: The compact 3.7B-parameter architecture is tailored for local execution on standard laptops, edge devices, and consumer GPUs with a small memory footprint.
Hybrid Attention Mechanism: Interleaves local sliding window attention (512 tokens) with full global attention layers, providing fast processing speeds while maintaining long-context awareness.
Enhanced Agentic Workflow Support: Built-in, high-reliability support for native function-calling and tool use, making it an excellent engine for autonomous software agents.
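
To make the hybrid attention item above more concrete, here is a minimal sketch of the two mask types it describes: a full causal mask for global layers and a causal sliding-window mask for local layers. It uses toy sizes rather than the 512-token window, and it assumes the window counts the current token plus the preceding ones; the card does not specify that convention or how local and global layers are interleaved. The function names are illustrative and not part of any Syntiox API.

```python
def causal_mask(seq_len):
    """Global causal mask: token i may attend to every token j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len, window):
    """Local causal mask: token i may attend to itself and the window-1 tokens before it.

    Assumption: the window includes the current token. Other conventions exist.
    """
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]


def attended_pairs(mask):
    """Count the (query, key) pairs the mask allows, a rough proxy for attention cost."""
    return sum(sum(row) for row in mask)


# Toy sizes so the masks are easy to inspect by hand.
SEQ_LEN, WINDOW = 8, 3
global_mask = causal_mask(SEQ_LEN)
local_mask = sliding_window_mask(SEQ_LEN, WINDOW)

print("global pairs:", attended_pairs(global_mask))
print("local pairs: ", attended_pairs(local_mask))
```

The global mask grows quadratically with sequence length, while the local mask grows linearly once the sequence is longer than the window, which is the usual motivation for mixing the two layer types.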

Model Overview

Property	Syntiox-1.0-Flash (Dense)
Total Parameters	3.7B (Effective parameters)
Layers	32
Sliding Window Size	512 tokens
Context Length	128,000 (128K) tokens
Vocabulary Size	262,144 (262K) tokens
Supported Modalities	Text (Inputs & Outputs)
Position Embeddings	Proportional RoPE (p-RoPE)
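
As a rough guide to what the parameter count means for memory, the sketch below estimates the size of the weights alone at several numeric precisions. It treats the 3.7B figure as the number of stored weights, which is an assumption given that the table labels it "effective parameters", and it ignores the KV cache, activations, and framework overhead. The int8 and int4 rows are hypothetical: this card does not state whether quantized checkpoints are available.

```python
PARAMS = 3.7e9  # parameter count taken from the Model Overview table

# Bytes needed to store one parameter at each precision.
# int8 and int4 are hypothetical quantization levels, not confirmed releases.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params, bytes_per_param):
    """Return the weight storage size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>8}: {weight_memory_gb(PARAMS, nbytes):.2f} GB")
```

Under these assumptions, loading in bfloat16 needs about 7.4 GB for the weights before any runtime overhead is counted.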

Benchmark Results

Syntiox-1.0-Flash was evaluated on standard reasoning and coding benchmarks. On every benchmark listed below it scores higher than both a larger 7B baseline and a 3B-class baseline. Results are for the instruction-tuned variant.

Benchmark	Syntiox-1.0-Flash (3.7B)	Baseline Model A (7B)	Baseline Model B (3B Class)
MMLU Pro	71.2%	68.5%	58.2%
AIME 2026 (No Tools)	44.8%	35.0%	21.4%
LiveCodeBench v6	54.3%	46.2%	31.0%
GPQA Diamond	56.1%	44.5%	35.2%
BigBench Extra Hard	36.5%	31.2%	19.8%

Getting Started

You can deploy and run Syntiox-1.0-Flash using the standard Hugging Face transformers library.

Installation

Ensure your environment is up to date:

pip install -U transformers torch accelerate

Basic Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_ID = "syntiox/syntiox-1.0-flash-it"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Build the conversation; the system prompt is passed as a regular "system" message
messages = [
    {"role": "system", "content": "You are Syntiox AI V1, a helpful and precise assistant."},
    {"role": "user", "content": "Write an optimized Python function to find the longest palindromic substring."}
]

# Apply the chat template. If the model's template supports a thinking toggle,
# pass enable_thinking=True as an extra keyword argument to apply_chat_template.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[-1]

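# Generate a response. temperature and top_p only take effect when sampling is enabled.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.95)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][input_len:], skip_special_tokens=True)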

print(response)
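
Tool Call Dispatch (Sketch)

For the function-calling support listed under the key capabilities, application code typically has to parse the model's tool request and run the matching function. The sketch below shows one way to do that. This card does not document the tool-call output format, so the JSON shape used here (an object with "name" and "arguments") is an assumption, and get_weather is a hypothetical tool that returns a canned string.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool: returns a canned string instead of calling a real API."""
    return f"Sunny in {city}"


# Registry mapping tool names to the Python callables that implement them.
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(raw: str) -> str:
    """Parse a JSON tool call (assumed format) and run the matching tool."""
    call = json.loads(raw)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Stand-in for text produced by the model; the real format may differ.
example = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch_tool_call(example)
print(result)
```

In an agent loop, the returned string would be appended to the conversation as a tool message before calling generate again.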
