Syntiox-1.0-Flash (3.7B)

Hugging Face | GitHub | Launch Blog | Documentation
License: Apache 2.0 | Authors: Syntiox Research Team & Developer Community

Syntiox-1.0-Flash is a 3.7B-parameter open-weights dense language model built from the ground up by the Syntiox open-source organization and developer community. It is designed for advanced reasoning, coding assistance, and agentic workflows, and it brings step-by-step "thinking" capabilities to consumer-grade hardware and on-device environments.

This release includes both a pre-trained variant and an instruction-tuned variant (syntiox-1.0-flash-it). Both are optimized for high-speed inference while retaining multi-step reasoning ability.


Key Capabilities & Architectural Advancements

  • Native "Flash-Thinking" Mode: Features a built-in step-by-step internal reasoning mechanism, allowing the model to decompose complex math, logic, and coding problems before generating the final answer.
  • Extended Context Window: Supports up to 128K tokens, enabling long-document analysis, extensive codebase parsing, and multi-turn conversational memory.
  • Optimized for Consumer Hardware & On-Device: With a compact 3.7B-parameter architecture, it is tailored for local execution on standard laptops, edge devices, and consumer GPUs, with a small memory footprint (about 7.4 GB of weights in bfloat16).
  • Hybrid Attention Mechanism: Interleaves local sliding-window attention layers (512-token window) with full global attention layers, keeping processing fast while maintaining long-context awareness.
  • Enhanced Agentic Workflow Support: Built-in support for native function calling and tool use makes it well suited as the engine for autonomous software agents.
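To illustrate how the hybrid attention bullet trades speed against context, here is a minimal sketch of the two causal masks involved: a local layer lets each token see only the 512 most recent positions, and a global layer lets it see every earlier position. The 512-token window and 32 layers come from the Model Overview table. The interleaving ratio is NOT stated in this card, so `layer_schedule` and its `global_every=6` value are hypothetical placeholders, not the model's actual layout.

```python
from typing import List, Optional

WINDOW = 512      # sliding-window size from the Model Overview table
NUM_LAYERS = 32   # layer count from the Model Overview table


def attention_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """Return mask[q][k] == True when query position q may attend to key k.

    window=None gives full causal (global) attention; an integer limits each
    query to the `window` most recent positions, including itself.
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def keys_visible(q: int, window: Optional[int] = None) -> int:
    """Number of key positions that 0-indexed query position q attends to."""
    return q + 1 if window is None else min(q + 1, window)


def layer_schedule(num_layers: int, global_every: int) -> List[str]:
    """HYPOTHETICAL interleaving: every `global_every`-th layer is global."""
    return ["global" if (i + 1) % global_every == 0 else "local" for i in range(num_layers)]


# Toy example with a window of 3 instead of 512 so the mask stays readable.
for row in attention_mask(6, window=3):
    print("".join("x" if visible else "." for visible in row))

# At position 100,000 a local layer still reads only 512 keys; a global layer reads all.
print(keys_visible(100_000, WINDOW), keys_visible(100_000))
print(layer_schedule(NUM_LAYERS, global_every=6).count("global"), "global layers")
```

Because a local layer's cost per token is capped by the window, most layers stay cheap at long context lengths. The occasional global layer is what carries information across the full 128K window.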

Model Overview

| Property | Syntiox-1.0-Flash (Dense) |
|---|---|
| Total Parameters | 3.7B (effective parameters) |
| Layers | 32 |
| Sliding Window Size | 512 tokens |
| Context Length | 128,000 (128K) tokens |
| Vocabulary Size | 262,144 (262K) tokens |
| Supported Modalities | Text (inputs and outputs) |
| Position Embeddings | Proportional RoPE (p-RoPE) |
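The consumer-hardware claim follows from simple arithmetic on the parameter count. The sketch below estimates the memory needed for the weights alone at common precisions. It assumes the stored parameter count equals the 3.7B figure in the table, which may not hold exactly if "effective" differs from the on-disk count. It also ignores the KV cache and activations, which grow with context length, and the overhead of quantization scales.

```python
PARAMS = 3.7e9  # total (effective) parameters from the Model Overview table

# Bytes per parameter for common formats; the int8 and 4-bit entries are
# illustrative and ignore per-block scale/zero-point overhead.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gib(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return params * bytes_per_param / 2**30


for fmt, nbytes in BYTES_PER_PARAM.items():
    print(f"{fmt:>9}: {weight_memory_gib(PARAMS, nbytes):5.2f} GiB")
```

In bfloat16, the dtype used in the Getting Started example, this comes to roughly 6.9 GiB of weights.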

Benchmark Results

Syntiox-1.0-Flash was evaluated on industry-standard benchmarks against a larger 7B baseline (Model A) and a 3B-class baseline (Model B). It posts the highest score of the three models on every benchmark listed. Results are for the instruction-tuned variant.

| Benchmark | Syntiox-1.0-Flash (3.7B) | Baseline Model A (7B) | Baseline Model B (3B Class) |
|---|---|---|---|
| MMLU Pro | 71.2% | 68.5% | 58.2% |
| AIME 2026 (No Tools) | 44.8% | 35.0% | 21.4% |
| LiveCodeBench v6 | 54.3% | 46.2% | 31.0% |
| GPQA Diamond | 56.1% | 44.5% | 35.2% |
| BigBench Extra Hard | 36.5% | 31.2% | 19.8% |
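To read the table as gaps rather than raw scores, the following sketch computes the percentage-point lead of Syntiox-1.0-Flash over each baseline. The numbers are copied directly from the table. The script adds no new results and says nothing about statistical significance.

```python
# Scores (%) from the benchmark table:
# (Syntiox-1.0-Flash, Baseline Model A (7B), Baseline Model B (3B class))
SCORES = {
    "MMLU Pro": (71.2, 68.5, 58.2),
    "AIME 2026 (No Tools)": (44.8, 35.0, 21.4),
    "LiveCodeBench v6": (54.3, 46.2, 31.0),
    "GPQA Diamond": (56.1, 44.5, 35.2),
    "BigBench Extra Hard": (36.5, 31.2, 19.8),
}


def margins(scores):
    """Percentage-point lead over (Baseline A, Baseline B), rounded to 0.1."""
    return {
        name: (round(ours - a, 1), round(ours - b, 1))
        for name, (ours, a, b) in scores.items()
    }


for name, (vs_a, vs_b) in margins(SCORES).items():
    print(f"{name:<22} +{vs_a:>4} pts vs A   +{vs_b:>4} pts vs B")
```

The narrowest gap over the 7B baseline is on MMLU Pro (+2.7 points). The widest are on GPQA Diamond and AIME 2026.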

Getting Started

You can deploy and run Syntiox-1.0-Flash using the standard Hugging Face transformers library.

Installation

Ensure your environment is up to date:

pip install -U transformers torch accelerate
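After installing, you can confirm which versions of the three packages are present. This card does not state minimum versions, so the sketch below only reports what is installed and does not enforce any requirement. It uses the standard-library `importlib.metadata` module.

```python
from importlib import metadata

REQUIRED = ("transformers", "torch", "accelerate")  # packages from the pip command


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version string, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name:<12} {version or 'NOT INSTALLED'}")
```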

Basic Inference Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_ID = "syntiox/syntiox-1.0-flash-it"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Build the conversation; the system role is supported natively
messages = [
    {"role": "system", "content": "You are Syntiox AI V1, a helpful and precise assistant."},
    {"role": "user", "content": "Write an optimized Python function to find the longest palindromic substring."}
]

# Apply the chat template. Extra keyword arguments such as enable_thinking are
# forwarded to the template and only take effect if this model's template uses them.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[-1]

# Generate a response; sampling must be enabled for temperature/top_p to have any effect
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.95)
response = tokenizer.decode(outputs[0][input_len:], skip_special_tokens=True)

print(response)
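With "Flash-Thinking" enabled, the generated text may contain the reasoning trace ahead of the final answer. This card does not document how that trace is delimited. The sketch below therefore assumes hypothetical `<think>`/`</think>` markers (`THINK_OPEN`, `THINK_CLOSE`, and `split_thinking` are illustrative names, not part of any Syntiox API). It shows one way to separate the two parts.

```python
import re
from typing import Tuple

# HYPOTHETICAL delimiters: this card does not document how the Flash-Thinking
# trace is marked. Replace them with whatever the model's chat template emits.
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"

_THINK_RE = re.compile(
    re.escape(THINK_OPEN) + r"(.*?)" + re.escape(THINK_CLOSE), re.DOTALL
)


def split_thinking(text: str) -> Tuple[str, str]:
    """Split generated text into (reasoning, answer).

    Reasoning is the first delimited block; the answer is everything else.
    If no delimited block is found, reasoning is "" and the whole text is the answer.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "<think>Expand around each center; track the longest span.</think>\ndef longest_palindrome(s): ..."
reasoning, answer = split_thinking(sample)
print("REASONING:", reasoning)
print("ANSWER:", answer)
```

In practice you would pass `response` from the example above to `split_thinking`. If the real delimiters are special tokens, decode with `skip_special_tokens=False`, because the example above strips them.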