Libraries

How to use yugeshkarunamurthy/FastContext-1.0-4B-oQ4 with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("yugeshkarunamurthy/FastContext-1.0-4B-oQ4")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Pi

How to use yugeshkarunamurthy/FastContext-1.0-4B-oQ4 with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "yugeshkarunamurthy/FastContext-1.0-4B-oQ4"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "yugeshkarunamurthy/FastContext-1.0-4B-oQ4"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use yugeshkarunamurthy/FastContext-1.0-4B-oQ4 with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "yugeshkarunamurthy/FastContext-1.0-4B-oQ4"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default yugeshkarunamurthy/FastContext-1.0-4B-oQ4

Run Hermes

hermes

MLX LM

How to use yugeshkarunamurthy/FastContext-1.0-4B-oQ4 with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "yugeshkarunamurthy/FastContext-1.0-4B-oQ4"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "yugeshkarunamurthy/FastContext-1.0-4B-oQ4"
# Calling the OpenAI-compatible server with curl
# (mlx_lm.server listens on port 8080 by default, matching the Pi and Hermes configs above)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "yugeshkarunamurthy/FastContext-1.0-4B-oQ4",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
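
The same request can be made from Python. The following is a minimal sketch using only the standard library, assuming the server is listening on port 8080 (the port used in the Pi and Hermes configurations above). The helper names `build_chat_request` and `chat` are illustrative, not part of mlx-lm.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on port 8080.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "yugeshkarunamurthy/FastContext-1.0-4B-oQ4"


def build_chat_request(user_text, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text, timeout=120):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(user_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply. Building the request separately from sending it makes the payload easy to inspect without a live server.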

FastContext-1.0-4B-SFT-oQ4

An oQ4 quantized version of FastContext-1.0-4B-SFT optimized for Apple Silicon using oMLX.

This model preserves the repository exploration capabilities of FastContext while significantly reducing memory usage and improving inference efficiency through mixed-precision oQ quantization.

About FastContext

FastContext is a lightweight repository-exploration subagent designed for coding agents. Instead of having a single model perform both repository exploration and problem solving, FastContext specializes in repository discovery and evidence gathering using parallel tool calls.

The model explores repositories through:

READ
GLOB
GREP

and returns concise file paths and line references for downstream coding agents.
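
To make the three tools concrete, here is a rough sketch of what READ, GLOB, and GREP handlers might look like on the host side. This is an illustration only: the function names, arguments, and return shapes are hypothetical and are not the tool schema FastContext was trained with.

```python
import fnmatch
import os
import re


def glob_files(root, pattern):
    """GLOB: return repo-relative paths whose file name matches a shell pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            full = os.path.join(dirpath, name)
            matches.append(os.path.relpath(full, root))
    return sorted(matches)


def grep_files(root, regex, pattern="*"):
    """GREP: return (path, line_number, line_text) for each matching line."""
    compiled = re.compile(regex)
    hits = []
    for rel_path in glob_files(root, pattern):
        full = os.path.join(root, rel_path)
        with open(full, encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if compiled.search(line):
                    hits.append((rel_path, line_number, line.rstrip("\n")))
    return hits


def read_lines(root, rel_path, start, end):
    """READ: return lines start..end (1-indexed, inclusive) of one file."""
    full = os.path.join(root, rel_path)
    with open(full, encoding="utf-8", errors="replace") as handle:
        lines = handle.read().splitlines()
    return lines[start - 1:end]
```

A host agent would run several such calls in parallel and hand the resulting paths and line numbers back to the model as tool results.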

Original model: FastContext-1.0-4B-SFT.

Quantization

This release uses:

Quantization: oQ4
Format: MLX
Target Platform: Apple Silicon
Mixed Precision: Enabled
Optimized for local inference

The oQ quantization pipeline allocates higher precision to more sensitive weights while aggressively compressing less important regions of the network, providing a strong quality-to-size ratio.
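
The trade-off behind mixed precision can be illustrated with a toy example. The sketch below is not the oQ algorithm: it uses plain uniform quantization and a made-up 90/10 split between 4-bit and 8-bit weights, only to show why giving a few sensitive weights more bits costs little in average size.

```python
def quantize_roundtrip(values, bits):
    """Uniformly quantize values to `bits` bits, then map them back to floats."""
    levels = (1 << bits) - 1
    low, high = min(values), max(values)
    scale = (high - low) / levels or 1.0  # avoid a zero scale for constant input
    codes = [round((v - low) / scale) for v in values]
    return [low + code * scale for code in codes]


def max_error(values, bits):
    """Largest absolute round-trip error at the given bit width."""
    restored = quantize_roundtrip(values, bits)
    return max(abs(a - b) for a, b in zip(values, restored))


def average_bits(groups):
    """Average bits per weight for a list of (num_weights, bits) groups."""
    total = sum(count for count, _bits in groups)
    return sum(count * bits for count, bits in groups) / total


# Hypothetical allocation: 90% of weights at 4 bits, 10% at 8 bits.
weights = [i / 10 for i in range(-10, 11)]
print(max_error(weights, 4), max_error(weights, 8))
print(average_bits([(90, 4), (10, 8)]))
```

The 8-bit round trip has a much smaller worst-case error than the 4-bit one, while the hypothetical mixed allocation averages only slightly above 4 bits per weight.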

Recommended Inference Settings

For best performance:

temperature: 0.7
top_p: 0.6
top_k: 20
min_p: 0
repetition_penalty: 1.05
presence_penalty: 1.5
thinking: true

oMLX Preset

temp: 0.7
top_p: 0.6
top_k: 20
min_p: 0
rep_penalty: 1.05
presence_penalty: 1.5
enable_thinking: true

These settings were selected to improve repository exploration quality, encourage broader search behavior, and maintain stable citation generation.
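
The two lists above hold the same values under different key names. A small sketch of that mapping, in case the settings are kept in one place and converted when writing an oMLX preset; the helper name `to_omlx_preset` is illustrative, and the key names are taken from the lists above rather than from oMLX documentation.

```python
# Recommended settings, using the generic key names listed above.
RECOMMENDED = {
    "temperature": 0.7,
    "top_p": 0.6,
    "top_k": 20,
    "min_p": 0,
    "repetition_penalty": 1.05,
    "presence_penalty": 1.5,
    "thinking": True,
}

# Generic name -> oMLX preset name, for the keys that differ between the lists.
OMLX_KEY_NAMES = {
    "temperature": "temp",
    "repetition_penalty": "rep_penalty",
    "thinking": "enable_thinking",
}


def to_omlx_preset(settings):
    """Rename keys to the oMLX preset names; values are left unchanged."""
    return {OMLX_KEY_NAMES.get(key, key): value for key, value in settings.items()}


print(to_omlx_preset(RECOMMENDED))
```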

Example Usage

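from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Hub repo id; a local directory containing the converted weights also works.
model, tokenizer = load("yugeshkarunamurthy/FastContext-1.0-4B-oQ4")

# Apply the chat template before generating.
messages = [{"role": "user", "content": "Find where authentication tokens are validated."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Recent mlx-lm releases take sampling settings through a sampler object
# rather than as keyword arguments to generate().
sampler = make_sampler(temp=0.7, top_p=0.6, top_k=20)

response = generate(model, tokenizer, prompt=prompt, sampler=sampler)

print(response)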

Intended Use

This model is intended for:

Repository exploration
Codebase navigation
SWE-bench style workflows
Coding agents
Retrieval and evidence gathering
Search-heavy software engineering tasks

It is not intended to replace a primary coding model. FastContext works best as a specialized exploration subagent paired with a stronger reasoning or code-generation model.

Performance

FastContext was trained specifically to improve repository exploration efficiency and reduce the token overhead associated with repository search. The original paper reports improved end-to-end coding-agent performance while reducing token consumption across multiple SWE benchmarks.

Recommended Deployment

Apple Silicon:

M1 Pro / Max
M2 Pro / Max / Ultra
M3 Series
M4 Series

Works well with:

MLX
oMLX
Open WebUI
LM Studio (MLX builds)
Custom agent frameworks

Credits

Microsoft FastContext Team
Qwen Team
Apple MLX
oMLX

Citation

Please cite the original FastContext paper when using this model in research:

@misc{zhang2026fastcontexttrainingefficientrepository,
      title={FastContext: Training Efficient Repository Explorer for Coding Agents},
      author={Shaoqiu Zhang and Maoquan Wang and Yuling Shi and Yuhang Wang and Xiaodong Gu and Yongqiang Yao and Rao Fu and Shengyu Fu},
      year={2026},
      eprint={2606.14066},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}

Model size: 0.7B params (Safetensors)
Tensor types: BF16, U32
Format: MLX, 4-bit

Paper: FastContext: Training Efficient Repository Explorer for Coding Agents (arXiv 2606.14066)