FastContext-1.0-4B-RL β€” Q4_K_M GGUF

Community GGUF quantization of microsoft/FastContext-1.0-4B-RL, built with llama.cpp.

FastContext is a repo exploration subagent (Qwen3-4B backbone) trained to locate relevant files and return compact file:line citations. It's designed to run alongside a primary coding agent, offloading all file search so the main agent's context stays clean.

Quantization

File Method Size
FastContext-1.0-4B-RL-Q4_K_M.gguf Q4_K_M 2.5 GB

Built from BF16 safetensors using:

python convert_hf_to_gguf.py microsoft/FastContext-1.0-4B-RL --outtype bf16
llama-quantize model-bf16.gguf model-q4_k_m.gguf Q4_K_M

Usage

Serve with llama.cpp:

llama-server -m FastContext-1.0-4B-RL-Q4_K_M.gguf \
  --alias fastcontext --port 8084 \
  -ngl 999 -fa on -c 131072 \
  --temp 0.6 --top-p 0.95 --top-k 20 \
  --parallel 4 --jinja --no-mmap

Drive it via the harness (handles the path bug β€” see below):

git clone https://github.com/sdougbrown/fastcontext-harness
python fc_explore.py /path/to/repo "where is the auth logic?"

Path bug β€” important for local use

FastContext was trained on SWE-bench instances where repos are Docker-mounted at /<repo-name>/. The model generates paths like /myrepo/cmd/main.go even when the actual workspace is /home/user/Code/myrepo. In the training environment this resolves correctly; locally, every tool call fails and the model fabricates a final answer.

The RL variant improves on SFT β€” in testing it returned correct full absolute paths on familiar workspaces where SFT hallucinated. But path truncation still fires on external repos, and when it does the RL model answers confidently with invented file structures rather than spiralling (arguably a harder failure mode to catch).

fastcontext-harness has a 15-line resolve_path() fix and annotated examples showing both variants across two codebases.

SFT vs RL

The SFT GGUF is available from mitkox/FastContext-1.0-4B-SFT-Q4_K_M-GGUF. Quick comparison from local testing (Q4_K_M, llama.cpp):

SFT + path fix RL (no fix)
Familiar workspace βœ“ correct after path correction βœ“ correct full paths natively
External repo βœ“ correct after path correction βœ— truncated path + invented file structure
Failure mode when wrong spiral β†’ visible confident 1-shot β†’ silent

With the resolve_path() fix applied, SFT is the more reliable choice for arbitrary repos. RL is notably better on workspaces where paths match its training distribution.

See examples/rl-vs-sft.txt in the harness repo for annotated run output.

Downloads last month
120
GGUF
Model size
4B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for sdougbrown/FastContext-1.0-4B-RL-GGUF

Quantized
(15)
this model