Instructions to use sdougbrown/FastContext-1.0-4B-RL-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use sdougbrown/FastContext-1.0-4B-RL-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="sdougbrown/FastContext-1.0-4B-RL-GGUF", filename="FastContext-1.0-4B-RL-Q4_K_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use sdougbrown/FastContext-1.0-4B-RL-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M
Use Docker
docker model run hf.co/sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use sdougbrown/FastContext-1.0-4B-RL-GGUF with Ollama:
ollama run hf.co/sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M
- Unsloth Studio
How to use sdougbrown/FastContext-1.0-4B-RL-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for sdougbrown/FastContext-1.0-4B-RL-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for sdougbrown/FastContext-1.0-4B-RL-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for sdougbrown/FastContext-1.0-4B-RL-GGUF to start chatting
- Pi
How to use sdougbrown/FastContext-1.0-4B-RL-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use sdougbrown/FastContext-1.0-4B-RL-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M
Run Hermes
hermes
- Atomic Chat new
- Docker Model Runner
How to use sdougbrown/FastContext-1.0-4B-RL-GGUF with Docker Model Runner:
docker model run hf.co/sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M
- Lemonade
How to use sdougbrown/FastContext-1.0-4B-RL-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull sdougbrown/FastContext-1.0-4B-RL-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.FastContext-1.0-4B-RL-GGUF-Q4_K_M
List all available models
lemonade list
FastContext-1.0-4B-RL β Q4_K_M GGUF
Community GGUF quantization of microsoft/FastContext-1.0-4B-RL, built with llama.cpp.
FastContext is a repo exploration subagent (Qwen3-4B backbone) trained to locate relevant files and return compact file:line citations. It's designed to run alongside a primary coding agent, offloading all file search so the main agent's context stays clean.
Quantization
| File | Method | Size |
|---|---|---|
FastContext-1.0-4B-RL-Q4_K_M.gguf |
Q4_K_M | 2.5 GB |
Built from BF16 safetensors using:
python convert_hf_to_gguf.py microsoft/FastContext-1.0-4B-RL --outtype bf16
llama-quantize model-bf16.gguf model-q4_k_m.gguf Q4_K_M
Usage
Serve with llama.cpp:
llama-server -m FastContext-1.0-4B-RL-Q4_K_M.gguf \
--alias fastcontext --port 8084 \
-ngl 999 -fa on -c 131072 \
--temp 0.6 --top-p 0.95 --top-k 20 \
--parallel 4 --jinja --no-mmap
Drive it via the harness (handles the path bug β see below):
git clone https://github.com/sdougbrown/fastcontext-harness
python fc_explore.py /path/to/repo "where is the auth logic?"
Path bug β important for local use
FastContext was trained on SWE-bench instances where repos are Docker-mounted at /<repo-name>/. The model generates paths like /myrepo/cmd/main.go even when the actual workspace is /home/user/Code/myrepo. In the training environment this resolves correctly; locally, every tool call fails and the model fabricates a final answer.
The RL variant improves on SFT β in testing it returned correct full absolute paths on familiar workspaces where SFT hallucinated. But path truncation still fires on external repos, and when it does the RL model answers confidently with invented file structures rather than spiralling (arguably a harder failure mode to catch).
fastcontext-harness has a 15-line resolve_path() fix and annotated examples showing both variants across two codebases.
SFT vs RL
The SFT GGUF is available from mitkox/FastContext-1.0-4B-SFT-Q4_K_M-GGUF. Quick comparison from local testing (Q4_K_M, llama.cpp):
| SFT + path fix | RL (no fix) | |
|---|---|---|
| Familiar workspace | β correct after path correction | β correct full paths natively |
| External repo | β correct after path correction | β truncated path + invented file structure |
| Failure mode when wrong | spiral β visible | confident 1-shot β silent |
With the resolve_path() fix applied, SFT is the more reliable choice for arbitrary repos. RL is notably better on workspaces where paths match its training distribution.
See examples/rl-vs-sft.txt in the harness repo for annotated run output.
- Downloads last month
- 120
4-bit
Model tree for sdougbrown/FastContext-1.0-4B-RL-GGUF
Base model
Qwen/Qwen3-4B-Instruct-2507