scalloptools-1
A 4B function-calling specialist for local assistants. It reads a user turn and decides which tool to call and with what arguments, or declines when no tool fits.
scalloptools-1 is a LoRA fine-tune of Qwen3.5-4B, distilled from ScallopBot production traces with a larger model writing the labels. The student never trained on its own generations. The repo ships a q5_k_m GGUF for local serving and the raw adapter for reproduction.
Links: scallopbot.com · GitHub
| Property | Value |
|---|---|
| Base model | Qwen3.5-4B |
| Adapter | LoRA, rank 32, alpha 64, 2 epochs |
| Quant | q5_k_m GGUF (3.16 GB) |
| Context | inherits Qwen3.5-4B |
| Serving | thinking off (chain-of-thought hurts this task at 4B) |
| Toolset | shell, file read/write, HTTP fetch, memory store, project APIs |
Files
| File | Format | Size | Notes |
|---|---|---|---|
| scalloptools-1.q5_k_m.gguf | GGUF Q5_K_M | 3.16 GB | Recommended for llama.cpp / Ollama / LM Studio |
| adapter/ | PEFT LoRA | 170 MB | Apply on top of Qwen/Qwen3.5-4B with transformers + PEFT |
How to run
Serve with thinking disabled. The model is trained and benchmarked in the no-think path; turning chain-of-thought on lowered every metric below.
llama.cpp
llama-server -m scalloptools-1.q5_k_m.gguf \
--chat-template-kwargs '{"enable_thinking":false}'
Ollama
ollama run hf.co/tashfene/scalloptools-1:Q5_K_M
Python (llama-cpp-python)
from llama_cpp import Llama

# Download the GGUF from the Hub and load it.
llm = Llama.from_pretrained(
    repo_id="tashfene/scalloptools-1",
    filename="scalloptools-1.q5_k_m.gguf",
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Read the file ./notes.md"}],
    tools=[...],  # your tool schemas
)
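The `tools` argument expects OpenAI-style function schemas, and the reply comes back as an OpenAI-style completion dict. The sketch below shows one possible schema and a small helper that pulls the chosen call out of a reply. `READ_FILE_TOOL` and `extract_tool_calls` are hypothetical names for illustration; the real ScallopBot schemas are not published in this card, and whether `tool_calls` is populated depends on the chat format your llama-cpp-python build applies to this GGUF.

```python
import json

# Hypothetical schema for a file-read tool, in the OpenAI function-calling
# format. Replace it with the schemas of your own tools.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a text file from disk.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}


def extract_tool_calls(completion):
    """Return (name, arguments) pairs from an OpenAI-style completion dict.

    An empty list means the model did not call a tool.
    """
    message = completion["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        raw = fn.get("arguments") or "{}"
        # Arguments normally arrive as a JSON string; tolerate a dict too.
        args = json.loads(raw) if isinstance(raw, str) else raw
        calls.append((fn["name"], args))
    return calls
```

With these in place, `tools=[READ_FILE_TOOL]` can stand in for the elided list, and `extract_tool_calls(out)` gives the host something to dispatch on.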
Adapter on the base model (transformers + PEFT)
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the stock base model, then attach the LoRA weights from adapter/.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-4B")
model = PeftModel.from_pretrained(base, "tashfene/scalloptools-1", subfolder="adapter")
# The tokenizer is loaded from the same adapter/ subfolder.
tok = AutoTokenizer.from_pretrained("tashfene/scalloptools-1", subfolder="adapter")
Intended use
Routing for a personal-assistant agent that has a small, stable set of tools. The model picks the function and arguments; a host loop runs the call and feeds the result back. It is a router, not a reasoner: it does not know your domain, only the shape of these tools.
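The host loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ScallopBot's actual loop: `run_turn`, the `chat` callable, and the `handlers` mapping are hypothetical names, and messages are assumed to follow the OpenAI chat format.

```python
import json


def run_turn(chat, handlers, messages, max_steps=4):
    """Drive one user turn: ask the model, run the tool it picks, feed results back.

    chat:     callable(messages) -> assistant message dict (OpenAI-style)
    handlers: dict mapping tool name -> callable taking keyword arguments
    Returns the final assistant message, or None if max_steps runs out.
    """
    for _ in range(max_steps):
        reply = chat(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply  # the model answered directly or declined to call a tool
        for call in tool_calls:
            name = call["function"]["name"]
            args = json.loads(call["function"]["arguments"] or "{}")
            handler = handlers.get(name)
            if handler is None:
                result = {"error": f"unknown tool: {name}"}
            else:
                try:
                    result = {"ok": handler(**args)}
                except Exception as exc:
                    # Pass the failure to the model instead of hiding it.
                    result = {"error": str(exc)}
            messages.append({
                "role": "tool",
                "tool_call_id": call.get("id", ""),
                "content": json.dumps(result, default=str),
            })
    return None
```

The `max_steps` cap matters here because the loop otherwise runs for as long as the model keeps asking for tools.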
Evaluation
114 turns held out from real sessions, none seen in training. Every model ran the same harness with greedy decoding and thinking off.
| Metric | scalloptools-1 | Qwen3.5-4B (stock) | Qwen3.6-35B MoE | Qwen3.6-Plus (hosted) |
|---|---|---|---|---|
| Tool selection | 73.3% | 65.3% | 46.5% | 54.7% |
| No-tool precision | 39.3% | 32.0% | 39.3% | 42.9% |
| Args key-F1 | 0.243 | 0.204 | 0.225 | 0.199 |
| Parse success | 100% | 100% | 100% | 100% |
| Median latency | 3.3s | 7.3s | 15.4s | 5.2s |
The 35B and the hosted Plus model both carry far more world knowledge. On this fixed toolset they still pick the wrong function more often than the 4B, which has memorized how these specific tools behave. Read it narrowly: a specialist wins on its own toolset, and these numbers predict nothing about general tool-calling.
No-tool precision is the weak column. When the right move is to call nothing, the model still reaches for a tool more than half the time, because genuine no-tool turns are scarce in the training traces.
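This card does not spell out how Args key-F1 is computed. One plausible reading, offered as an assumption rather than the harness's actual code, is an F1 score over the argument names of the predicted call versus the reference call, averaged across turns. `key_f1` and `mean_key_f1` are hypothetical names.

```python
def key_f1(predicted_args, gold_args):
    """F1 between the argument names of a predicted call and a reference call."""
    pred, gold = set(predicted_args), set(gold_args)
    if not pred and not gold:
        return 1.0  # neither call takes arguments: treat as a perfect match
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_key_f1(pairs):
    """Average key_f1 over (predicted_args, gold_args) pairs."""
    scores = [key_f1(pred, gold) for pred, gold in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

Under this reading the score checks only which keys are present, not whether their values are right, so it measures how well the model knows each tool's signature.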
Fabrication
A model that invents a tool result instead of admitting one failed breaks an agent loop. I fed empty and error results and checked the response.
| Test | Fabricated | Honest report | Retried to exhaustion |
|---|---|---|---|
| Single failure (30 cases) | 0 | 0 | 30 |
| Same failure, 3 rounds (30 cases) | 0 | 1 | 27 |
It never fabricated, across single and repeated failures, beating every larger model in the lineup on that axis. The honesty comes with a cost I have not fixed: against a dead tool the model keeps retrying instead of stopping to report the failure. Safe, but it loops. Teaching a 4B to give up and report cleanly is the open problem.
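Until the model learns to stop on its own, the host can bound the loop from outside. The sketch below is one possible host-side guard, not something shipped with this repo: `RetryBudget` is a hypothetical helper that counts consecutive failures of the same call and tells the host when to stop executing it.

```python
import json


class RetryBudget:
    """Count consecutive failures per (tool, arguments) pair."""

    def __init__(self, max_failures=2):
        self.max_failures = max_failures
        self._failures = {}

    @staticmethod
    def _key(name, args):
        # Canonical JSON so argument order does not create distinct keys.
        return name, json.dumps(args, sort_keys=True)

    def record(self, name, args, succeeded):
        """Record a call's outcome; a success resets that call's counter."""
        key = self._key(name, args)
        if succeeded:
            self._failures.pop(key, None)
        else:
            self._failures[key] = self._failures.get(key, 0) + 1

    def exhausted(self, name, args):
        """True once the same call has failed max_failures times in a row."""
        return self._failures.get(self._key(name, args), 0) >= self.max_failures
```

A host would check `exhausted(...)` before running a requested call and, when it returns `True`, skip the tool and report the failure to the user directly.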
Training
The training data is traces from one person's assistant, so the distribution is narrow and personal. Before training, every example passed through a deterministic anonymizer that swaps real names, emails, phones, handles, and project ids for stable fakes and refuses to write a file if any known real token survives. Real-name and anonymized held-out sets scored the same (73.3% either way), so the substitution costs no measurable accuracy. The recipe caps examples per session, dedupes globally, drops turns that reference stale state, and keeps a separate track of examples that show honest responses to empty and failed tool results.
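The anonymizer itself is not part of this repo. A minimal sketch of the idea, assuming a hash-derived placeholder and a final leak check, might look like the following; `stable_fake`, `anonymize`, and the `kind_hash` placeholder format are all hypothetical.

```python
import hashlib
import re


def stable_fake(real, kind):
    """Derive the same placeholder for the same real token on every run."""
    digest = hashlib.sha256(real.lower().encode("utf-8")).hexdigest()
    return f"{kind}_{digest[:8]}"


def anonymize(text, known_tokens):
    """Replace every known real token, then verify that none survived.

    known_tokens: dict mapping real token -> kind label ("name", "email", ...).
    Raises ValueError instead of returning text that still contains a token.
    """
    # Replace longer tokens first so a full name is handled before its parts.
    for real in sorted(known_tokens, key=len, reverse=True):
        fake = stable_fake(real, known_tokens[real])
        pattern = re.compile(re.escape(real), re.IGNORECASE)
        text = pattern.sub(lambda _match, fake=fake: fake, text)
    lowered = text.lower()
    leaks = [token for token in known_tokens if token.lower() in lowered]
    if leaks:
        raise ValueError(f"{len(leaks)} real token(s) survived anonymization")
    return text
```

Because the placeholder depends only on the token's text, the same person maps to the same fake across sessions, which keeps multi-turn references consistent.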
Limitations and bias
- One toolset, one user's habits. Point it at different tools and the selection numbers will not hold.
- Low no-tool precision. Pair it with a confidence gate where a stray call is expensive.
- It retries failed tools instead of reporting them.
- 4B holds little world knowledge. It routes calls; it does not reason about your domain.
- Trained on a single individual's data, so it inherits that person's tool habits and phrasing.
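The confidence gate suggested in the list above could take many forms. One simple sketch, assuming your serving stack exposes per-token log-probabilities for the generated call, scores the call by its geometric-mean token probability and asks for confirmation only when an expensive tool scores low. The tool names, the `0.8` threshold, and both function names are hypothetical and would need tuning on your own traces.

```python
import math

# Hypothetical names for tools where a stray call is costly.
EXPENSIVE_TOOLS = frozenset({"shell", "file_write", "http_fetch"})


def call_confidence(token_logprobs):
    """Geometric-mean token probability of the generated call, in [0, 1]."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def gate(tool_name, token_logprobs, threshold=0.8, expensive=EXPENSIVE_TOOLS):
    """Return "run", or "confirm" when an expensive call looks uncertain."""
    if tool_name not in expensive:
        return "run"  # cheap tools run without a check
    if call_confidence(token_logprobs) >= threshold:
        return "run"
    return "confirm"
```

Cheap, read-only tools bypass the gate entirely, so the extra confirmation step only slows down calls where a mistake costs something.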
License
Apache-2.0, inherited from the Qwen3.5-4B base.