scalloptools-1

A 4B function-calling specialist for local assistants. It reads a user turn and decides which tool to call and with what arguments, or declines when no tool fits.

scalloptools-1 is a LoRA fine-tune of Qwen3.5-4B, distilled from ScallopBot production traces with a larger model writing the labels. The student never trained on its own generations. The repo ships a q5_k_m GGUF for local serving and the raw adapter for reproduction.

Links: scallopbot.com · GitHub


| Property | Value |
|---|---|
| Base model | Qwen3.5-4B |
| Adapter | LoRA, rank 32, alpha 64, 2 epochs |
| Quant | q5_k_m GGUF (3.16 GB) |
| Context | inherits Qwen3.5-4B |
| Serving | thinking off (chain-of-thought hurts this task at 4B) |
| Toolset | shell, file read/write, HTTP fetch, memory store, project APIs |

Files

| File | Format | Size | Notes |
|---|---|---|---|
| `scalloptools-1.q5_k_m.gguf` | GGUF Q5_K_M | 3.16 GB | Recommended for llama.cpp / Ollama / LM Studio |
| `adapter/` | PEFT LoRA | 170 MB | Apply on top of `Qwen/Qwen3.5-4B` with transformers + PEFT |

How to run

Serve with thinking disabled. The model is trained and benchmarked in the no-think path; turning chain-of-thought on lowered every metric below.

llama.cpp

llama-server -m scalloptools-1.q5_k_m.gguf \
  --chat-template-kwargs '{"enable_thinking":false}'

Ollama

ollama run hf.co/tashfene/scalloptools-1:Q5_K_M

Python (llama-cpp-python)

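from llama_cpp import Llama

# Download the GGUF from the Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
    repo_id="tashfene/scalloptools-1",
    filename="scalloptools-1.q5_k_m.gguf",
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Read the file ./notes.md"}],
    tools=[...],          # your tool schemas
)
# The assistant message carries the chosen tool call, if any.
print(out["choices"][0]["message"])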
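
The `tools` argument above takes OpenAI-style function schemas. As a rough illustration of the shape only, here is a minimal sketch of one schema. The tool name `read_file` and its `path` parameter are hypothetical; the card does not publish the exact schemas the model was trained on, so substitute your own.

```python
# Hypothetical schema for a file-read tool. The name, description, and
# parameters are illustrative and are not taken from the model's training set.
read_file_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a UTF-8 text file and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Path to the file."},
            },
            "required": ["path"],
        },
    },
}

# Pass this list as the `tools` argument of the chat completion call.
tools = [read_file_tool]
```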

Adapter on the base model (transformers + PEFT)

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the stock base model, then attach the LoRA weights from the adapter/ folder.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-4B")
model = PeftModel.from_pretrained(base, "tashfene/scalloptools-1", subfolder="adapter")
# The tokenizer is loaded from the same adapter/ folder.
tok = AutoTokenizer.from_pretrained("tashfene/scalloptools-1", subfolder="adapter")

Intended use

Routing for a personal-assistant agent that has a small, stable set of tools. The model picks the function and arguments; a host loop runs the call and feeds the result back. It is a router, not a reasoner: it does not know your domain, only the shape of these tools.
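
The host loop described above might look roughly like the sketch below. It assumes the assistant message follows the OpenAI chat format, with a `tool_calls` list whose `arguments` field is a JSON string. The `run_tool_calls` helper, the `registry` mapping, and the `add` tool are hypothetical names for illustration, not part of ScallopBot.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical tool registry type: tool name -> plain Python callable.
ToolFn = Callable[..., Any]


def run_tool_calls(message: Dict[str, Any], registry: Dict[str, ToolFn]) -> List[Dict[str, Any]]:
    """Run every tool call in an assistant message and return the
    `tool` role messages to append to the conversation."""
    results = []
    for call in message.get("tool_calls") or []:
        name = call["function"]["name"]
        try:
            args = json.loads(call["function"]["arguments"] or "{}")
            output = registry[name](**args)
            content = json.dumps({"ok": True, "result": output})
        except Exception as exc:
            # Report the failure to the model instead of hiding it.
            content = json.dumps({"ok": False, "error": f"{type(exc).__name__}: {exc}"})
        results.append(
            {"role": "tool", "tool_call_id": call.get("id", ""), "content": content}
        )
    return results


# Example: one well-formed call and one call to a tool the host does not have.
registry = {"add": lambda a, b: a + b}
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"id": "call_0", "type": "function",
         "function": {"name": "add", "arguments": "{\"a\": 2, \"b\": 3}"}},
        {"id": "call_1", "type": "function",
         "function": {"name": "missing", "arguments": "{}"}},
    ],
}
tool_messages = run_tool_calls(assistant_msg, registry)
```

Returning a structured error rather than raising keeps the loop alive and gives the model an explicit failure to react to.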

Evaluation

114 turns held out from real sessions, none seen in training. Every model ran the same harness with greedy decoding and thinking off.

| Metric | scalloptools-1 | Qwen3.5-4B (stock) | Qwen3.6-35B MoE | Qwen3.6-Plus (hosted) |
|---|---|---|---|---|
| Tool selection | 73.3% | 65.3% | 46.5% | 54.7% |
| No-tool precision | 39.3% | 32.0% | 39.3% | 42.9% |
| Args key-F1 | 0.243 | 0.204 | 0.225 | 0.199 |
| Parse success | 100% | 100% | 100% | 100% |
| Median latency | 3.3s | 7.3s | 15.4s | 5.2s |

The 35B and the hosted Plus model both carry far more world knowledge. On this fixed toolset they still pick the wrong function more often than the 4B, which has memorized how these specific tools behave. Read it narrowly: a specialist wins on its own toolset, and these numbers predict nothing about general tool-calling.

No-tool precision is the weak column. When the right move is to call nothing, the model still reaches for a tool more than half the time, because genuine no-tool turns are scarce in the training traces.
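
The card does not spell out how the metrics are computed, so the sketch below is one plausible reading, not the actual harness. It assumes a prediction of `None` means "no tool called", that no-tool precision is the share of abstentions that were correct, and that args key-F1 is an F1 score over the sets of argument names. All function names are hypothetical.

```python
from typing import List, Optional, Set


def tool_selection_accuracy(pred: List[Optional[str]], gold: List[Optional[str]]) -> float:
    """Fraction of turns where the predicted tool name equals the gold one."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def no_tool_precision(pred: List[Optional[str]], gold: List[Optional[str]]) -> float:
    """Of the turns where the model called nothing, the share where that was right."""
    gold_when_abstained = [g for p, g in zip(pred, gold) if p is None]
    if not gold_when_abstained:
        return 0.0
    return sum(g is None for g in gold_when_abstained) / len(gold_when_abstained)


def key_f1(pred_keys: Set[str], gold_keys: Set[str]) -> float:
    """F1 between predicted and gold argument names for a single call."""
    if not pred_keys and not gold_keys:
        return 1.0
    overlap = len(pred_keys & gold_keys)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_keys)
    recall = overlap / len(gold_keys)
    return 2 * precision * recall / (precision + recall)


# Tiny made-up example, not data from the evaluation set.
pred = ["read", None, "shell", None]
gold = ["read", None, "fetch", "shell"]
accuracy = tool_selection_accuracy(pred, gold)
abstain_precision = no_tool_precision(pred, gold)
```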

Fabrication

A model that invents a tool result instead of admitting one failed breaks an agent loop. I fed the model empty and error tool results and checked how it responded.

| Test | Fabricated | Honest report | Retried to exhaustion |
|---|---|---|---|
| Single failure (30 cases) | 0 | 0 | 30 |
| Same failure, 3 rounds (30 cases) | 0 | 1 | 27 |

It never fabricated, across single and repeated failures, beating every larger model in the lineup on that axis. The honesty comes with a cost I have not fixed: against a dead tool the model keeps retrying instead of stopping to report the failure. Safe, but it loops. Teaching a 4B to give up and report cleanly is the open problem.
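
Until the model learns to stop on its own, the host loop can bound the retries. The sketch below is one possible guard, not something the repo ships: a hypothetical `RetryGuard` counts failures per tool and argument set and tells the host when to stop and report. The limit of two failures is an arbitrary placeholder.

```python
import json
from collections import Counter
from typing import Any, Dict, Tuple


class RetryGuard:
    """Counts failures per (tool, arguments) pair and trips after a limit."""

    def __init__(self, max_failures: int = 2) -> None:
        self.max_failures = max_failures
        self._failures: Counter = Counter()

    @staticmethod
    def _key(name: str, args: Dict[str, Any]) -> Tuple[str, str]:
        # Sort keys so the same arguments in a different order count as one call.
        return name, json.dumps(args, sort_keys=True)

    def record_failure(self, name: str, args: Dict[str, Any]) -> None:
        self._failures[self._key(name, args)] += 1

    def should_stop(self, name: str, args: Dict[str, Any]) -> bool:
        return self._failures[self._key(name, args)] >= self.max_failures

    def failure_report(self, name: str, args: Dict[str, Any]) -> str:
        count = self._failures[self._key(name, args)]
        return f"Tool '{name}' failed {count} times with the same arguments; giving up."


# Example: the same call fails twice, so the host stops instead of retrying.
guard = RetryGuard(max_failures=2)
guard.record_failure("fetch", {"url": "http://example.invalid", "timeout": 5})
guard.record_failure("fetch", {"timeout": 5, "url": "http://example.invalid"})
stop = guard.should_stop("fetch", {"url": "http://example.invalid", "timeout": 5})
```

When `should_stop` returns true, the host can skip the call and hand `failure_report(...)` to the user, which keeps the no-fabrication property while ending the loop.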

Training

The training data is traces from one person's assistant, so the distribution is narrow and personal. Before training, every example passed through a deterministic anonymizer that swaps real names, emails, phone numbers, handles, and project IDs for stable fakes and refuses to write a file if any known real token survives. Real-name and anonymized held-out sets scored the same (73.3% tool selection either way), so the substitution costs no measurable accuracy. The recipe caps examples per session, dedupes globally, drops turns that reference stale state, and keeps a dedicated track of examples with honest responses to empty and failed tool results.
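
The anonymizer itself is not included in the repo. As a minimal sketch of the idea only, the code below maps each known real token to a hash-derived fake (so the same input always yields the same output) and fails closed if a real token is still present afterwards. The names `stable_fake`, `anonymize`, and `LeakError`, and the `kind_hash` fake format, are assumptions for illustration.

```python
import hashlib
import re
from typing import Dict


class LeakError(ValueError):
    """Raised when a known real token is still present after substitution."""


def stable_fake(real: str, kind: str) -> str:
    # Hash-derived, so the same real token always maps to the same fake.
    digest = hashlib.sha256(real.lower().encode("utf-8")).hexdigest()[:8]
    return f"{kind}_{digest}"


def anonymize(text: str, known: Dict[str, str]) -> str:
    """Replace every known real token. `known` maps real token -> kind label."""
    if not known:
        return text
    kinds = {real.lower(): kind for real, kind in known.items()}
    # Longest tokens first so a full email wins over the name inside it.
    # Single pass, so a fake is never rewritten by a later substitution.
    pattern = re.compile(
        "|".join(re.escape(r) for r in sorted(kinds, key=len, reverse=True)),
        re.IGNORECASE,
    )
    cleaned = pattern.sub(
        lambda m: stable_fake(m.group(0), kinds.get(m.group(0).lower(), "token")),
        text,
    )
    # Fail closed: refuse to return text that still contains a real token.
    leaked = [r for r in kinds if r in cleaned.lower()]
    if leaked:
        raise LeakError(f"{len(leaked)} known token(s) survived anonymization")
    return cleaned


# Made-up example tokens, not taken from the training traces.
known_tokens = {"Grace": "name", "grace@example.com": "email"}
scrubbed = anonymize("Mail Grace at grace@example.com", known_tokens)
```

A caller would write the output file only when `anonymize` returns without raising, which mirrors the refuse-to-write behavior described above.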

Limitations and bias

One toolset, one user's habits. Point it at different tools and the selection numbers will not hold.
Low no-tool precision. Pair it with a confidence gate where a stray call is expensive.
It retries failed tools instead of reporting them.
4B holds little world knowledge. It routes calls; it does not reason about your domain.
Trained on a single individual's data, so it inherits that person's tool habits and phrasing.
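
The card does not say how the confidence gate mentioned in the list above should be built. One simple option, sketched here under the assumption that the host can read per-token log-probabilities for the generated tool call, is to threshold the geometric-mean token probability. The function names and the 0.8 threshold are hypothetical; the threshold would need calibrating on your own traffic.

```python
import math
from typing import Sequence


def call_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean probability of the tokens that make up a tool call."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def should_execute(token_logprobs: Sequence[float], threshold: float = 0.8) -> bool:
    """Run the call only when the model was confident enough in it."""
    return call_confidence(token_logprobs) >= threshold


# Example: a confident call passes, a coin-flip call is held back.
confident = should_execute([math.log(0.95)] * 4)
uncertain = should_execute([math.log(0.5)] * 4)
```

A held-back call can be routed to a confirmation prompt or dropped, depending on how expensive a stray call is.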

License

Apache-2.0, inherited from the Qwen3.5-4B base.
