Instructions to use NotHereNorThere/YapLlama-1b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use NotHereNorThere/YapLlama-1b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="NotHereNorThere/YapLlama-1b",
	filename="Q5_K_S.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use NotHereNorThere/YapLlama-1b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf NotHereNorThere/YapLlama-1b:Q5_K_S
# Run inference directly in the terminal:
llama-cli -hf NotHereNorThere/YapLlama-1b:Q5_K_S

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf NotHereNorThere/YapLlama-1b:Q5_K_S
# Run inference directly in the terminal:
llama-cli -hf NotHereNorThere/YapLlama-1b:Q5_K_S

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf NotHereNorThere/YapLlama-1b:Q5_K_S
# Run inference directly in the terminal:
./llama-cli -hf NotHereNorThere/YapLlama-1b:Q5_K_S

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf NotHereNorThere/YapLlama-1b:Q5_K_S
# Run inference directly in the terminal:
./build/bin/llama-cli -hf NotHereNorThere/YapLlama-1b:Q5_K_S

Use Docker

docker model run hf.co/NotHereNorThere/YapLlama-1b:Q5_K_S

LM Studio
Jan
Ollama
How to use NotHereNorThere/YapLlama-1b with Ollama:
```
ollama run hf.co/NotHereNorThere/YapLlama-1b:Q5_K_S
```

Unsloth Studio

How to use NotHereNorThere/YapLlama-1b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for NotHereNorThere/YapLlama-1b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for NotHereNorThere/YapLlama-1b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for NotHereNorThere/YapLlama-1b to start chatting

How to use NotHereNorThere/YapLlama-1b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf NotHereNorThere/YapLlama-1b:Q5_K_S

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "NotHereNorThere/YapLlama-1b:Q5_K_S"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use NotHereNorThere/YapLlama-1b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf NotHereNorThere/YapLlama-1b:Q5_K_S

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default NotHereNorThere/YapLlama-1b:Q5_K_S

Run Hermes

hermes

Docker Model Runner
How to use NotHereNorThere/YapLlama-1b with Docker Model Runner:
```
docker model run hf.co/NotHereNorThere/YapLlama-1b:Q5_K_S
```

Lemonade

How to use NotHereNorThere/YapLlama-1b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull NotHereNorThere/YapLlama-1b:Q5_K_S

Run and chat with the model

lemonade run user.YapLlama-1b-Q5_K_S

List all available models

lemonade list

YapLlama-1B

Llama 3.2-1B fine-tuned on 600 OpenThoughts rows for chain-of-thought reasoning.

Named honestly. It will show its work, at length, whether you asked for that or not.

What it is

QLoRA fine-tune of Llama 3.2-1B-Instruct on a sampled subset of OpenThoughts-114k. Goal was to transfer structured CoT reasoning behavior into a 1B model quickly and cheaply. It didn't quite get the memo.

Training

Setting	Value
Base model	`Llama-3.2-1B-Instruct`
Method	QLoRA (4-bit NF4, LoRA r=16)
Dataset	OpenThoughts-114k, 600 rows sampled
Hardware	RTX 4060 8GB
Attention	FlashAttention 2
Packing	Enabled

Short eval results

"A bat and a ball cost $1.10 total. The bat costs $1.00 more than the ball. How much does the ball cost?"

Clean algebra, correct calculations, structured steps, passed!

"I have a 3 gallon jug and a 5 gallon jug. I need exactly 4 gallons. How?"

Right intuition, wrong intermediate steps, got lucky, C+.

"There are 12 fish in a tank. Half of them drown. How many are left?"

Accepted false premise, confidently answered 6, failed.

"A train leaves Chicago at 60mph. Another leaves New York 2 hours later at 90mph. The cities are 790 miles apart. Where do they meet?"

Yapped for 3 minutes at ~200 tk/s, filled its context, and had a meltdown.

Honest assessment

CoT format transferred cleanly on well-formed algebra problems. Verbosity is through the roof. Llama 3.2's base personality bleeds through, producing longer and sometimes circular reasoning before landing on an answer. Fits the name.

State tracking is marginally better than previous tests but still unreliable, often gets correct intuitions through broken intermediate reasoning rather than genuine simulation. Premise checking is absent entirely, consistent with a training set of well-formed problems where the model never had to question the question.

Roughly ties with my other model Qwemini-0.5B-Alpha on eval despite 2x the parameters. Dataset quality and premise-checking coverage matter more than model size at this scale.

Inference speed (llama.cpp, GGUF, 1B, RTX 4060)

Format	Speed
f16	~90 tok/s
Q5_K_S	~220 tok/s

Run Q5_K_S. The quality difference from f16 is negligible at 1B, the speed and VRAM difference is not.

What would improve it

Premise-checking traces (~50 examples where the model catches and rejects a false setup)
More data — 600 rows is enough to transfer the format, not enough to deeply generalize
Bigger base — 3B would close the state tracking gap significantly

Downloads last month: 151

GGUF

Model size

1B params

Architecture

llama

Hardware compatibility

5-bit

16-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including NotHereNorThere/YapLlama-1b

Distilling and FT

Collection

Serious *attempts* at transferring knowledge and capacity. • 2 items • Updated 4 days ago