Instructions to use NotHereNorThere/Qwemini-1.7b-Beta with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use NotHereNorThere/Qwemini-1.7b-Beta with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("NotHereNorThere/Qwemini-1.7b-Beta", dtype="auto") - llama-cpp-python
How to use NotHereNorThere/Qwemini-1.7b-Beta with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="NotHereNorThere/Qwemini-1.7b-Beta", filename="model-Q5_K_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use NotHereNorThere/Qwemini-1.7b-Beta with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M # Run inference directly in the terminal: llama-cli -hf NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M # Run inference directly in the terminal: llama-cli -hf NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M # Run inference directly in the terminal: ./llama-cli -hf NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M
Use Docker
docker model run hf.co/NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M
- LM Studio
- Jan
- Ollama
How to use NotHereNorThere/Qwemini-1.7b-Beta with Ollama:
ollama run hf.co/NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M
- Unsloth Studio
How to use NotHereNorThere/Qwemini-1.7b-Beta with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for NotHereNorThere/Qwemini-1.7b-Beta to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for NotHereNorThere/Qwemini-1.7b-Beta to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for NotHereNorThere/Qwemini-1.7b-Beta to start chatting
- Pi
How to use NotHereNorThere/Qwemini-1.7b-Beta with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use NotHereNorThere/Qwemini-1.7b-Beta with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M
Run Hermes
hermes
- Docker Model Runner
How to use NotHereNorThere/Qwemini-1.7b-Beta with Docker Model Runner:
docker model run hf.co/NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M
- Lemonade
How to use NotHereNorThere/Qwemini-1.7b-Beta with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull NotHereNorThere/Qwemini-1.7b-Beta:Q5_K_M
Run and chat with the model
lemonade run user.Qwemini-1.7b-Beta-Q5_K_M
List all available models
lemonade list
Qwemini-1.7B-Beta
Qwen3-1.7B fine-tuned on 250 Gemini 3 Pro chain-of-thought traces.
The grown-up version of Qwemini-0.5B-Alpha. Same teacher, same approach, a model that actually has the architecture to use it.
What it is
QLoRA fine-tune of Qwen3-1.7B on 250 Gemini 3 Pro structured reasoning traces. Goal was pure style transfer, Qwen3 already knows how to reason, this teaches it how we want it to reason. The native <think> token support changes everything compared to the 0.5B predecessor.
Training
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3-1.7B |
| Method | QLoRA (4-bit NF4, LoRA r=16) |
| Dataset | 250 Gemini 3 Pro CoT traces |
| Hardware | RTX 4060 8GB |
| Attention | FlashAttention 2 |
| Packing | Enabled |
Eval results
| Prompt | Result | Notes |
|---|---|---|
| Bat & ball ($1.10 problem) | ⚠️ Wrong answer, right process | Got $0.10, but thinking block caught its own error and rationalized past it anyway |
| 1/2 of 12 Fish drowning | ⚠️ Near miss | Noted "ambiguity in the question's phrasing" inside think block, answered 6 anyway — closest any model got to catching the false premise |
| Jug problem (3gal + 5gal = 4gal) | ✅ Correct strategy | Thinking block described the correct solution perfectly, written steps got slightly garbled |
| Pills trick (3 pills, every 30 min) | ⚠️ Contradicted itself | Produced two different answers (60 min and 90 min) in the same response without resolving the conflict |
The big finding
Thinking tags activated unprompted.
Qwen3's native thinking architecture survived the fine-tune intact. The model genuinely uses an internal scratchpad before answering rather than just formatting its output to look like reasoning. This is qualitatively different from every other model in the Qwemini/YapLlama/AtomCoT family — those learned the costume of reasoning. This one is actually thinking, just not always correctly.
Honest assessment
The failure modes are completely different from the smaller model, instead of confident wrong answers or structured nonsense, you get a model that notices problems, almost catches false premises, and occasionally argues with itself.
The bat and ball error is the most interesting result: the thinking block explicitly computed $1.20 ≠ $1.10 and then declared the solution valid anyway. It's not that it can't detect errors — it's that it doesn't always act on them. More data and more epochs would likely close this gap.
Compared to Qwemini-0.5B-Alpha
| 0.5B-Alpha | 1.7B-Beta | |
|---|---|---|
| Native thinking tags | ❌ | ✅ |
| Bat & ball | ✅ Correct | ⚠️ Wrong but self-aware |
| Premise checking | ❌ | ⚠️ Almost |
| Jug problem | ❌ Hallucinated | ✅ Correct strategy |
| Reasoning quality | Structured correct | Genuinely thinking |
What would improve it
- More epochs, loss was still healthy at checkpoint, room to keep learning
- Premise-checking traces, it almost caught the fish problem, 50 targeted examples would probably close it
- More data and more varietey (eg 6000 rows, Gemini 3.1 + Opus 4.6) is the natural next step
Part of
The Qwemini model family, Qwen models fine-tuned for structured reasoning.
| Model | Params | Thinking tags | Actually reasons |
|---|---|---|---|
| Qwemini-0.5B-Alpha | 500M | ❌ | ✅ simple problems |
| Qwemini-1.7B-Beta | 1.7B | ✅ | ✅ with self-correction attempts |
- Downloads last month
- 3
5-bit
16-bit