How to use from the llama-cpp-python library
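# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
    repo_id="teolm30/fox-1.6",
    filename="model.gguf",
)

# Run one chat turn; the result is an OpenAI-style completion dict.
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)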
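
The call above returns an OpenAI-style completion dictionary rather than a plain string. Below is a minimal sketch of pulling out the reply text, assuming the usual `choices[0]["message"]["content"]` layout. The helper name `extract_reply` and the sample dictionary are hypothetical and are not part of llama-cpp-python.

```python
from typing import Any


def extract_reply(response: dict[str, Any]) -> str:
    """Return the assistant text from a chat-completion style dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # Assumed layout: choices[0]["message"]["content"] holds the reply text.
    return choices[0]["message"]["content"].strip()


# Hand-written stand-in for a real response, so the helper can be tried offline.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": " Paris is the capital of France. "}}
    ]
}
print(extract_reply(sample))
```

With a loaded model, the same helper could be applied to the value returned by `llm.create_chat_completion(...)`.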

Fox 1.6

Fox 1.6 is a compact, device-friendly fork of Qwen/Qwen3-1.7B for general-purpose text generation.

What this repo is

  • A lightweight Hugging Face model repo with real weights and tokenizer files
  • Intended to be easier to deploy on consumer hardware than much larger LLMs
  • Branded and published under the Fox 1.6 name

What this repo is not

  • It is not trained from scratch
  • It is not claimed to outperform larger frontier models
  • It is not an independent model: it remains a fork of the upstream Qwen/Qwen3-1.7B checkpoint

Intended use

  • On-device or low-footprint inference
  • Assistant-style chat and completion
  • Rapid experimentation with a compact base model
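
To make "low-footprint" more concrete, here is a rough back-of-the-envelope estimate of weight storage at different precisions. It is a sketch under stated assumptions: the parameter count of about 1.7e9 is inferred from the upstream model name (the model page rounds it to 2B), the bytes-per-parameter figures are nominal, and real quantized files carry extra overhead. The KV cache and activations are not included.

```python
# Approximate bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Estimate weight storage in GB (10**9 bytes), excluding KV cache and activations."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: roughly 1.7e9 parameters, taken from the upstream model name.
NUM_PARAMS = 1.7e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.2f} GB")
```

Under these assumptions, BF16 weights need roughly 3.4 GB, and a 4-bit quantization brings that to just under 1 GB before overhead.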

Notes

If you want Fox 1.6 to become a genuinely new model, the next step is to fine-tune this fork on a curated instruction dataset and evaluate it against your target benchmarks.
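
As one illustration of the evaluation step, the sketch below scores a model with plain exact-match accuracy over prompt and answer pairs. This is a deliberately simple metric chosen for the example, not a method this repo prescribes. `stub_generate` and the two-item `benchmark` are hypothetical placeholders; in practice `generate` would wrap a real model call.

```python
from typing import Callable, Iterable


def exact_match_accuracy(
    generate: Callable[[str], str],
    examples: Iterable[tuple[str, str]],
) -> float:
    """Fraction of prompts whose normalized output equals the expected answer."""
    total = 0
    correct = 0
    for prompt, expected in examples:
        total += 1
        if generate(prompt).strip().lower() == expected.strip().lower():
            correct += 1
    if total == 0:
        raise ValueError("no examples provided")
    return correct / total


# Hypothetical stand-in for a real model call (for example a wrapper around
# llm.create_chat_completion); it lets the harness run without any weights.
def stub_generate(prompt: str) -> str:
    answers = {"What is the capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


benchmark = [
    ("What is the capital of France?", "paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
print(exact_match_accuracy(stub_generate, benchmark))
```

Running the same harness before and after fine-tuning gives a like-for-like comparison on whatever examples matter for your use case.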

🤖 Run with Ollama

ollama run hf.co/teolm30/fox-1.6
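
Once the model has been pulled, a running Ollama server can also be queried programmatically. The following is a minimal sketch using only the Python standard library, assuming Ollama is listening on its default local address and that the model is registered under the same identifier as in the command above. The function names are illustrative.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local address
MODEL_NAME = "hf.co/teolm30/fox-1.6"  # same identifier as in the `ollama run` command


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming chat request for a single user message."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["message"]["content"]


# Usage (requires a running Ollama server with the model available):
#   print(chat("What is the capital of France?"))
```

Setting `"stream": False` asks the server for a single JSON object instead of a stream of partial responses, which keeps the parsing to one `json.load` call.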

Model details

  • Format: Safetensors
  • Model size: 2B params
  • Tensor type: BF16
