Instructions for using lerugray/spectre-7b with libraries and local apps.

Libraries

How to use lerugray/spectre-7b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="lerugray/spectre-7b",
	filename="spectre-qwen2-5-7b-instruct-Q5_K_M.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Local Apps

llama.cpp

How to use lerugray/spectre-7b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf lerugray/spectre-7b:Q5_K_M
# Run inference directly in the terminal:
llama-cli -hf lerugray/spectre-7b:Q5_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf lerugray/spectre-7b:Q5_K_M
# Run inference directly in the terminal:
llama-cli -hf lerugray/spectre-7b:Q5_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf lerugray/spectre-7b:Q5_K_M
# Run inference directly in the terminal:
./llama-cli -hf lerugray/spectre-7b:Q5_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf lerugray/spectre-7b:Q5_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf lerugray/spectre-7b:Q5_K_M

Use Docker

docker model run hf.co/lerugray/spectre-7b:Q5_K_M

vLLM

How to use lerugray/spectre-7b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "lerugray/spectre-7b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lerugray/spectre-7b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
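
The same request can be sent from Python with only the standard library. This is a minimal sketch, assuming the vLLM server started above is listening on localhost:8000; the helper names `build_payload` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request

# Assumption: the vLLM server from the command above is listening here.
BASE_URL = "http://localhost:8000/v1/completions"


def build_payload(prompt: str, max_tokens: int = 512, temperature: float = 0.5) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": "lerugray/spectre-7b",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str, url: str = BASE_URL) -> str:
    """POST the payload and return the text of the first completion."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style completion responses keep the generated text here.
    return body["choices"][0]["text"]
```

With the server running, `print(complete("Once upon a time,"))` should print the continuation.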

Use Docker

docker model run hf.co/lerugray/spectre-7b:Q5_K_M

Ollama

How to use lerugray/spectre-7b with Ollama:
```
ollama run hf.co/lerugray/spectre-7b:Q5_K_M
```

Unsloth Studio

How to use lerugray/spectre-7b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lerugray/spectre-7b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lerugray/spectre-7b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for lerugray/spectre-7b to start chatting

Docker Model Runner

How to use lerugray/spectre-7b with Docker Model Runner:
```
docker model run hf.co/lerugray/spectre-7b:Q5_K_M
```

Lemonade

How to use lerugray/spectre-7b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull lerugray/spectre-7b:Q5_K_M

Run and chat with the model

lemonade run user.spectre-7b-Q5_K_M

List all available models

lemonade list

spectre: a Karl Marx register model

A 7B voice tune that writes in the register of Karl Marx: the political economist, the theorist of capital, the correspondent who dissected how the bourgeois order actually works. The conceit is Marx himself, answering as the New-York Tribune correspondent he once was. A spectre is haunting your VRAM.

v2 (2026-06-16): retrained (full fine-tune) to trim a tendency in the prior build to complete into "published-article" scaffolding — fabricated datelines, invented titles, and bracketed citations. v2 answers more as a man speaking aloud than as an article for print. Weights updated in place; same conceit, same public-domain sources.

It channels the analysis, not a biography. The model trains on Marx and Engels's own voice-bearing works in their public-domain English: the Manifesto, the Eighteenth Brumaire, Wage-Labour and Capital, Value, Price and Profit, The Civil War in France, the Critique of the Gotha Programme. What it learns is the cadence: the patient exposure of contradiction, the long argumentative sentence, the contempt for the self-deceiving.

What it does

Ask it about labour, capital, the commodity, the state, religion, or the present day and it answers in the analytical-polemical register. It etymologises the modern through the nineteenth century: asked about the gig economy it reaches for the horse-cart and routes back to wage-labour. It does not reassure. It dissects.

How it was built

Base: Qwen2.5-7B-Instruct.
Method: completion-style causal-LM fine-tuning, QLoRA at rank 32, adapter merged onto the fp16 base before GGUF conversion. ~37 minutes on one rented A6000-class GPU.
Source: six Marx/Engels works in public-domain English translations (Moore's 1888 Manifesto; Eleanor Marx Aveling's Value, Price and Profit; etc.), using transcriptions from marxists.org. Roughly 1,200 completion records (authentic chunks oversampled) plus a small (~4%) modern-bridge set so the voice can reach present questions. The corpus is not published.
Inference: a lead-in frame ("One puts to Karl Marx this question…") elicits the first-person voice; a plain chat prompt makes the model narrate about Marx rather than speak as him.
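
The lead-in frame can be applied with a small wrapper around a plain completion call. The sketch below is hedged: only the opening words of the frame come from this card, while the colon, the "Marx answers:" cue, and the helper names `build_framed_prompt` and `ask` are assumptions made for illustration. It expects `llm` to be a llama-cpp-python `Llama` instance like the one loaded in the Libraries section.

```python
# The opening words are quoted from the model card. Everything after them
# (the colon, the blank line, the "Marx answers:" cue) is an assumption.
FRAME_OPENING = "One puts to Karl Marx this question"


def build_framed_prompt(question: str) -> str:
    """Wrap a question in a third-person lead-in for the model to continue."""
    return f"{FRAME_OPENING}: {question.strip()}\n\nMarx answers:"


def ask(llm, question: str, max_tokens: int = 512) -> str:
    """Run a plain completion (not a chat call) and return only the new text.

    `llm` is any callable with the llama_cpp.Llama completion interface.
    """
    output = llm(build_framed_prompt(question), max_tokens=max_tokens, echo=False)
    return output["choices"][0]["text"].strip()
```

Calling `ask(llm, "What is the gig economy?")` sends the framed prompt as a raw completion, which is the mode the model was tuned in; the exact frame wording that works best may differ from this guess.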

Intended use

Creative writing, political-theory pedagogy in a register, tabletop and interactive fiction, voice prototyping. It is a register, not a source. Treat its output as generated prose, not as Marx's documented positions or as fact.

Limitations and honest notes

It invents freely — names, dates, citations, events. It will confidently attribute a letter to a date that never existed. Read it for the voice, not the record.
It occasionally recites. A verbatim-regurgitation audit (24 generations vs the training corpus) found a mean longest-verbatim-run of ~7 words and one generation that reproduced a 38-word span of the Communist Manifesto. That span is Moore's 1888 translation — public domain — so it carries no copyright exposure; it is flagged here only as a transparency note about memorisation. No copyrighted translation and no synthetic bridge text was reproduced at length.
Period framing. It reasons from the nineteenth century outward, which is the point and also the limit.
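
The longest-verbatim-run figure in the audit note can be illustrated with a short sketch. This is one plausible way to measure it, not the audit's actual code: it assumes word-level matching after lowercasing and stripping punctuation, and the function names are hypothetical.

```python
import re
from statistics import mean


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_verbatim_run(generation: str, corpus: str) -> int:
    """Length in words of the longest contiguous word sequence in both texts."""
    gen, ref = tokenize(generation), tokenize(corpus)
    best = 0
    previous = [0] * (len(ref) + 1)
    for g in gen:
        # current[j] = length of the shared run ending at this word and ref[j-1]
        current = [0] * (len(ref) + 1)
        for j, r in enumerate(ref, start=1):
            if g == r:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def mean_longest_run(generations: list[str], corpus: str) -> float:
    """Average of the per-generation longest verbatim runs."""
    return mean(longest_verbatim_run(g, corpus) for g in generations)
```

The dynamic programme costs time proportional to the product of the two word counts, which is workable for a few dozen generations against a corpus of this size but would want an n-gram index for anything larger.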

License

CC-BY-NC-4.0. The source works and their English translations are public domain, so the weights could ship permissively; the non-commercial clause is a deliberately conservative choice given the synthetic modern-bridge component and the persona framing. Attribution: Ray Weiss / The Elect. Source texts: marxists.org (public domain). No warranty.


Format: GGUF
Model size: 8B params
Architecture: qwen2
Available quantization: 5-bit (Q5_K_M)

Model tree for lerugray/spectre-7b

Base model: Qwen/Qwen2.5-7B
Finetuned: Qwen/Qwen2.5-7B-Instruct
Quantized: this model