spectre: a Karl Marx register model
A 7B voice tune that writes in the register of Karl Marx: the political economist, the theorist of capital, the correspondent who dissected how the bourgeois order actually works. The conceit is Marx himself, answering as the New-York Tribune correspondent he once was. A spectre is haunting your VRAM.
v2 (2026-06-16): retrained (full fine-tune) to trim a tendency in the prior build to complete into "published-article" scaffolding — fabricated datelines, invented titles, and bracketed citations. v2 answers more as a man speaking aloud than as an article for print. Weights updated in place; same conceit, same public-domain sources.
It channels the analysis, not a biography. The model trains on Marx and Engels's own voice-bearing works in their public-domain English: the Manifesto, the Eighteenth Brumaire, Wage-Labour and Capital, Value Price and Profit, The Civil War in France, the Critique of the Gotha Programme. What it learns is the cadence — the patient exposure of contradiction, the long argumentative sentence, the contempt for the self-deceiving.
What it does
Ask it about labour, capital, the commodity, the state, religion, or the present day and it answers in the analytical-polemical register. It etymologises the modern through the nineteenth century: asked about the gig economy it reaches for the horse-cart and routes back to wage-labour. It does not reassure. It dissects.
How it was built
- Base: Qwen2.5-7B-Instruct.
- Method: completion-style causal-LM fine-tuning, QLoRA at rank 32, adapter merged onto the fp16 base before GGUF conversion. ~37 minutes on one rented A6000-class GPU.
- Source: six Marx/Engels works in public-domain English translations (Moore's 1888 Manifesto, Eleanor Marx Aveling's Value Price and Profit, etc.), taken from transcriptions at marxists.org. Roughly 1,200 completion records (authentic chunks oversampled) plus a small (~4%) modern-bridge set so the voice can reach present-day questions. The corpus is not published.
- Inference: a lead-in frame ("One puts to Karl Marx this question…") elicits the first-person voice; plain chat narrates about Marx instead of as him.
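The lead-in frame in the Inference note can be sketched as a small prompt builder for raw completion. This is a minimal sketch, not the author's script: only the opening words of the frame are quoted in this card, so the colon, the trailing blank line, and the helper names `frame_question` and `complete` are assumptions. The GGUF file name is the one listed for the Q5_K_M quantization on the model page.

```python
# Opening words of the frame as quoted in the card; the full frame text is
# not published, so everything appended after these words is an assumption.
LEAD_IN = "One puts to Karl Marx this question"


def frame_question(question: str) -> str:
    """Wrap a question in the lead-in frame for plain completion (no chat template)."""
    q = " ".join(question.split())  # collapse stray whitespace and newlines
    if not q.endswith("?"):
        q += "?"
    # Hypothetical layout: frame, colon, question, then a blank line for the model to continue.
    return f"{LEAD_IN}: {q}\n\n"


def complete(question: str, max_tokens: int = 512) -> str:
    """Run one framed completion with llama-cpp-python (downloads the GGUF on first use)."""
    from llama_cpp import Llama  # imported lazily so the prompt builder works without it

    llm = Llama.from_pretrained(
        repo_id="lerugray/spectre-7b",
        filename="spectre-qwen2-5-7b-instruct-Q5_K_M.gguf",
    )
    out = llm(frame_question(question), max_tokens=max_tokens)
    return out["choices"][0]["text"]
```

Calling `complete("What is the gig economy?")` sends the framed text as a plain completion prompt rather than through the chat template, which is the mode the note above says elicits the first-person voice.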
Intended use
Creative writing, political-theory pedagogy in a register, tabletop and interactive fiction, voice prototyping. It is a register, not a source. Treat its output as generated prose, not as Marx's documented positions or as fact.
Limitations and honest notes
- It invents freely — names, dates, citations, events. It will confidently attribute a letter to a date that never existed. Read it for the voice, not the record.
- It occasionally recites. A verbatim-regurgitation audit (24 generations vs the training corpus) found a mean longest-verbatim-run of ~7 words and one generation that reproduced a 38-word span of the Communist Manifesto. That span is Moore's 1888 translation — public domain — so it carries no copyright exposure; it is flagged here only as a transparency note about memorisation. No copyrighted translation and no synthetic bridge text was reproduced at length.
- Period framing. It reasons from the nineteenth century outward, which is the point and also the limit.
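A regurgitation check of the kind described in the list above can be approximated by measuring the longest run of consecutive words that a generation shares with the corpus. The following is a rough sketch under stated assumptions, not the audit that produced the figures in this card: the word tokenizer, the function names, and the toy texts are all hypothetical, and the quadratic scan is only practical for small inputs.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is ignored (an assumed normalisation)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_verbatim_run(generation: str, corpus: str) -> int:
    """Length in words of the longest contiguous span shared by both texts."""
    g, c = words(generation), words(corpus)
    best = 0
    prev = [0] * (len(c) + 1)  # prev[j]: run length ending at the previous generation word and c[j-1]
    for gw in g:
        cur = [0] * (len(c) + 1)
        for j, cw in enumerate(c, start=1):
            if gw == cw:
                cur[j] = prev[j - 1] + 1  # extend the run that ended one word earlier in both texts
                best = max(best, cur[j])
        prev = cur
    return best


def audit(generations: list[str], corpus: str) -> dict[str, float]:
    """Mean and maximum longest-verbatim-run over a batch of generations."""
    runs = [longest_verbatim_run(g, corpus) for g in generations]
    return {"mean": sum(runs) / len(runs), "max": max(runs)}


# Toy data for illustration only; a real audit would load the training corpus.
corpus = "A spectre is haunting Europe, the spectre of communism."
generations = ["Indeed, a spectre is haunting your VRAM.", "Nothing shared here."]
report = audit(generations, corpus)
```

On the toy data the first generation shares the four-word run "a spectre is haunting" with the corpus and the second shares nothing. For a full corpus, an n-gram index or suffix structure would be a more practical design than this row-by-row scan.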
License
CC-BY-NC-4.0. The source works and their English translations are public domain, so the weights could ship permissively; the non-commercial clause is a deliberately conservative choice given the synthetic modern-bridge component and the persona framing. Attribution: Ray Weiss / The Elect. Source texts: marxists.org (public domain). No warranty.