🚧 WORK IN PROGRESS — v1 EARLY RELEASE 🚧
A bigger, better model (v2) is in development.
This page is the v1 early release of book-builder-bookwriter-v1. It is not the final model. Skim the rest of this card before you generate anything so you know what you're getting and what you're not.
book-builder-bookwriter-v1
A 7.6B-parameter prose-writing model fine-tuned on 10,211 human-authored novels (310,316 chapters, ~1.82 billion tokens) with a strict story-bible-to-chapter format. Built for BookBuilder to generate novel chapters from structured story bibles.
⚠ Work in Progress (v1, early release)
This is an early checkpoint of an ongoing training run. Training was paused at step 5000 / 9697 (~52% of one epoch) so the artifacts could be released publicly while the larger run continues.
Known limitations of v1:
- Treats bibles as style prompts, not strict plot instructions. Expect drift from the synopsis on one-shot generation.
- Does not reliably follow "FORBIDDEN" rules, character role assignments, or per-character constraints in rich BookBuilder-style bibles.
- Loaded keywords in synopses (proper names like "Stillwater", specific years like "1947") can trigger off-topic associations from the training corpus.
- On longer generations, may drift into paragraph-level repetition loops without the tuned sampling defaults in ollama/Modelfile.
What's coming:
- v2 (Q3/Q4 2026): Larger base model (Qwen 2.5 14B or 32B) + synthesized instruction-following data so the model can actually obey FORBIDDEN sections, distinguish protagonists from antagonists, and follow beat sheets. This addresses the main v1 limitation.
- v1 continuation to step 9697: the LoRA may be taken to the full single-epoch checkpoint and re-released as book-builder-bookwriter-v1.1 if testing shows it's worth the additional training compute. The resumable training state is preserved at branch resumable-step-5000.
Best use for v1 right now: as a prose-style backbone inside a structured pipeline (like BookBuilder itself) that provides per-chapter beat sheets and plot anchors at generation time. The pipeline supplies the discipline the v1 model can't enforce on its own. For one-shot "give me a chapter from a synopsis" use, v1 produces readable prose but will frequently drift from the intended plot.
This is NOT a chat model. Do not prompt it like ChatGPT. See "How to use" below.
What this model does
You write a Story Bible in the format shown below (or fill in the template).
You give the model the bible plus ### Chapter.
The model writes the chapter prose.
It will not answer questions. It will not respond to "Write me a story about X." It only continues prose conditioned on the bible context.
Format the model expects
Every training example looked exactly like this:
### Bible
Title: [book title]
Author: [author name]
Genre: [genre]
Publisher: [publisher]
Synopsis: [1-3 paragraphs describing the book]
### Genre
[genre]
### Chapter
[chapter title]
[chapter prose...]
Your prompt MUST end at ### Chapter\n[chapter title]\n\n and the model fills in the prose.
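The layout above can also be assembled programmatically. The following is a minimal sketch, not part of the released tooling: the function name build_prompt and its parameters are hypothetical, and the single-newline spacing between sections simply mirrors the listing above (the exact blank lines used in training are an assumption).

```python
def build_prompt(title, author, genre, publisher, synopsis, chapter_title):
    """Assemble a completion prompt in the bible-to-chapter layout.

    Hypothetical helper: the spacing between sections is an assumption.
    """
    lines = [
        "### Bible",
        f"Title: {title}",
        f"Author: {author}",
        f"Genre: {genre}",
        f"Publisher: {publisher}",
        f"Synopsis: {synopsis}",
        "### Genre",
        genre,
        "### Chapter",
        chapter_title,
    ]
    # The prompt must end with the chapter title followed by one blank line.
    return "\n".join(lines) + "\n\n"


prompt = build_prompt(
    title="Example Title",
    author="Example Author",
    genre="Western",
    publisher="Example Press",
    synopsis="A placeholder synopsis of one to three paragraphs.",
    chapter_title="Chapter 1: The Long Drive Home",
)
assert prompt.endswith("### Chapter\nChapter 1: The Long Drive Home\n\n")
```

Writing the returned string to a text file gives an input that any of the options below can consume.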
How to use
Option 1: Ollama (easiest)
IMPORTANT: A plain ollama pull from HF discards any Modelfile parameters in the repo, so Ollama runs the model with its default sampling — which on long completion prompts causes repetition loops and over-long generations. Use the setup script below to register the model under the name bookbuilder with tuned defaults that prevent both problems.
One-shot setup:
curl -sSL https://huggingface.co/Fordentinc/book-builder-bookwriter-v1/resolve/main/ollama/setup_ollama.sh | bash
# Optional: pass a quant tag, default is Q5_K_M
# curl -sSL https://huggingface.co/Fordentinc/book-builder-bookwriter-v1/resolve/main/ollama/setup_ollama.sh | bash -s Q8_0
Then:
ollama run bookbuilder < your_bible.txt
The script pulls the GGUF, builds a local Modelfile that sets repeat_penalty 1.18, num_predict 2500, and the stop tokens, and registers the result as bookbuilder.
If you'd rather pull manually (without the loop fix), you need to pass sampling flags every time:
ollama pull hf.co/Fordentinc/book-builder-bookwriter-v1:Q5_K_M
ollama run hf.co/Fordentinc/book-builder-bookwriter-v1:Q5_K_M \
--num-predict 2500 --repeat-penalty 1.18 --temperature 0.75
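Sampling options can also be sent per request through Ollama's HTTP API, which is useful if your Ollama version does not accept them on the command line. The sketch below assumes a local Ollama server on its default port (11434) and the manually pulled Q5_K_M tag; the function names are hypothetical. It sets "raw" so the prompt is sent without a chat template, which matches the completion-only format this model expects.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_request(prompt, model="hf.co/Fordentinc/book-builder-bookwriter-v1:Q5_K_M"):
    """Build the JSON body for a non-streaming, untemplated completion."""
    return {
        "model": model,
        "prompt": prompt,
        "raw": True,      # send the prompt as-is, without a chat template
        "stream": False,  # return one JSON object instead of a stream
        "options": {
            # Same values as the manual command above.
            "num_predict": 2500,
            "repeat_penalty": 1.18,
            "temperature": 0.75,
        },
    }


def generate(prompt):
    """POST the request to a running Ollama server and return the generated text."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires a running server and a prepared bible file):
# with open("your_bible.txt", encoding="utf-8") as f:
#     print(generate(f.read()))
```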
Available quants:
- Q4_K_M (4.7 GB) - fits on 8 GB GPUs, some token-decoding artifacts
- Q5_K_M (5.4 GB) - balanced, recommended for 24 GB cards
- Q8_0 (8.1 GB) - near-lossless, cleaner token decoding than K-quants
- F16 (15.2 GB) - full precision, no quantization artifacts
Option 2: LM Studio
- Search for Fordentinc/book-builder-bookwriter-v1
- Download the Q5_K_M quant
- Switch to Completion mode (not Chat). This is critical.
- Paste your filled-in bible as the input
- Generate
Option 3: llama.cpp
./llama-cli -hf Fordentinc/book-builder-bookwriter-v1:Q5_K_M \
--temp 0.8 --top-p 0.95 -n 2048 \
-f your_bible.txt
Option 4: Transformers + PEFT (Python, full bf16)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_id = "Qwen/Qwen2.5-7B"
adapter = "Fordentinc/book-builder-bookwriter-v1"

tok = AutoTokenizer.from_pretrained(adapter)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter)  # attach the LoRA adapter to the base model
model.eval()

# The file must end with "### Chapter", the chapter title, and one blank line.
with open("your_bible.txt", encoding="utf-8") as f:
    bible_and_chapter_header = f.read()

# Move inputs to wherever device_map placed the model instead of hard-coding "cuda".
inputs = tok(bible_and_chapter_header, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=True, temperature=0.8, top_p=0.95,
        repetition_penalty=1.05,
    )
# The decoded text includes the prompt followed by the generated chapter.
print(tok.decode(out[0], skip_special_tokens=True))
Option 5: vLLM (production / OpenAI-compatible API)
vLLM cannot load the LoRA adapter alone; use the merged bf16 weights instead:
vllm serve Fordentinc/book-builder-bookwriter-v1 --dtype bfloat16 --max-model-len 16384
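Because this is a completion model, a client should call the server's text-completion endpoint (/v1/completions) and not the chat endpoint. The sketch below assumes vLLM is listening on its default port 8000; the function names are hypothetical, and repetition_penalty is passed as a vLLM-specific extra parameter that is not part of the standard OpenAI schema.

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/completions"  # assumed default; adjust if changed


def completion_payload(prompt, max_tokens=2048):
    """Build a text-completion body with the card's recommended sampling values."""
    return {
        "model": "Fordentinc/book-builder-bookwriter-v1",
        "prompt": prompt,  # bible text ending in the chapter header
        "max_tokens": max_tokens,
        "temperature": 0.8,
        "top_p": 0.95,
        "repetition_penalty": 1.05,  # vLLM extension to the OpenAI schema
    }


def complete(prompt):
    """POST to a running vLLM server and return the generated chapter text."""
    body = json.dumps(completion_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        VLLM_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```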
Step-by-step: from blank page to chapter
- Download the template: bible_template.txt
- Fill in every field. The model relies on each section to anchor character voices, setting, and tone.
- Save as plain text (e.g. my_book.txt).
- End the file with ### Chapter followed by your chapter title and one blank line. Example:
### Chapter
Chapter 1: The Long Drive Home
- Run inference using one of the options above.
- The model writes ~1500-4000 words of prose, then stops or hits your max_new_tokens cap.
- For chapter 2: keep the same bible, change the chapter header, optionally append the last paragraph of chapter 1 so the model continues smoothly.
See example_bibles/ for two complete working examples.
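The chapter-2 step can be scripted. This is a rough sketch under stated assumptions: next_chapter_prompt is a hypothetical helper, and placing the carried-over paragraph directly after the new chapter header is a guess, since the card does not say exactly where to append it.

```python
def next_chapter_prompt(previous_prompt, new_title, carry_over=""):
    """Reuse the bible from a previous prompt and swap in a new chapter header.

    Hypothetical helper; the position of the carried-over text is an assumption.
    """
    marker = "### Chapter\n"
    # Split at the last chapter marker so the bible text is kept unchanged.
    bible, sep, _old_header = previous_prompt.rpartition(marker)
    if not sep:
        raise ValueError("prompt has no '### Chapter' section")
    prompt = f"{bible}{marker}{new_title}\n\n"
    if carry_over:
        # Seed the new chapter with the last paragraph of the previous one.
        prompt += carry_over.strip() + "\n\n"
    return prompt
```

For example, next_chapter_prompt(open("my_book.txt").read(), "Chapter 2: ...") returns a prompt with the same bible and a fresh header, ready to pass to any of the inference options.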
Recommended sampling parameters
| Parameter | Value | Why |
|---|---|---|
| temperature | 0.8 | Lower = repetitive, higher = incoherent |
| top_p | 0.95 | Standard nucleus sampling |
| repetition_penalty | 1.05 | Prevents loops; do not push past 1.15 |
| max_new_tokens | 2048-4096 | Most chapters land in 1500-3500 tokens |
| min_p | 0.05 (if supported) | Better than top_k for prose |
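For readers unfamiliar with min_p, the toy sketch below illustrates the usual definition: keep only tokens whose probability is at least min_p times the probability of the most likely token, then renormalize. The function name and the example distribution are made up for illustration, and individual runtimes may differ in detail.

```python
def min_p_filter(probs, min_p=0.05):
    """Keep tokens with probability >= min_p * max(probs), then renormalize.

    Returns a dict mapping token index to its renormalized probability.
    """
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}


# Toy next-token distribution (made-up numbers).
toy = [0.50, 0.30, 0.15, 0.04, 0.01]
filtered = min_p_filter(toy, min_p=0.05)
# The threshold is 0.05 * 0.50 = 0.025, so only the 0.01 token is dropped.
assert sorted(filtered) == [0, 1, 2, 3]
```

Unlike a fixed top_k, the cutoff scales with the model's confidence: when one token dominates, few alternatives survive, and when the distribution is flat, many do.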
Training details
- Base model: Qwen 2.5 7B (Apache 2.0)
- Method: QLoRA, r=16, alpha=32, dropout 0.05
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training corpus: 10,211 novels with derivable story bibles (Sci-Fi, Thriller, Romance, Crime, Western, Fantasy, etc.)
- Corpus stats: 310,316 chapter rows, ~1.4 billion words, ~1.82 billion tokens
- Hardware: 1× NVIDIA B200 (180 GB HBM3e)
- Sequence length: 2048
- Effective batch: 32 (per-device 4, grad-accum 8)
- Optimizer: paged_adamw_8bit
- LR: 2e-4 cosine, 3% warmup
- Epochs: 1
- Wall time: ~13.5 hours
- Data cleanup: em-dashes removed (replaced with commas), smart-quotes normalized, residual front-matter stripped, chapters with <800 or >6000 words filtered out
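The step counts quoted in this card follow from the figures above. The short calculation below reproduces them; it assumes one training sequence per chapter row, truncated at the 2048-token sequence length with no packing, which is an inference from the numbers and not something the card states.

```python
chapter_rows = 310_316
per_device_batch = 4
grad_accum = 8
seq_len = 2048
paused_at = 5000

effective_batch = per_device_batch * grad_accum          # sequences per optimizer step
steps_per_epoch = chapter_rows // effective_batch        # last partial batch dropped
progress = paused_at / steps_per_epoch                   # fraction of one epoch
max_tokens_seen = paused_at * effective_batch * seq_len  # upper bound at the pause

assert effective_batch == 32
assert steps_per_epoch == 9697
print(f"{progress:.1%} of one epoch, at most {max_tokens_seen:,} tokens seen")
# prints "51.6% of one epoch, at most 327,680,000 tokens seen"
```

Under that assumption, the v1 checkpoint has seen at most about 0.33 billion of the corpus's ~1.82 billion tokens, because each chapter contributes no more than its first 2048 tokens.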
Quirks and limitations
- No system messages, no chat history, no [INST] tags. The model was never shown those during training.
- Bibles outside its training distribution (highly experimental forms, non-Western names, heavy modern slang) may produce uneven results.
- Names follow Western conventions (USA/UK/Italy/Western Europe). The training filter excluded other naming traditions.
- Em-dashes are absent from training data and the model will rarely produce them. This is intentional.
- Chapter length is learned from data (avg ~4,500 words). To force shorter chapters, cap max_new_tokens.
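To pick a cap for a target word count, the corpus statistics give a rough conversion: ~1.82 billion tokens over ~1.4 billion words is about 1.3 tokens per word. The helper below is a hypothetical sketch built on that ratio; the real ratio varies with the text, so treat the result as an estimate.

```python
CORPUS_WORDS_MILLIONS = 1400   # ~1.4 billion words (from the corpus stats)
CORPUS_TOKENS_MILLIONS = 1820  # ~1.82 billion tokens (from the corpus stats)


def token_cap_for_words(target_words):
    """Estimate a max_new_tokens value for a target word count, rounded up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_words * CORPUS_TOKENS_MILLIONS // CORPUS_WORDS_MILLIONS)


assert token_cap_for_words(1500) == 1950
assert token_cap_for_words(3000) == 3900
```

By the same ratio, an average ~4,500-word chapter is roughly 5,850 tokens, so any cap inside the recommended 2048-4096 range will end generation before an average-length chapter is complete.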
License
Apache 2.0 (inherited from base Qwen 2.5 7B). Commercial use is permitted.
Citation
@misc{bookbuilder_bookwriter_v1_2026,
author = {Fordentinc},
title = {book-builder-bookwriter-v1: A prose-writing LoRA on Qwen 2.5 7B},
year = {2026},
url = {https://huggingface.co/Fordentinc/book-builder-bookwriter-v1},
}
Reporting issues
Open a discussion on this model page. Include the bible you used (first 500 chars) and the first 200 chars of the model output.