Instructions for using preparebuddy/ielts-9b with libraries and local apps.
- Libraries
- Transformers
How to use preparebuddy/ielts-9b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="preparebuddy/ielts-9b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForMultimodalLM

tokenizer = AutoTokenizer.from_pretrained("preparebuddy/ielts-9b")
model = AutoModelForMultimodalLM.from_pretrained("preparebuddy/ielts-9b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use preparebuddy/ielts-9b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "preparebuddy/ielts-9b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "preparebuddy/ielts-9b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```shell
docker model run hf.co/preparebuddy/ielts-9b
```
- SGLang
How to use preparebuddy/ielts-9b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "preparebuddy/ielts-9b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "preparebuddy/ielts-9b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "preparebuddy/ielts-9b" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "preparebuddy/ielts-9b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use preparebuddy/ielts-9b with Docker Model Runner:
```shell
docker model run hf.co/preparebuddy/ielts-9b
```
PrepareBuddy IELTS-9B (Qwen3.5): the most factually accurate
The largest model in the family: a fine-tune of Qwen3.5-9B (Apache-2.0) on PrepareBuddy's curated IELTS content, not a from-scratch foundation model. It generates IELTS practice material for all four sections.
A content generator, not an assessment tool: it writes passages, transcripts, tasks, questions and answer keys, and it does not score student work.
Where the 9B fits (the honest study result)
Part of our 2B/4B/9B study. Two things the size buys, and one it doesn't:
- ✅ Best world knowledge and facts: from-scratch passages get names, dates and places right more often than the smaller models' do (e.g. the 9B wrote the correct "Central Asia" where a 2B named the wrong region).
- ✅ 100% of completion answers appear verbatim in the passage.
- ❌ Size did not buy better verdict reasoning: fine-tuning was flat (base 79% → fine-tuned 77%), and a fine-tuned 2B matched it on verdicts. Pick the 9B for fact-heavy from-scratch generation, not for better verdict reasoning.
Full numbers: technical report.
Where it fits best (real-world use cases)
Reach for the 9B when factual fidelity is the priority and you have a GPU:
- Fact-heavy from-scratch passages (science, history, geography): it gets names, dates and places right most often.
- A content factory on a GPU server: bulk-generate a question bank where passage accuracy matters most.
- The top-quality tier behind a paid product, paired with grounding + verification.
- Sentence/summary completion: 100% in-passage answers.
Not the best pick for laptops/edge devices or cost-sensitive serving (→ 2B/4B), or if you expect better verdict reasoning: size didn't buy that (a fine-tuned 2B matched it).
Pros & cons
| ✅ Pros | ⚠️ Cons |
|---|---|
| Best facts in the family; 100% completion | Heaviest: ~18 GB VRAM in bf16 (quantise to ~5–6 GB) |
| Strong Writing / Speaking / Listening | MCQ answer position skews to "B"; fix at serving |
| Most reliable from-scratch passages | Verdict reasoning no better than the 2B (~8% slips on easy items, ≈25% on varied ones) |
| Apache-2.0; the quality ceiling of the set | Best on a GPU/server, not a laptop |
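The VRAM figures above are consistent with a simple weights-only estimate: parameters × bits per parameter ÷ 8. This sketch assumes a nominal 9 billion parameters and ignores the KV cache, activations and quantisation metadata (scales, zero-points), which plausibly accounts for the quoted 4-bit footprint being ~5–6 GB rather than the bare 4.5 GB. The function name is illustrative.

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Weights-only memory estimate in GB (1 GB = 1e9 bytes).

    Excludes the KV cache, activations, framework overhead and any
    quantisation metadata, so real VRAM use is higher.
    """
    bytes_per_param = bits_per_param / 8
    # (params_billion * 1e9 params) * bytes_per_param / (1e9 bytes per GB)
    return params_billion * bytes_per_param


for label, bits in [("bf16", 16), ("4-bit", 4)]:
    print(f"9B weights @ {label}: ~{weight_memory_gb(9, bits):.1f} GB")
```

This prints ~18.0 GB for bf16 and ~4.5 GB for 4-bit weights; budget extra headroom for the KV cache when generating long passages.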
Links
- Models: ielts-2b · ielts-4b · ielts-9b
- Apple Silicon / LM Studio (MLX): ielts-2b-mlx · ielts-4b-mlx
- Try the live demo: Hugging Face Space
- Full technical report & findings: ielts-qwen3.5
What it generates
| Section | Types | Output |
|---|---|---|
| Reading | TFNG, YNNG, MCQ, Sentence/Summary Completion, Matching, Long-form | passage + questions + answer key with justifications |
| Writing | Task 1 (chart), Task 2 (essay) | task prompt + word limit + timing (+ chart data for T1) |
| Listening | dialogue/monologue | transcript + questions + answer key (text for downstream TTS) |
| Speaking | Part 1, 2, 3 | examiner question / cue card + model answer |
Prompt format (not a chat model; use the tag prefix)
```
<TEST=IELTS><SECTION=READING><TYPE=TFNG><DIFF=medium><TOPIC=ocean currents> Generate a short passage with 4 True/False/Not Given statements and an answer key.
```
- SECTION = READING | WRITING | LISTENING | SPEAKING
- TYPE = (Reading) TFNG | YNNG | MCQ | MCQ_MULTI | SENTENCE_COMPLETION | SUMMARY_COMPLETION | MATCHING_HEADINGS | MATCHING_FEATURES | MATCHING_ENDINGS | LONGFORM; (Writing) TASK1 | TASK2; (Speaking) PART1 | PART2 | PART3; (Listening) LISTENING
- DIFF = easy | medium | hard
- Serve with `enable_thinking=False`: for this task, reasoning mode lowers verdict accuracy.
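A small helper can assemble the tag prefix and reject combinations the list above doesn't allow (for example `TYPE=TASK1` with `SECTION=READING`). This is a sketch: the function and constant names are hypothetical, and the validation is our own addition rather than something the model enforces.

```python
# Allowed tag values, copied from the prompt-format list.
SECTIONS = {"READING", "WRITING", "LISTENING", "SPEAKING"}
TYPES = {
    "READING": {"TFNG", "YNNG", "MCQ", "MCQ_MULTI", "SENTENCE_COMPLETION",
                "SUMMARY_COMPLETION", "MATCHING_HEADINGS", "MATCHING_FEATURES",
                "MATCHING_ENDINGS", "LONGFORM"},
    "WRITING": {"TASK1", "TASK2"},
    "SPEAKING": {"PART1", "PART2", "PART3"},
    "LISTENING": {"LISTENING"},
}
DIFFS = {"easy", "medium", "hard"}


def build_prompt(section: str, type_: str, diff: str, topic: str, instruction: str) -> str:
    """Return '<TEST=IELTS><SECTION=..><TYPE=..><DIFF=..><TOPIC=..> instruction'."""
    if section not in SECTIONS:
        raise ValueError(f"unknown SECTION: {section}")
    if type_ not in TYPES[section]:
        raise ValueError(f"TYPE {type_} is not valid for SECTION {section}")
    if diff not in DIFFS:
        raise ValueError(f"unknown DIFF: {diff}")
    tags = (f"<TEST=IELTS><SECTION={section}><TYPE={type_}>"
            f"<DIFF={diff}><TOPIC={topic}>")
    return f"{tags} {instruction}"


print(build_prompt("READING", "TFNG", "medium", "ocean currents",
                   "Generate a short passage with 4 True/False/Not Given statements "
                   "and an answer key."))
```

The printed string reproduces the example prompt above exactly, so the helper can be dropped in wherever a `user` message is built.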
Examples (the family's shared output format; answer keys verified)
These illustrate the format all three models produce; the 9B generates it with the highest factual accuracy and 100% completion grounding. The Reading examples use the recommended grounded approach (generate against a real passage), so every answer key here was checked against its passage. (The 9B needs a GPU to run, about 18 GB in bf16, so we show the shared verified format rather than a laptop generation.)
Writing: Task 2 (temp 0.7)
Input
<TEST=IELTS><SECTION=WRITING><TYPE=TASK2><DIFF=medium><TOPIC=whether university education should be free> Generate an IELTS Academic Writing Task 2 question with word limit and timing.
Output
TASK:
Some people believe that university education should be free for all students who wish to
pursue a degree, while others think this would result in very high costs.
To what extent do you agree or disagree?
WORD LIMIT: 250 words
TIME: 40 minutes
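Because the Writing output states its limits in a fixed `WORD LIMIT: … TIME: …` form, a pipeline can pull them out and compare them with the exam's standard values (Academic Task 1: 150 words, 20 minutes; Task 2: 250 words, 40 minutes). A minimal sketch, assuming the output keeps exactly that wording; `parse_writing_limits` and `limits_match` are hypothetical helpers.

```python
import re

# Standard IELTS Academic Writing limits: (minimum words, minutes).
EXPECTED_LIMITS = {"TASK1": (150, 20), "TASK2": (250, 40)}


def parse_writing_limits(output: str) -> tuple[int, int]:
    """Return (word_limit, minutes) from 'WORD LIMIT: N words' and 'TIME: M minutes'."""
    words = re.search(r"WORD LIMIT:\s*(\d+)\s*words", output)
    minutes = re.search(r"TIME:\s*(\d+)\s*minutes", output)
    if words is None or minutes is None:
        raise ValueError("output has no WORD LIMIT / TIME line")
    return int(words.group(1)), int(minutes.group(1))


def limits_match(task_type: str, output: str) -> bool:
    """True if the generated limits equal the standard ones for TASK1/TASK2."""
    return parse_writing_limits(output) == EXPECTED_LIMITS[task_type]
```

The regexes accept the limits on one line or on two, so either layout of the model's output parses the same way.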
Reading: True/False/Not Given (grounded, temp 0.3; keys verified)
Input
<TEST=IELTS><SECTION=READING><TYPE=TFNG><DIFF=medium><TOPIC=coral reefs> Using ONLY the passage below, write True/False/Not Given statements with an answer key. Do not write a new passage.
PASSAGE: Coral reefs … are built by tiny animals called polyps, which secrete calcium carbonate to form hard skeletons. Reefs grow only in warm, shallow, sunlit water … The Great Barrier Reef, off Australia, is the largest reef system in the world.
Output
QUESTIONS:
1. Coral reefs are built by tiny animals called polyps that secrete calcium carbonate.
2. Coral reefs grow in cold, deep water because the polyps depend on algae for food.
3. The Great Barrier Reef is the largest reef system on Earth.
ANSWER KEY:
1. TRUE: "built by tiny animals called polyps, which secrete calcium carbonate".
2. FALSE: the passage says reefs "grow only in warm, shallow, sunlit water", not cold deep water.
3. TRUE: "The Great Barrier Reef … is the largest reef system in the world".
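To check keys automatically (for example inside a verification loop), the `ANSWER KEY:` block can be parsed into a question-number → answer map. A sketch assuming each key line starts with `<number>. <answer>` as in the examples; the regex and the `parse_answer_key` name are ours, not part of the model or its tooling.

```python
import re

# A key line starts with "<number>." then the answer: a verdict or one letter A-D.
# "NOT GIVEN" is listed before "NO" so the longer verdict wins.
KEY_LINE = re.compile(
    r"^[ \t]*(\d+)\.[ \t]*(NOT GIVEN|TRUE|FALSE|YES|NO|[A-D])\b", re.MULTILINE
)


def parse_answer_key(output: str) -> dict[int, str]:
    """Map question number -> answer, reading only the text after 'ANSWER KEY:'."""
    _, found, key_text = output.partition("ANSWER KEY:")
    if not found:
        raise ValueError("no 'ANSWER KEY:' section in output")
    return {int(num): answer for num, answer in KEY_LINE.findall(key_text)}
```

Parsing only after `ANSWER KEY:` keeps question lines such as "1. Coral reefs are…" from being misread as answers.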
Reading: Multiple Choice (grounded, temp 0.3; key verified)
Output
QUESTION: Choose the correct letter, A, B, C or D.
Why do coral polyps depend on algae living in their tissues?
A. The algae provide calcium carbonate to build the skeleton.
B. Algae help polyps survive in cold, deep water.
C. The algae supply food to the polyps through photosynthesis.
D. Algae protect polyps from bleaching.
ANSWER KEY:
1. C: the passage states polyps "depend on algae living in their tissues for food"; the other options are not stated.
(⚠️ The MCQ correct-answer letter skews toward "B" on the 9B: spread answer positions at serving, or only publish pre-checked MCQs.)
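One way to spread answer positions at serving is to shuffle the four options after generation and remap the key letter. A minimal sketch, assuming exactly four distinct option texts; `shuffle_mcq` is a hypothetical helper, and passing a seeded `random.Random` keeps the result reproducible. Options that refer to other letters (e.g. "both A and B") would need extra handling, and an alternative is to assign target letters round-robin so that A–D appear equally often across a batch.

```python
import random

LETTERS = "ABCD"


def shuffle_mcq(options: list[str], correct: str,
                rng: random.Random) -> tuple[list[str], str]:
    """Shuffle A-D options and return (new_options, new_correct_letter)."""
    if len(options) != len(LETTERS) or len(set(options)) != len(options):
        raise ValueError("expected four distinct options")
    correct_text = options[LETTERS.index(correct)]
    shuffled = options[:]  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled, LETTERS[shuffled.index(correct_text)]
```

Re-rendering the question from `new_options` and writing the returned letter into the key keeps the item correct while removing the positional bias.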
Reading: Sentence Completion (grounded, temp 0.3; the 9B's answers are 100% in-passage)
Output
QUESTIONS: Complete each sentence with ONE word from the passage.
1. Reefs grow only in warm, shallow water because the polyps depend on algae for _____. → food
2. When water becomes too warm, the polyps expel the algae and turn white, a process known as _____. → bleaching
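The in-passage property can be checked mechanically: each completion answer should occur in the passage as whole word(s). A sketch with hypothetical helper names; treating "verbatim" as a case-insensitive whole-word match is our assumption. The passage string below is illustrative, since the example passage above is excerpted.

```python
import re


def answer_in_passage(answer: str, passage: str) -> bool:
    """True if `answer` occurs in `passage` as whole word(s), ignoring case."""
    pattern = r"\b" + re.escape(answer.strip()) + r"\b"
    return re.search(pattern, passage, flags=re.IGNORECASE) is not None


def in_passage_rate(answers: list[str], passage: str) -> float:
    """Fraction of completion answers found in the passage."""
    if not answers:
        return 0.0
    return sum(answer_in_passage(a, passage) for a in answers) / len(answers)


passage = ("When water becomes too warm, the polyps expel the algae they depend on "
           "for food and turn white, a process known as bleaching.")
print(in_passage_rate(["food", "bleaching"], passage))
```

Items whose answer fails this check can be dropped or regenerated before publishing.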
Supported types per section, and how to prompt each
| Section · Type | Prompt `<TYPE=…>` | Temp | What you get |
|---|---|---|---|
| Reading · True/False/Not Given | TFNG | 0.3 | passage + statements + key |
| Reading · Yes/No/Not Given | YNNG | 0.3 | opinion passage + statements + key |
| Reading · Multiple choice | MCQ / MCQ_MULTI | 0.3 | passage + A–D question(s) + key (watch the "B" skew) |
| Reading · Sentence/Summary completion | SENTENCE_COMPLETION / SUMMARY_COMPLETION | 0.3 | gap items + key (9B: 100% in-passage) |
| Reading · Matching | MATCHING_* | 0.5 | matching task + key (experimental) |
| Reading · Long-form | LONGFORM | 0.6 | ~600-word passage + mixed questions + key |
| Writing · Task 1 / Task 2 | TASK1 / TASK2 | 0.7 | task + word limit + timing |
| Speaking · Part 1/2/3 | PART1 / PART2 / PART3 | 0.7 | examiner question / cue card + model answer |
| Listening | LISTENING | 0.7 | transcript + questions + key |
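The temperatures in this table can live in one lookup so every call uses the recommended sampling settings. A sketch: the dictionary and function names are hypothetical, `MATCHING_*` is expanded to the three matching tags from the prompt-format list, and the returned keys are standard `model.generate` sampling arguments (`do_sample`, `temperature`, `top_p`).

```python
# Temperatures from the "Supported types" table; top_p 0.9 throughout.
TEMPERATURE_BY_TYPE = {
    "TFNG": 0.3, "YNNG": 0.3, "MCQ": 0.3, "MCQ_MULTI": 0.3,
    "SENTENCE_COMPLETION": 0.3, "SUMMARY_COMPLETION": 0.3,
    "MATCHING_HEADINGS": 0.5, "MATCHING_FEATURES": 0.5, "MATCHING_ENDINGS": 0.5,
    "LONGFORM": 0.6,
    "TASK1": 0.7, "TASK2": 0.7,
    "PART1": 0.7, "PART2": 0.7, "PART3": 0.7,
    "LISTENING": 0.7,
}


def sampling_kwargs(type_: str) -> dict:
    """Sampling keyword arguments for model.generate() for one TYPE tag."""
    return {"do_sample": True, "temperature": TEMPERATURE_BY_TYPE[type_], "top_p": 0.9}
```

With the Usage snippet's `model` and `inp`, a call would look like `model.generate(**inp, max_new_tokens=900, **sampling_kwargs("TFNG"))`; an unknown TYPE raises `KeyError` instead of silently falling back to a default.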
Tip: for dependable Reading keys, generate grounded. Prepend a real passage and add "Using ONLY the passage below … Do not write a new passage."
Generating a full exam section (one passage → all question types)
Generate one passage, then each question type against it:
- Passage:
```
<…TYPE=LONGFORM…> Write ONLY a ~600-word IELTS reading passage. No questions.
```
- For each type:
```
Using ONLY the passage below, write 5 TFNG statements with an answer key. Do not write a new passage.
PASSAGE:
<passage>
```
- Concatenate the results into a real-exam-style section. The 9B's strong facts make its from-scratch passages the most reliable base. (The demo Space does this.)
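The three steps can be wired together as below. This is a sketch under stated assumptions: `generate` is any callable you supply that runs the model (for instance a wrapper around the transformers snippet in Usage), the full tag prefix on the passage prompt is assumed from the prompt-format section because the steps only show `<…TYPE=LONGFORM…>`, and every question type uses temperature 0.3 for simplicity (the table suggests 0.5 for matching). `build_section` is a hypothetical name.

```python
from typing import Callable

# Hypothetical signature: generate(prompt, temperature) -> generated text.
Generate = Callable[[str, float], str]


def build_section(generate: Generate, topic: str, diff: str,
                  tasks: dict[str, str]) -> str:
    """Write one passage, generate each question type against it, join the parts."""
    # Step 1: passage only (LONGFORM at the table's 0.6; full tag prefix assumed).
    passage_prompt = (f"<TEST=IELTS><SECTION=READING><TYPE=LONGFORM><DIFF={diff}>"
                      f"<TOPIC={topic}> Write ONLY a ~600-word IELTS reading passage. "
                      "No questions.")
    passage = generate(passage_prompt, 0.6).strip()

    parts = [passage]
    # Step 2: each question type, grounded on the shared passage.
    for type_, instruction in tasks.items():
        prompt = (f"<TEST=IELTS><SECTION=READING><TYPE={type_}><DIFF={diff}>"
                  f"<TOPIC={topic}> Using ONLY the passage below, {instruction} "
                  f"Do not write a new passage.\nPASSAGE:\n{passage}")
        parts.append(generate(prompt, 0.3).strip())  # 0.3 for verdict-style items

    # Step 3: concatenate into one exam-style section.
    return "\n\n".join(parts)
```

For example, `build_section(my_generate, "coral reefs", "medium", {"TFNG": "write 5 TFNG statements with an answer key.", "MCQ": "write 3 multiple-choice questions with an answer key."})`, where `my_generate` is your own model wrapper.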
Usage (transformers)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "preparebuddy/ielts-9b"
tok = AutoTokenizer.from_pretrained(repo)

# ~18 GB in bf16 (needs a 24 GB+ GPU).
model = AutoModelForCausalLM.from_pretrained(repo, dtype=torch.bfloat16, device_map="auto").eval()
# For smaller GPUs, load in 4-bit instead (~5–6 GB; requires the bitsandbytes package):
# from transformers import BitsAndBytesConfig
# model = AutoModelForCausalLM.from_pretrained(
#     repo, quantization_config=BitsAndBytesConfig(load_in_4bit=True), device_map="auto").eval()

SYSTEM = ("You generate authentic IELTS Academic practice content across reading, writing, "
          "listening, and speaking. Produce passages, transcripts, tasks, questions, and answer "
          "keys or model answers as appropriate to the section. Use IELTS-style register: "
          "academic, neutral, factually plausible. This is content generation, not assessment.")
user = "<TEST=IELTS><SECTION=READING><TYPE=TFNG><DIFF=medium><TOPIC=solar power> Generate a short passage with 4 True/False/Not Given statements and an answer key."

# enable_thinking=False is passed through to the chat template (reasoning mode lowers verdict accuracy).
inp = tok.apply_chat_template(
    [{"role": "system", "content": SYSTEM}, {"role": "user", "content": user}],
    add_generation_prompt=True, enable_thinking=False,
    return_tensors="pt", return_dict=True,
).to(model.device)
out = model.generate(**inp, max_new_tokens=900, do_sample=True, temperature=0.3, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inp["input_ids"].shape[1]:], skip_special_tokens=True))
```
Settings: temp 0.3 for verdicts (TFNG/YNNG/MCQ), 0.7 for passages/writing/speaking; top_p 0.9; one SECTION+TYPE per call. VRAM: ~18 GB in bf16; quantise to 4-bit (`load_in_4bit=True`) to bring it down to ~5–6 GB for smaller GPUs. The 9B is best run from code on a server or in a GPU Space rather than on a laptop.
Recommended architecture for reliable output (important)
The 9B is a strong drafter with the family's best facts. For dependable answer keys, run it as part of a pipeline:
- Ground: generate against a real passage (the 9B's strong facts already make from-scratch passages reliable, but grounding removes the residual risk).
- Verify: re-check each answer key with an independent judge (a trained 2B is a cheap, effective verifier) and flag disagreements.
- Review or regenerate the small flagged minority.
Measured end-to-end: ≈75% with raw grounded generation → ≈85–90% with this verify loop.
Strengths & honest limits (9B)
- ✅ Best facts in the family; 100% completion grounding; strong Writing/Speaking/Listening; no non-English token leakage observed.
- ⚠️ MCQ answer position skews to "B" (in our sample the correct answer was B in 7 of 7 items): spread the position at serving, or only publish pre-checked MCQs.
- ⚠️ Verdict reasoning is no better than the smaller models': ~8% logic slips on easy items (≈25% on varied from-scratch items); use grounding + verification.
- ⚠️ Heaviest to run: ~18 GB VRAM (bf16); best on a GPU/server, not a laptop.
- Listening/Speaking output is text (for downstream TTS); no audio. Not an assessment tool.
Training
LoRA fine-tune of Qwen3.5-9B (bf16; r=16, α=32; completion-only loss; enable_thinking=False; 2 epochs, lr 1e-4), trained on an NVIDIA L40S (48 GB) cloud GPU on 1,438 curated and balanced examples (balanced for NOT GIVEN in the verdict types). The dataset is not released (proprietary). Full method, hardware and results: technical report.
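"Completion-only loss" usually means the prompt tokens are excluded from the loss, so the model is trained only on the text it has to produce. The card doesn't say how this was implemented. A common approach, sketched here, sets the prompt positions in `labels` to -100. That value is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, and Hugging Face causal-LM models also ignore it by default. The helper name is hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def completion_only_labels(input_ids: list[int], prompt_len: int) -> list[int]:
    """Labels that copy input_ids but mask the first prompt_len (prompt) tokens."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must be between 0 and len(input_ids)")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


# Toy example: 3 prompt tokens followed by 2 completion tokens.
print(completion_only_labels([11, 12, 13, 14, 15], prompt_len=3))
```

The labels stay aligned with `input_ids`; causal-LM models in transformers shift them internally when computing the next-token loss.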
License
Apache-2.0, inheriting from Qwen3.5-9B. Free to use, modify, distribute (incl. commercially); retain attribution to the base model and PrepareBuddy.
The 2B / 4B / 9B family โ pick the right one
| | ielts-2b | ielts-4b ⭐ | ielts-9b |
|---|---|---|---|
| Best for | cheapest; best verdict judge/verifier | balanced general use | best facts (from scratch) |
| Verdict accuracy (fine-tuned)¹ | 80% | 74% | 77% |
| Completion answers in-passage | ⚠️ 37% | ✅ 100% | ✅ 100% |
| Facts in from-scratch passages | weakest | good | ✅ best |
| MCQ answer position | ok | ok | ⚠️ skews "B" |
| Size (bf16) | ~5 GB | ~9 GB | ~18 GB |
| Use with grounding | strongly | recommended | recommended |
¹ Greedy decoding, 101-item held-out gold set. Fine-tuning's benefit shrank as base capability grew: it transformed the 2B (+40 points) and was flat on the 4B/9B. Full method, findings and tables: technical report.
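The footnote's accuracy figures are presumably exact-match rates against the gold verdicts. The card doesn't spell out the scoring, so the sketch below only shows that computation under this assumption. With 101 items, each item is worth just under one percentage point (1/101 ≈ 0.99%), which is worth bearing in mind when comparing 77% with 80%. The function name is hypothetical.

```python
def verdict_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Share of items whose predicted verdict equals the gold verdict (case-insensitive)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be the same length")
    if not gold:
        raise ValueError("empty evaluation set")
    hits = sum(p.strip().upper() == g.strip().upper() for p, g in zip(predicted, gold))
    return hits / len(gold)
```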
Getting better results: grounding + the re-checking loop (the biggest quality lever)
1. Ground: generate against a real passage so the facts come from the source:
```
Using ONLY the passage below, write 4 True/False/Not Given statements with an answer key. Do NOT write a new passage.
PASSAGE: <your real passage>
```
2. Re-check (verify): independently re-judge each answer key and flag disagreements:
```python
# Pseudocode: judge() and flag_for_review_or_regenerate() stand for your own
# verifier call and review step.
for statement in generated_statements:
    verdict = judge(model, passage, statement)  # TRUE / FALSE / NOT GIVEN
    if verdict != generated_key[statement]:
        flag_for_review_or_regenerate(statement)  # the verifier catches ~75-80% of errors
```
A trained 2B catches ~75–80% of verdict errors as a verifier (cheap); any 4B or larger model works too.
3. Review or regenerate the flagged minority.

Measured: ≈75% raw → ≈85–90% with this loop.
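A runnable version of the re-checking loop, with the verifier passed in as a function. Assumptions: `recheck` and the `judge` callable are hypothetical names. In practice `judge` would prompt the verifier model (for example a fine-tuned 2B) with the passage and one statement and return TRUE, FALSE or NOT GIVEN. The keyword judge at the end only stands in for that call.

```python
from typing import Callable

# judge(passage, statement) -> "TRUE" | "FALSE" | "NOT GIVEN"
Judge = Callable[[str, str], str]


def recheck(passage: str, keyed: dict[str, str],
            judge: Judge) -> tuple[dict[str, str], list[str]]:
    """Split generated statements into (accepted, flagged) by re-judging each one."""
    accepted: dict[str, str] = {}
    flagged: list[str] = []
    for statement, key in keyed.items():
        verdict = judge(passage, statement).strip().upper()
        if verdict == key.strip().upper():
            accepted[statement] = key
        else:
            flagged.append(statement)  # send to human review or regenerate
    return accepted, flagged


def keyword_judge(passage: str, statement: str) -> str:
    """Toy stand-in for a verifier model call."""
    return "TRUE" if "warm" in statement else "FALSE"
```

Only the `flagged` list needs a human or a regeneration pass; `accepted` items can be published as-is.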
Prompt tips
- Always use the tag prefix `<TEST=IELTS><SECTION=…><TYPE=…><DIFF=…><TOPIC=…>`; it's not a chat model.
- Temperature: 0.3 for verdicts, 0.7 for passages/writing/speaking; top_p 0.9.
- `enable_thinking=False`: reasoning mode lowers verdict accuracy.
- One SECTION+TYPE per call; build a full section by generating each type against one shared passage.


