Instructions for using rishanthrajendhran/POLARIS-no-HRI-9B with libraries and local apps.

Libraries

How to use rishanthrajendhran/POLARIS-no-HRI-9B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="rishanthrajendhran/POLARIS-no-HRI-9B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("rishanthrajendhran/POLARIS-no-HRI-9B")
model = AutoModelForImageTextToText.from_pretrained("rishanthrajendhran/POLARIS-no-HRI-9B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use rishanthrajendhran/POLARIS-no-HRI-9B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rishanthrajendhran/POLARIS-no-HRI-9B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rishanthrajendhran/POLARIS-no-HRI-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/rishanthrajendhran/POLARIS-no-HRI-9B

SGLang

How to use rishanthrajendhran/POLARIS-no-HRI-9B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "rishanthrajendhran/POLARIS-no-HRI-9B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rishanthrajendhran/POLARIS-no-HRI-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "rishanthrajendhran/POLARIS-no-HRI-9B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rishanthrajendhran/POLARIS-no-HRI-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


POLARIS-no-HRI-9B

POLARIS-no-HRI-9B is the matched ablation variant of POLARIS-9B. It uses the same GRPO training recipe with the same structured Story Quality reward, identical hyperparameters, and the same training data — but without human-reference injection (HRI). Instead of 5 policy rollouts + 1 injected human-written story per group, it was trained with 6 policy rollouts with no reference anchor.

It is a strong creative-writing model in its own right — substantially better than the base Qwen3.5-9B — but lags POLARIS-9B most noticeably at far-transfer lengths (8–12k words).

Comparison with POLARIS-9B

The gap between this model and POLARIS-9B is small at in-distribution lengths and grows at longer requested lengths, consistent with HRI's role in maintaining gradient pressure toward stronger writing as generation extends beyond the training range.

Story Quality by requested length (GPT-5.4 judge, 180 held-out prompts)

| Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | Aggregate | Slope |
| --- | --- | --- | --- | --- | --- |
| POLARIS-9B | 57.4 | 48.2 | 44.1 | 52.1 | −3.0 |
| POLARIS-no-HRI-9B | 56.5 | 47.0 | 37.7 | 49.7 | −3.8 |
| Qwen3.5-9B (base) | 35.1 | 8.7 | −11.8 | 18.5 | −10.8 |
| Qwen3.5-27B | 51.5 | 38.7 | 24.6 | 42.8 | −5.9 |

Slope is the linear fit across the six length buckets (points per step). A steeper negative slope indicates faster quality degradation as requested length increases.
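As a minimal sketch of how such a slope can be computed, the snippet below fits an ordinary least-squares line to per-bucket scores indexed 0 to 5. The function name `bucket_slope` and the example scores are hypothetical: the card reports only three aggregated length ranges, not the six per-bucket values, and it does not state the exact fitting procedure.

```python
def bucket_slope(scores):
    """Least-squares slope of scores against bucket index 0..n-1 (points per step)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two buckets to fit a slope")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    # Slope = covariance(x, y) / variance(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical scores for six length buckets, shortest to longest.
# They drop by exactly 3 points per bucket, so the fitted slope is -3.0.
example_scores = [60.0, 57.0, 54.0, 51.0, 48.0, 45.0]
print(round(bucket_slope(example_scores), 2))
```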

EQ-Bench Longform by requested length (GPT-5.4 judge, uniform aggregation)

| Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | Aggregate |
| --- | --- | --- | --- | --- |
| POLARIS-9B | 63.1 | 57.5 | 54.3 | 59.8 |
| POLARIS-no-HRI-9B | 62.1 | 55.7 | 51.6 | 58.2 |
| Qwen3.5-9B (base) | 50.2 | 37.2 | 30.3 | 42.6 |

Length adherence (generated / requested word count)

| Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | All |
| --- | --- | --- | --- | --- |
| POLARIS-9B | 0.99 | 0.87 | 0.72 | 0.90 |
| POLARIS-no-HRI-9B | 0.94 | 0.86 | 0.70 | 0.87 |
| Qwen3.5-9B (base) | 1.09 | 0.96 | 0.88 | 1.01 |
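The ratio in this table can be reproduced for a single generation with a short helper. This is a sketch only: `length_adherence` is a hypothetical name, and counting words by whitespace splitting is an assumption, since the card does not say how words were counted.

```python
def length_adherence(generated_text, requested_words):
    """Ratio of generated word count to requested word count."""
    if requested_words <= 0:
        raise ValueError("requested_words must be positive")
    # Assumption: a "word" is any whitespace-separated token.
    return len(generated_text.split()) / requested_words


# A placeholder 1,880-word output for a 2,000-word request.
story = "word " * 1880
print(round(length_adherence(story, 2000), 2))
```

A value below 1.0 means the model wrote less than requested, and a value above 1.0 means it overshot.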

OOD benchmarks

| Model | WritingBench (D4) | LongBench-Write | EQ-Bench Creative |
| --- | --- | --- | --- |
| POLARIS-9B | 7.9 | 81.2 | 70.3 |
| POLARIS-no-HRI-9B | 7.8 | 82.1 | 69.7 |
| Qwen3.5-9B (base) | 6.8 | 67.1 | 59.2 |

On OOD benchmarks the two variants are essentially tied. The HRI advantage is concentrated at long requested lengths on the in-distribution story-writing task, where narrative coherence and arc completion must be sustained over many thousands of tokens.

Intended Use

Long-form story generation (short stories, flash fiction, narrative scenes)
Creative writing (essays, book reviews, podcast scripts, etc.)

Out-of-Scope Use

Factual or knowledge-intensive writing where correctness matters
Legal, medical, or financial content
Reproducing or recovering the withheld training stories

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rishanthrajendhran/POLARIS-no-HRI-9B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

prompt = (
    "Write a 2000-word story about an archivist who discovers that missing "
    "library books are returning with handwritten notes from the future."
)

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    repetition_penalty=1.10,
)

generated = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(generated)
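With `enable_thinking=True`, the decoded text may contain a reasoning trace before the story itself. The helper below is a minimal sketch for separating the two. It assumes the trace ends with a `</think>` marker, as in Qwen3-family chat templates; the card does not confirm this for this model, so check the marker against real output. `split_thinking` is a hypothetical name.

```python
def split_thinking(generated, end_marker="</think>"):
    """Split decoded output into (reasoning, story).

    Assumes the reasoning trace, if present, ends with `end_marker`.
    If the marker is absent, the whole text is treated as the story.
    """
    head, sep, tail = generated.rpartition(end_marker)
    if not sep:
        return "", generated.strip()
    # Drop an opening tag if the decoder kept it.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


sample = "<think>Outline the arc.</think>\n\nThe archivist unlocked the vault."
reasoning, story = split_thinking(sample)
print(story)
```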

Recommended Generation Settings

Identical to POLARIS-9B.

| Setting | Value | Notes |
| --- | --- | --- |
| `temperature` | 0.4–1.0 | Lower temperatures (0.4–0.6) recommended for long-form story writing |
| `top_p` | 0.95 | |
| `top_k` | 20 | |
| `repetition_penalty` | 1.0–1.10 | |
| `presence_penalty` | 0.0–1.5 | Do not set `repetition_penalty` and `presence_penalty` together |
| `max_new_tokens` | 14336 | Minimum recommended for 8–12k target lengths |
| `enable_thinking` | True | |
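The ranges above, including the rule against combining the two penalties, can be checked before a request is sent. The sketch below uses a hypothetical helper, `make_sampling_settings`, and treats `repetition_penalty=1.0` and `presence_penalty=0.0` as "off". Which of these keys a given inference backend accepts is not covered by the card.

```python
def make_sampling_settings(temperature=0.6, top_p=0.95, top_k=20,
                           repetition_penalty=1.0, presence_penalty=0.0):
    """Validate sampling settings against the recommended ranges; return a dict."""
    if not 0.4 <= temperature <= 1.0:
        raise ValueError("temperature outside the recommended 0.4-1.0 range")
    if not 1.0 <= repetition_penalty <= 1.10:
        raise ValueError("repetition_penalty outside the recommended 1.0-1.10 range")
    if not 0.0 <= presence_penalty <= 1.5:
        raise ValueError("presence_penalty outside the recommended 0.0-1.5 range")
    # 1.0 and 0.0 are the neutral values, so only one penalty may differ from them.
    if repetition_penalty != 1.0 and presence_penalty != 0.0:
        raise ValueError("set either repetition_penalty or presence_penalty, not both")
    return {
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "presence_penalty": presence_penalty,
    }


settings = make_sampling_settings(temperature=0.5, repetition_penalty=1.10)
print(settings)
```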

Prompting

Include an explicit length request in the prompt:

Write a 3000-word story about [premise].

At far-transfer lengths (8–12k), this model undershoots more than POLARIS-9B (length ratio ≈ 0.70 vs 0.72). For generation targets above 6k words, POLARIS-9B is the recommended variant.
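A small sketch of the prompting advice: one helper fills in the length-request template, and another applies the 6k-word guidance when choosing a variant. Both function names are hypothetical. Returning this model for shorter targets is an illustrative default, not a recommendation stated in the card.

```python
def build_story_prompt(premise, target_words):
    """Fill the length-request template with a premise and a word target."""
    if target_words <= 0:
        raise ValueError("target_words must be positive")
    return f"Write a {target_words}-word story about {premise}."


def suggested_variant(target_words):
    """Apply the card's guidance: prefer POLARIS-9B above 6k words."""
    return "POLARIS-9B" if target_words > 6000 else "POLARIS-no-HRI-9B"


print(build_story_prompt("a lighthouse keeper who receives letters from the sea", 3000))
print(suggested_variant(9000))
```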

Known Limitations

The same qualitative failure modes present in POLARIS-9B apply here — stylistic overloading and local coherence failures — since both models share the same base, training data, and reward. The key additional limitation of this variant relative to POLARIS-9B:

Steeper quality degradation at long lengths. Story Quality slope is −3.8 vs −3.0 for POLARIS-9B. At 8–12k words, the gap to POLARIS-9B is 6.4 Story Quality points (37.7 vs 44.1), compared to about 1 point at in-distribution (0.9) and near-OOD (1.2) lengths. If your use case involves prompts requesting long stories, POLARIS-9B is the better choice.
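The per-range gaps follow directly from the Story Quality table. The snippet below recomputes them from the reported scores. Only the helper name `score_gaps` is introduced here; all numbers are taken from the table.

```python
# Story Quality scores by requested length, copied from the table above.
story_quality = {
    "POLARIS-9B": {"ID": 57.4, "Near OOD": 48.2, "Far OOD": 44.1},
    "POLARIS-no-HRI-9B": {"ID": 56.5, "Near OOD": 47.0, "Far OOD": 37.7},
}


def score_gaps(reference, ablation):
    """Per-range score difference (reference minus ablation), rounded to 0.1."""
    return {name: round(reference[name] - ablation[name], 1) for name in reference}


gaps = score_gaps(story_quality["POLARIS-9B"], story_quality["POLARIS-no-HRI-9B"])
print(gaps)  # {'ID': 0.9, 'Near OOD': 1.2, 'Far OOD': 6.4}
```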

Training

Identical to POLARIS-9B except for the group composition.

| Parameter | Value |
| --- | --- |
| Base model | Qwen3.5-9B |
| Training algorithm | GRPO |
| Training data | ~1,388 prompt–story pairs from 100 short-story anthologies |
| Max reference length | 4,000 words |
| GPUs | 4× A100 80GB |
| Training time | ~48 hours |
| Compute cost | ~$400 |
| Judge cost | ~$60 (Gemini 3 Flash, flex tier) |
| Training steps | 160 |
| Batch size | 8 GRPO groups |
| Group size | 6 policy rollouts (no human reference) |
| HRI | Disabled |
| Online reward judge | Gemini 3 Flash |
| Evaluation judge | GPT-5.4 |
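To illustrate why group composition matters, the sketch below uses the group-normalized advantage commonly associated with GRPO: each sample's reward minus the group mean, divided by the group standard deviation. The rewards are hypothetical. The card gives neither the exact advantage formula nor how the injected human story enters the loss for POLARIS-9B, so this is an illustration rather than the training code.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical Story Quality rewards for one prompt.
policy_rollouts = [40.0, 45.0, 50.0, 55.0, 60.0, 50.0]

# This variant: a group of 6 policy rollouts, no reference anchor.
no_hri = group_advantages(policy_rollouts)

# POLARIS-9B-style group: 5 rollouts plus 1 injected human-written story,
# assumed here to score higher than every rollout.
with_hri = group_advantages(policy_rollouts[:5] + [80.0])

print([round(a, 2) for a in no_hri])
print([round(a, 2) for a in with_hri[:5]])
```

In this toy group the injected reference raises the group mean, so the same five rollouts receive lower advantages than they do in the rollouts-only group.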

Citation

@article{polaris2026,
  title   = {{POLARIS}: Guiding Small Models to Write Long Stories},
  author  = {Anonymous},
  journal = {ACL submission},
  year    = {2026},
}

(Citation will be updated upon acceptance.)

Safetensors

Model size: 9B params
Tensor type: BF16

Model tree for rishanthrajendhran/POLARIS-no-HRI-9B

Base model: Qwen/Qwen3.5-9B-Base
Finetuned: Qwen/Qwen3.5-9B