Instructions for using groundhogLLM/ACC-Qwen3-30B-A3B with libraries, inference servers, and local apps.

Libraries

How to use groundhogLLM/ACC-Qwen3-30B-A3B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="groundhogLLM/ACC-Qwen3-30B-A3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("groundhogLLM/ACC-Qwen3-30B-A3B")
model = AutoModelForCausalLM.from_pretrained("groundhogLLM/ACC-Qwen3-30B-A3B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use groundhogLLM/ACC-Qwen3-30B-A3B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "groundhogLLM/ACC-Qwen3-30B-A3B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "groundhogLLM/ACC-Qwen3-30B-A3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
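
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server started above is listening on `localhost:8000`. `build_chat_request` and `send_chat_request` are hypothetical helper names introduced here for illustration; they are not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "groundhogLLM/ACC-Qwen3-30B-A3B"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request mirroring the curl call (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(prompt, base_url="http://localhost:8000"):
    """Send the request and return the parsed JSON reply (needs a live server)."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_chat_request("What is the capital of France?")` should return the parsed response body. Nothing is sent until that function is called.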

Use Docker

docker model run hf.co/groundhogLLM/ACC-Qwen3-30B-A3B

SGLang

How to use groundhogLLM/ACC-Qwen3-30B-A3B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "groundhogLLM/ACC-Qwen3-30B-A3B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "groundhogLLM/ACC-Qwen3-30B-A3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
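
An OpenAI-compatible server replies in the chat-completions format, where the generated text sits under `choices[0].message.content`. A minimal sketch of extracting it follows. `extract_reply` is a hypothetical helper, and the sample payload is illustrative rather than real model output.

```python
import json


def extract_reply(raw_json):
    """Return the assistant text from an OpenAI-style chat completion body."""
    body = json.loads(raw_json)
    choices = body.get("choices", [])
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Illustrative payload shaped like a chat completion (not real model output).
sample = json.dumps({
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Paris."}}
    ]
})
print(extract_reply(sample))
```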

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "groundhogLLM/ACC-Qwen3-30B-A3B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "groundhogLLM/ACC-Qwen3-30B-A3B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use groundhogLLM/ACC-Qwen3-30B-A3B with Docker Model Runner:
```
docker model run hf.co/groundhogLLM/ACC-Qwen3-30B-A3B
```

ACC-Qwen3-30B-A3B

This is the official checkpoint for the paper "ACC: Compiling Agent Trajectories for Long-Context Training".

Overview

We fine-tuned Qwen3-30B-A3B-Thinking with Agent Context Compilation (ACC), a method that converts multi-turn agent trajectories (Search, SWE, SQL) into long-context QA pairs for direct supervised fine-tuning. Unlike standard agent SFT, which masks tool responses, ACC assembles evidence scattered across turns into a single context, enabling explicit supervision of long-range dependency modeling.
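
To make the idea concrete, here is a toy sketch of compiling one trajectory into a single long-context QA pair. The trajectory schema (`role`/`content` dicts with a `tool` role), the `compile_trajectory` name, and the `[Document i]` separator are all assumptions made for illustration. The paper's actual pipeline may differ substantially.

```python
def compile_trajectory(turns):
    """Toy compilation: merge tool outputs into one context (assumed schema).

    `turns` is a list of {"role": ..., "content": ...} dicts where the first
    user turn is the question, "tool" turns carry evidence, and the last
    assistant turn is the final answer.
    """
    question = next(t["content"] for t in turns if t["role"] == "user")
    evidence = [t["content"] for t in turns if t["role"] == "tool"]
    answer = [t["content"] for t in turns if t["role"] == "assistant"][-1]
    # Evidence that was scattered across turns is placed in a single context.
    context = "\n\n".join(
        f"[Document {i}]\n{text}" for i, text in enumerate(evidence, start=1)
    )
    return {"context": context, "question": question, "answer": answer}


# Made-up trajectory for illustration only.
trajectory = [
    {"role": "user", "content": "Which city hosts the lab that built X?"},
    {"role": "assistant", "content": "search('X builder')"},
    {"role": "tool", "content": "X was built by Lab Y."},
    {"role": "assistant", "content": "search('Lab Y location')"},
    {"role": "tool", "content": "Lab Y is located in Z City."},
    {"role": "assistant", "content": "Z City"},
]
pair = compile_trajectory(trajectory)
print(pair["context"])
```

In a pair like this, answering the question requires combining both documents from one context, which is the kind of long-range dependency the overview describes.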

Performance Highlights

Benchmark	Score	Δ vs Base
MRCR	68.28	+18.09
GraphWalks	77.51	+7.59
GPQA-Diamond	70.20	+2.49
MMLU-Pro	76.00	+1.50

Results on MRCR and GraphWalks are comparable to Qwen3-235B-A22B despite ~8× fewer active parameters. General capabilities are preserved: GPQA-Diamond and MMLU-Pro both improve slightly over the base model (+2.49 and +1.50).
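
The base model's scores implied by the Δ column can be recovered by subtraction. The short calculation below uses only the numbers in the table. The resulting base scores are derived values, not figures reported separately on this page.

```python
# (score, delta vs. base) pairs copied from the table above.
results = {
    "MRCR": (68.28, 18.09),
    "GraphWalks": (77.51, 7.59),
    "GPQA-Diamond": (70.20, 2.49),
    "MMLU-Pro": (76.00, 1.50),
}

# Base score = fine-tuned score minus the reported improvement.
base_scores = {
    name: round(score - delta, 2) for name, (score, delta) in results.items()
}

for name, base in base_scores.items():
    print(f"{name}: base {base:.2f} -> ACC {results[name][0]:.2f}")
```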

Training Data

Dataset: groundhogLLM/ACC-dataset
Size: 10,802 compiled trajectories (Search: 3,369; SWE: 4,368; SQL: 3,065)
Context length: 2K – 128K tokens
Training seq length: 131,072 tokens
Epochs: 4
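
As a quick consistency check on the figures above, the per-domain counts sum to the stated total, and 131,072 equals 128 × 1024. The helper `fits_training_window` is a hypothetical convenience for budgeting prompts against the training sequence length. It assumes nothing about the model's actual inference-time limit, which is not stated here.

```python
# Per-domain counts as listed above.
counts = {"Search": 3369, "SWE": 4368, "SQL": 3065}
total = sum(counts.values())
assert total == 10_802

TRAIN_SEQ_LEN = 131_072  # equals 128 * 1024, i.e. 2**17


def fits_training_window(prompt_tokens, max_new_tokens):
    """Hypothetical check: does prompt + generation stay within 131,072 tokens?"""
    return prompt_tokens + max_new_tokens <= TRAIN_SEQ_LEN


# Share of each domain in the compiled dataset.
for domain, n in counts.items():
    print(f"{domain}: {n} ({n / total:.1%})")
```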

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "groundhogLLM/ACC-Qwen3-30B-A3B"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Standard Qwen3 chat template applies
messages = [{"role": "user", "content": "Your long-context question here..."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

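outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))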
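
Because the base model is a Thinking variant, the decoded output will usually contain a reasoning trace before the final answer. The sketch below assumes the trace ends with a `</think>` tag, as in Qwen3's chat format. `split_thinking` is a hypothetical helper, and the tag convention should be checked against this model's tokenizer configuration.

```python
def split_thinking(decoded, end_tag="</think>"):
    """Split decoded text into (reasoning, answer) at the last end tag.

    Assumes Qwen3-style output where reasoning precedes `</think>`.
    If the tag is absent, the whole text is treated as the answer.
    """
    head, sep, tail = decoded.rpartition(end_tag)
    if not sep:
        return "", decoded.strip()
    reasoning = head.replace("<think>", "").strip()
    return reasoning, tail.strip()


# Illustrative string, not real model output.
reasoning, answer = split_thinking("<think>\nFrance -> Paris\n</think>\n\nParis.")
print(answer)
```

If generation stops at `max_new_tokens` before the tag appears, the helper returns the whole text as the answer, so a larger token budget may be needed for long reasoning traces.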

Citation

If you use this model, please cite:

@misc{su2026acccompilingagenttrajectories,
      title={ACC: Compiling Agent Trajectories for Long-Context Training}, 
      author={Qisheng Su and Zhen Fang and Shiting Huang and Yu Zeng and Yiming Zhao and Kou Shi and Ziao Zhang and Lin Chen and Zehui Chen and Lijun Wu and Feng Zhao},
      year={2026},
      eprint={2605.21850},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.21850}, 
}


Safetensors

Model size: 31B params
Tensor type: BF16
