Qwen3-8B-Eagle3-NeMoRL-RedhatAI-thinking

EAGLE-3 drafter for Qwen/Qwen3-8B, converted from the speculators format to the NeMo-RL-compatible raw-eagle3 format. Private research artifact for reproducing the EfficientRollout (NeurIPS 2026) EAGLE-3 baseline natively in NeMo-RL.

Upstream: RedHatAI/Qwen3-8B-Thinking-speculator.eagle3, trained by RedHat with the vLLM/speculators library on Magpie-Pro-300K-Filtered + UltraChat-200k with thinking enabled.
Conversion (tensor values are unchanged):
- config: nested speculators config -> flat raw-eagle3 config + model_type: llama.
- weight keys: layers.0.* -> midlayer.*.
- embed_tokens.weight dropped because NeMo-RL/vLLM reuse the target model embeddings.
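
The weight-key changes listed above can be sketched as a plain remapping over a state dict. This is a minimal illustration, not the script used to produce this mirror: the function name `remap_speculators_keys` is hypothetical, and it assumes the upstream keys carry exactly the `layers.0.` prefix and that `embed_tokens.weight` is the only key dropped.

```python
def remap_speculators_keys(state_dict):
    """Rename layers.0.* to midlayer.* and drop embed_tokens.weight.

    Hypothetical helper: values are passed through untouched, so tensor
    contents stay identical to the upstream checkpoint.
    """
    old_prefix, new_prefix = "layers.0.", "midlayer."
    remapped = {}
    for key, value in state_dict.items():
        if key == "embed_tokens.weight":
            # Dropped: the target model's embeddings are reused instead.
            continue
        if key.startswith(old_prefix):
            key = new_prefix + key[len(old_prefix):]
        remapped[key] = value
    return remapped
```

Any mapping from key names to tensors works as input, for example a dict loaded with `safetensors.torch.load_file`.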
eagle_aux_hidden_state_layer_ids: [2, 18, 33] injected. The upstream config contains the key but sets it to null; this mirror makes the speculators/vLLM convention (2, n//2, n-3) explicit for the 36-layer Qwen3-8B target (n = 36).
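
As a quick check of that convention, the arithmetic can be written out directly. The helper name `eagle3_aux_layer_ids` is hypothetical and only restates the (2, n//2, n-3) rule from this card.

```python
def eagle3_aux_layer_ids(num_hidden_layers):
    """Return the auxiliary hidden-state layer ids (2, n//2, n-3)."""
    n = num_hidden_layers
    return [2, n // 2, n - 3]


# 36 is the target layer count stated in this card.
print(eagle3_aux_layer_ids(36))  # [2, 18, 33]
```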
norm_before_residual: true preserved from upstream. This is required for speculators-format RedHat EAGLE3 drafters; omitting it caused acceptance collapse in earlier sanity checks.
rope_theta: 10000.0 preserved from upstream transformer_layer_config. This differs from the non-thinking RedHat Qwen3 drafter mirror but matches this thinking drafter's public config.
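
The config side of the conversion might look roughly like the sketch below. It is an assumption-heavy illustration rather than the actual conversion code: the function name `flatten_speculators_config` is hypothetical, the toy input holds only the fields named in this card, and the placement of `norm_before_residual` at the top level of the nested config is assumed.

```python
def flatten_speculators_config(nested, aux_layer_ids):
    """Flatten a nested speculators-style config into a flat dict.

    Assumptions (not confirmed by the upstream repo layout):
    - transformer fields live under "transformer_layer_config";
    - "norm_before_residual" sits at the top level of the nested config.
    """
    flat = dict(nested["transformer_layer_config"])  # keeps rope_theta etc.
    flat["model_type"] = "llama"
    flat["norm_before_residual"] = nested["norm_before_residual"]
    flat["eagle_aux_hidden_state_layer_ids"] = list(aux_layer_ids)
    return flat


# Toy input for illustration only; real configs carry many more fields.
toy_nested = {
    "norm_before_residual": True,
    "eagle_aux_hidden_state_layer_ids": None,
    "transformer_layer_config": {"rope_theta": 10000.0},
}
flat = flatten_speculators_config(toy_nested, [2, 18, 33])
```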

Reported upstream k=3 acceptance lengths:

dataset	acceptance length (k=3)
HumanEval	2.90
math_reasoning	3.04
qa	2.67
question	2.78
rag	2.69

Usage (NeMo-RL):

policy:
  draft:
    model_name: minseokim25/Qwen3-8B-Eagle3-NeMoRL-RedhatAI-thinking
