# Rodimus+-Coder

πŸ€– ModelScope πŸ€— Hugging Face πŸ–₯️ GitHub

## Introduction

Rodimus* is a new series of efficient large language models designed to address the computational complexity challenges of Transformer-based architectures. The Rodimus* series includes the base Rodimus model and its enhanced version, Rodimus+. Rodimus leverages a novel Data-Dependent Tempered Selection (DDTS) mechanism within a purely recurrent, linear attention-based framework, achieving high performance.

Building on this, Rodimus+ combines the strengths of Rodimus and the innovative Sliding Window Shared-Key Attention (SW-SKA) in a hybrid approach. This combination effectively integrates semantic, token, and head compression techniques, enabling a balance between accuracy and efficiency.

Beyond academic validation, we also train and open-source Rodimus+-Coder, a lightweight code LLM based on the Rodimus architecture. It comes in 1.6B and 4B sizes and achieves outstanding results, surpassing state-of-the-art (SOTA) models of comparable size.

For more details, please refer to our paper and GitHub repository.

## Model Downloads

You can refer to the following table to choose the model that fits your use case. If you are located in mainland China, we also provide the models on modelscope.cn to speed up the download process.

| Model | #Total Params | Training Tokens | Context Length | Download |
|---|---|---|---|---|
| Rodimus+-Coder-1.6B-Base | 1.6B | 8.2T | 4K | πŸ€— HuggingFace |
| Rodimus+-Coder-1.6B-Chat | 1.6B | - | 4K | πŸ€— HuggingFace |
| Rodimus+-Coder-4B-Base | 4B | 8.2T | 4K | πŸ€— HuggingFace |
| Rodimus+-Coder-4B-Chat | 4B | - | 4K | πŸ€— HuggingFace |
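
For example, a checkpoint can be fetched ahead of time with `huggingface_hub` (a minimal sketch; the repo id below is the 4B chat variant, substitute whichever model from the table you need):

```python
# Minimal sketch: download a Rodimus+-Coder checkpoint from the Hugging Face Hub.
from huggingface_hub import snapshot_download

# Repo id of the 4B chat variant; replace it with the variant you want.
local_dir = snapshot_download(repo_id="codefuse-ai/Rodimus-Plus-Coder-4B-Chat")
print(f"Checkpoint downloaded to: {local_dir}")
```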

## Rodimus+-Coder Evaluation

We re-evaluate the Qwen series models ourselves; the metrics of the other model series are quoted from their original papers. For detailed evaluation code, please refer to the evaluation setup of Ling-Coder-Lite in CodeFuse-Evaluation.

### Rodimus+-Coder-Base

| Datasets | Qwen2.5-Coder-1.5B | Rodimus+-Coder-1.6B-Base | Gemma2-2B-PT | Qwen2.5-Coder-3B | Rodimus+-Coder-4B-Base | Gemma3-4B-PT | Qwen2.5-Coder-7B |
|---|---|---|---|---|---|---|---|
| **Coding Tasks** | | | | | | | |
| HumanEval | 41.5 | 51.2 | 19.5 | 51.8 | 60.4 | 36.0 | 60.4 |
| HumanEval+ | 34.8 | 45.1 | - | 40.9 | 52.4 | - | 50.6 |
| MBPP | 57.2 | 51.2 | 31.0 | 62.6 | 64.6 | 46.0 | 70.0 |
| MBPP+ | 66.1 | 62.2 | - | 65.9 | 71.4 | - | 70.1 |
| BCB-Completion | 21.6 | 17.9 | - | 26.2 | 30.8 | - | 30.4 |
| MultiPL-E | 46.1 | 52.5 | - | 49.4 | 60.7 | - | 56.9 |
| CRUXEval | 38.5 | 45.1 | - | 44.6 | 56.4 | - | 56.8 |
| Coding Avg. | 43.7 | 46.5 | - | 48.8 | 56.7 | - | 56.4 |
| **General Tasks** | | | | | | | |
| C-EVAL | 55.2 | 56.7 | - | 65.3 | 70.2 | - | 69.1 |
| CMMLU | 54.5 | 52.3 | - | 65.4 | 68.3 | - | 72.7 |
| MMLU | 55.5 | 51.1 | 52.2 | 63.3 | 62.6 | 59.6 | 70.5 |
| BBH | 21.8 | 46.8 | 42.4 | 32.5 | 61.9 | 50.9 | 67.3 |
| General Avg. | 46.8 | 51.7 | - | 56.6 | 65.8 | - | 69.9 |
| **Mathematics Tasks** | | | | | | | |
| GSM8K | 60.4 | 68.7 | 25.0 | 72.1 | 78.5 | 38.4 | 83.4 |
| MATH | 23.7 | 29.0 | 16.4 | 31.9 | 37.0 | 24.2 | 42.2 |
| Math Avg. | 41.9 | 48.9 | 20.7 | 52.0 | 57.8 | 31.3 | 62.8 |
| **Overall** | | | | | | | |
| Overall | 44.4 | 48.4 | - | 51.7 | 59.6 | - | 61.6 |

### Rodimus+-Coder-Chat
| Datasets | Qwen2.5-Coder-1.5B-Instruct | Rodimus+-Coder-1.6B-Chat | Gemma2-2B-IT | Qwen2.5-Coder-Instruct | Phi-4-Mini-3.8B | Rodimus+-Coder-4B-Chat | Gemma3-4B-IT | Qwen2.5-Coder-7B-Instruct |
|---|---|---|---|---|---|---|---|---|
| **Coding Tasks** | | | | | | | | |
| HumanEval | 64.6 | 76.8 | 20.1 | 79.9 | 74.4 | 86.6 | 71.3 | 87.2 |
| HumanEval+ | 63.4 | 73.8 | - | 80.5 | 68.3 | 82.9 | - | 82.3 |
| MBPP | 51.0 | 59.0 | 36.6 | 59.2 | 65.3 | 68.0 | 63.2 | 75.8 |
| MBPP+ | 53.0 | 66.4 | - | 61.9 | 63.8 | 68.5 | - | 75.1 |
| LCB (24.08-24.11) | 4.0 | 10.9 | - | 13.0 | - | 13.9 | - | 22.8 |
| BCB-Instruct | 10.8 | 21.5 | - | 21.7 | 33.8 | 26.6 | - | 30.6 |
| HumanEval-Mul | 50.8 | 57.3 | - | 67.4 | - | 70.6 | - | 76.1 |
| MBPP-Mul | 43.4 | 52.4 | - | 53.4 | - | 59.6 | - | 61.4 |
| MBXP-EN | 55.8 | 75.5 | - | 76.0 | - | 87.3 | - | 87.7 |
| MBXP-CN | 48.8 | 75.0 | - | 68.7 | - | 84.3 | - | 83.5 |
| CRUXEval | 28.6 | 55.0 | - | 51.6 | - | 63.2 | - | 69.3 |
| HumanEvalFix | 38.9 | 52.6 | - | 55.5 | - | 68.8 | - | 69.3 |
| Spider | 61.2 | 71.4 | - | 71.8 | 42.2 | 73.5 | - | 82.0 |
| Coding Avg. | 44.2 | 57.5 | - | 58.5 | - | 65.7 | - | 69.5 |
| **General Tasks** | | | | | | | | |
| C-EVAL | 51.5 | 50.8 | - | 62.0 | - | 61.6 | - | 66.4 |
| CMMLU | 45.2 | 50.5 | - | 60.1 | - | 62.0 | - | 64.9 |
| MMLU | 52.0 | 49.3 | 56.1 | 61.7 | 67.3 | 57.5 | 58.1 | 66.1 |
| BBH | 24.2 | 58.7 | 41.4 | 57.3 | 70.4 | 63.7 | 72.2 | 59.1 |
| General Avg. | 43.2 | 52.3 | - | 60.3 | - | 61.2 | - | 64.1 |
| **Mathematics Tasks** | | | | | | | | |
| GSM8K | 54.4 | 68.5 | 62.6 | 73.5 | 88.6 | 79.2 | 89.2 | 79.5 |
| MATH | 38.1 | 33.5 | 27.2 | 44.1 | 64.0 | 44.1 | 75.6 | 60.8 |
| Math Avg. | 46.2 | 51.0 | 44.9 | 58.8 | 68.8 | 61.7 | 82.4 | 70.1 |
| **Overall** | | | | | | | | |
| Overall | 44.2 | 55.8 | - | 58.9 | - | 64.3 | - | 68.4 |

## Usage

### Installation

  1. The latest version of transformers is recommended (at least 4.42.0); see the quick check below.
  2. We evaluate our models with python=3.8 and torch==2.1.2.
  3. If you use Rodimus, you need to install flash-linear-attention, causal_conv1d, and triton>=2.2.0. If you use Rodimus+, you also need to install flash-attention.
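
As a quick sanity check for the requirements above, the snippet below verifies the installed `transformers` version (a minimal sketch; only the 4.42.0 threshold comes from the list above):

```python
# Minimal environment check for the requirements listed above.
import torch
import transformers
from packaging import version  # packaging is installed alongside transformers

assert version.parse(transformers.__version__) >= version.parse("4.42.0"), \
    "transformers >= 4.42.0 is recommended for Rodimus models"
print(f"transformers {transformers.__version__}, torch {torch.__version__}")
```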

### Generation

#### `generate` API

```python
import torch

from modeling_rodimus import RodimusForCausalLM
from tokenization_rodimus_fast import RodimusTokenizer

# Load the tokenizer and model from a local checkpoint directory.
ckpt_dir = "model_path"
tokenizer = RodimusTokenizer.from_pretrained(ckpt_dir)
model = RodimusForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.bfloat16,
    device_map="cuda"
).eval()

# Build the chat prompt and run inference.
input_prompt = "Write a quick sort algorithm in python."
messages = [
    {"role": "HUMAN", "content": input_prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
)

model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**model_inputs, max_new_tokens=2048)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)
```
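
If you prefer to print only the model's reply rather than the full decoded sequence (which also contains the prompt), a common `transformers` pattern is to slice off the prompt tokens before decoding. Below is a small sketch reusing the variables from the example above:

```python
# Decode only the tokens generated after the prompt
# (reuses model_inputs and outputs from the example above).
prompt_len = model_inputs["input_ids"].shape[1]
generated_ids = outputs[:, prompt_len:]
reply = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(reply)
```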

## Citation

If you find our work helpful, please consider citing it.

```bibtex
@inproceedings{he2025rodimus,
    title={Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions},
    author={Zhihao He and Hang Yu and Zi Gong and Shizhan Liu and Jianguo Li and Weiyao Lin},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=IIVYiJ1ggK}
}
```