---
license: mit
library_name: transformers
tags:
- mergekit
- merge
base_model:
- Qwen/Qwen2.5-7B-Instruct-1M
- Sakalti/SJT-7B-1M
- Triangle104/Q2.5-Instruct-1M_Harmony
- bunnycore/Qwen2.5-7B-RRP-1M
- huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated
model-index:
- name: Qwen2.5-7B-CelestialHarmony-1M
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 59.44
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 34.51
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 33.01
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 9.17
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 16.74
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.63
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M
      name: Open LLM Leaderboard
---
# ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M

ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M is a custom merged language model based on Qwen2.5-7B with enhanced reasoning, roleplaying, and long-context capabilities. This model supports context lengths of up to 1 million tokens, making it well suited to ultra-long text processing, deep reasoning tasks, and immersive roleplay interactions.
Quantized weights are available in GGUF format, provided by mradermacher:

1. GGUF
2. imatrix GGUF
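If you prefer to run the GGUF quants locally, a minimal sketch using `llama-cpp-python` is shown below. The quant filename is illustrative (download one of the GGUF files from the repos above), and the context window is deliberately reduced, since a full 1M-token cache requires far more memory than a typical workstation has.

```python
# Minimal llama-cpp-python sketch; the GGUF filename is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-7B-CelestialHarmony-1M.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=32768,       # raise as far as your RAM/VRAM allows
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a wise celestial storyteller."},
        {"role": "user", "content": "Tell me a short story about an ancient celestial warrior."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```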
## Model Details

- **Base Model:** Qwen/Qwen2.5-7B-Instruct-1M
- **Models Used in Merge:**
  - Qwen/Qwen2.5-7B-Instruct-1M
  - bunnycore/Qwen2.5-7B-RRP-1M
  - Triangle104/Q2.5-Instruct-1M_Harmony
  - Sakalti/SJT-7B-1M
  - huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated
- **Merge Method:** `model_stock` (optimized layer-wise weight averaging)
## Overview

Qwen2.5-7B-CelestialHarmony-1M enhances the Qwen2.5-7B series with a fine-tuned balance of roleplaying dynamics, structured reasoning, and long-context memory. The model is particularly well suited to:

- **Roleplaying**: Immersive character-based storytelling with deep contextual awareness.
- **Reasoning & Thought Processing**: Structured logical thinking, especially when prompted with `<think>` tags (a prompting sketch follows the Quickstart below).
- **Ultra-Long Context Handling**: Efficient processing of sequences up to 1,010,000 tokens using optimized sparse attention.
## Technical Specifications

| Specification | Value |
|---|---|
| Model Type | Causal Language Model |
| Parameters | 7.61B |
| Non-Embedding Parameters | 6.53B |
| Layers | 28 |
| Attention Heads (GQA) | 28 (Q), 4 (KV) |
| Max Context Length | 1,010,000 tokens |
| Max Generation Length | 8,192 tokens |
| Merge Method | Model Stock |
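For intuition on what the 1M-token window costs in memory, here is a rough back-of-envelope estimate of the bf16 KV-cache size. The head dimension of 128 is an assumption taken from the Qwen2.5-7B base configuration rather than from the table above.

```python
# Rough KV-cache estimate for the full ~1M-token context in bf16 (2 bytes/element).
# head_dim = 128 is assumed from the Qwen2.5-7B base model config.
layers, kv_heads, head_dim, dtype_bytes = 28, 4, 128, 2
context_len = 1_010_000

bytes_per_token = 2 * layers * kv_heads * head_dim * dtype_bytes  # K and V
total_gib = bytes_per_token * context_len / 1024**3
print(f"{bytes_per_token / 1024:.0f} KiB per token, ~{total_gib:.1f} GiB at full context")
# -> 56 KiB per token, ~53.9 GiB at full context
```

This is the main reason the vLLM invocation further down uses tensor parallelism and chunked prefill.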
## Merging Details

This model was merged using the Model Stock method, which averages weights from multiple fine-tuned models to create a more efficient, balanced, and performant model.

### Merge YAML Configuration

```yaml
base_model: Qwen/Qwen2.5-7B-Instruct-1M
dtype: bfloat16
merge_method: model_stock
models:
  - model: Qwen/Qwen2.5-7B-Instruct-1M
  - model: Triangle104/Q2.5-Instruct-1M_Harmony
  - model: Sakalti/SJT-7B-1M
  - model: bunnycore/Qwen2.5-7B-RRP-1M
  - model: huihui-ai/Qwen2.5-7B-Instruct-1M-abliterated
tokenizer_source: Qwen/Qwen2.5-7B-Instruct-1M
```
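To reproduce the merge, the configuration above can be fed to mergekit. The sketch below follows mergekit's documented library usage; `pip install mergekit` is assumed and the config/output paths are illustrative.

```python
# Hedged reproduction sketch using mergekit's library API (pip install mergekit).
# The config path and output directory are illustrative.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("celestial_harmony_merge.yaml") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    merge_config,
    out_path="./Qwen2.5-7B-CelestialHarmony-1M",
    options=MergeOptions(
        cuda=True,             # set False to merge on CPU
        copy_tokenizer=True,   # matches tokenizer_source in the config
        lazy_unpickle=True,    # lower peak RAM while loading shards
    ),
)
```

The same merge can also be run from the command line with the `mergekit-yaml` CLI.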
## Quickstart

### Install Required Packages

Ensure you have the latest `transformers` library installed:

```bash
pip install transformers torch accelerate
```
### Load and Use the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Tell me a short story about an ancient celestial warrior."
messages = [
    {"role": "system", "content": "You are a wise celestial storyteller."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Strip the prompt tokens so only the model's reply is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
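The merge inherits `<think>`-style reasoning from its RRP and Harmony components. The sketch below pre-fills the assistant turn with an opening `<think>` tag so the model continues with its reasoning before answering; how strongly the model uses the tag depends on the prompt, so treat this as illustrative rather than guaranteed behavior. It reuses the `model` and `tokenizer` objects loaded above.

```python
# Illustrative sketch: elicit visible reasoning by pre-filling a <think> tag.
messages = [
    {"role": "system", "content": "Reason step by step inside <think>...</think> before answering."},
    {"role": "user", "content": "A probe travels at 0.1c to a star 4 light-years away. How long does the trip take?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
text += "<think>"  # generation starts inside the reasoning block

inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```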
## Optimized Deployment with vLLM

For long-context inference, use Qwen's dual-chunk-attention branch of vLLM:

```bash
git clone -b dev/dual-chunk-attn git@github.com:QwenLM/vllm.git
cd vllm
pip install -e . -v
```

Run the model:

```bash
vllm serve ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M \
  --tensor-parallel-size 4 \
  --max-model-len 1010000 \
  --enable-chunked-prefill --max-num-batched-tokens 131072 \
  --enforce-eager \
  --max-num-seqs 1
```
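Once the server is running, it exposes an OpenAI-compatible API. A minimal client sketch using the `openai` Python package is shown below; the host, port, and placeholder API key reflect vLLM's defaults and may need adjusting for your deployment.

```python
# Query the vLLM server through its OpenAI-compatible endpoint.
# Base URL and dummy API key follow vLLM defaults; adjust as needed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M",
    messages=[
        {"role": "system", "content": "You are a wise celestial storyteller."},
        {"role": "user", "content": "Tell me a short story about an ancient celestial warrior."},
    ],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```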
## Model Capabilities

- **Roleplay & Storytelling**: Designed for engaging, character-driven interactions.
- **Long-Context Awareness**: Handles texts up to 1M tokens.
- **Logical Thinking & Reasoning**: Supports the `<think>` tag to enhance thought structuring.
- **Optimized Merge Strategy**: Uses Model Stock for superior generalization.
## Acknowledgments

This model is built on top of Qwen2.5-7B, with contributions from bunnycore, Triangle104, Sakalti, and huihui-ai, leveraging the Model Stock merging methodology.

## Open LLM Leaderboard Evaluation Results

Detailed results can be found on the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen2.5-7B-CelestialHarmony-1M).
| Metric | Value |
|---|---|
| Avg. | 31.75 |
| IFEval (0-Shot) | 59.44 |
| BBH (3-Shot) | 34.51 |
| MATH Lvl 5 (4-Shot) | 33.01 |
| GPQA (0-shot) | 9.17 |
| MuSR (0-shot) | 16.74 |
| MMLU-PRO (5-shot) | 37.63 |