---
language:
- en
- zh
license: apache-2.0
library_name: transformers
tags:
- merge
- mergekit
- lazymergekit
- Locutusque/StockQwen-2.5-7B
- allknowingroger/QwenSlerp8-7B
base_model:
- allknowingroger/QwenSlerp8-7B
- Locutusque/StockQwen-2.5-7B
model-index:
- name: Qwen-2.5-Aether-SlerpFusion-7B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 62.62
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 36.01
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 24.17
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.49
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 11.29
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 36.96
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B
      name: Open LLM Leaderboard
---

# ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B

**Qwen-2.5-Aether-SlerpFusion-7B** is a model merge that combines the strengths of two pre-trained Qwen2.5-based language models using the [mergekit](https://github.com/ZeroXClem/mergekit) framework. The fusion uses spherical linear interpolation (SLERP) to blend the two models layer by layer, yielding a single model that inherits capabilities from both parents.

## 🚀 Merged Models

This merge incorporates the following models:

- [**Locutusque/StockQwen-2.5-7B**](https://huggingface.co/Locutusque/StockQwen-2.5-7B): serves as the base model, providing robust general language understanding and generation.
- [**allknowingroger/QwenSlerp8-7B**](https://huggingface.co/allknowingroger/QwenSlerp8-7B): contributes task-specific fine-tuning, improving the merged model's adaptability across applications.
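## 🌀 How SLERP Works

Before looking at the configuration, it helps to see what SLERP actually does to a pair of weight tensors: rather than averaging them along a straight line, it interpolates along the arc between them on the unit sphere. The snippet below is a minimal, illustrative sketch, not mergekit's actual implementation; the `slerp` helper and the toy tensor shapes are invented here for demonstration.

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors.

    t = 0 returns `a` (the base model's tensor); t = 1 returns `b`.
    """
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    # Angle between the two weight vectors, measured on the unit sphere
    cos_omega = torch.dot(a_flat / a_flat.norm(), b_flat / b_flat.norm())
    omega = torch.acos(torch.clamp(cos_omega, -1.0, 1.0))
    if omega.abs() < eps:
        # Nearly colinear tensors: fall back to plain linear interpolation
        return ((1 - t) * a_flat + t * b_flat).view_as(a)
    sin_omega = torch.sin(omega)
    # Follow the arc between the tensors rather than the straight chord
    merged = (torch.sin((1 - t) * omega) / sin_omega) * a_flat + (torch.sin(t * omega) / sin_omega) * b_flat
    return merged.view_as(a)

# Toy example: blend two random "layers" with the default factor t = 0.5
layer_a = torch.randn(256, 256)
layer_b = torch.randn(256, 256)
print(slerp(0.5, layer_a, layer_b).shape)  # torch.Size([256, 256])
```

Unlike a plain weighted average, interpolating along the arc keeps the magnitude of the merged weights closer to that of the originals, which is a commonly cited reason SLERP merges preserve each parent's behavior well.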
## 🧩 Merge Configuration

The configuration below outlines how the two models are merged using **spherical linear interpolation (SLERP)**. This method ensures smooth transitions between the layers of both models, producing a balanced blend of their attributes:

```yaml
# ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B Merge Configuration
slices:
  - sources:
      - model: Locutusque/StockQwen-2.5-7B
        layer_range: [0, 28]
      - model: allknowingroger/QwenSlerp8-7B
        layer_range: [0, 28]
merge_method: slerp
base_model: Locutusque/StockQwen-2.5-7B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```

For a sketch of how to execute this configuration with mergekit, see the end of this card.

### 🔑 Key Parameters

- **Self-Attention Filtering** (`self_attn`): The gradient `[0, 0.5, 0.3, 0.7, 1]` varies the interpolation factor across self-attention layers by depth, so early attention layers stay closer to the base model while later ones lean toward QwenSlerp8-7B.
- **MLP Filtering** (`mlp`): The mirrored gradient `[1, 0.5, 0.7, 0.3, 0]` does the opposite for the MLP blocks, balancing each model's contribution within the feed-forward layers.
- **Global Weight** (`t.value`): The default factor of `0.5` applies to all tensors not matched by a filter, giving both models equal weight.
- **Data Type** (`dtype`): `bfloat16` keeps the merge computationally efficient while preserving adequate precision.

### 🗣️ Inference

Below is an example of how to load and use the model for text generation:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Define the model name
model_name = "ZeroXClem/Qwen-2.5-Aether-SlerpFusion-7B"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model in bfloat16, placing layers automatically across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Initialize the pipeline (dtype and device placement are inherited from the loaded model)
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Define the input prompt
prompt = "Explain the significance of artificial intelligence in modern healthcare."

# Generate the output
outputs = text_generator(
    prompt,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95
)

# Print the generated text
print(outputs[0]["generated_text"])
```

## 🎯 Use Cases & Applications

**Qwen-2.5-Aether-SlerpFusion-7B** suits scenarios that require both broad language understanding and specialized task performance. The merged model is well suited for:

- **Advanced Text Generation and Comprehension**: crafting coherent, contextually accurate, and nuanced text for content creation, summarization, and translation.
- **Domain-Specific Tasks**: legal document analysis, medical information processing, and technical support.
- **Interactive AI Systems**: conversational agents and chatbots that need both general language ability and task-specific expertise.

## 📜 License

This model is open-sourced under the **Apache-2.0 License**.

## 💡 Tags

- `merge`
- `mergekit`
- `slerp`
- `Qwen`
- `Locutusque/StockQwen-2.5-7B`
- `allknowingroger/QwenSlerp8-7B`

---

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ZeroXClem__Qwen-2.5-Aether-SlerpFusion-7B).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 29.59 |
| IFEval (0-Shot)     | 62.62 |
| BBH (3-Shot)        | 36.01 |
| MATH Lvl 5 (4-Shot) | 24.17 |
| GPQA (0-shot)       |  6.49 |
| MuSR (0-shot)       | 11.29 |
| MMLU-PRO (5-shot)   | 36.96 |
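## 🔄 Reproducing the Merge

The YAML configuration above is all mergekit needs to recreate this model. The sketch below shows one way to run the merge programmatically, assuming mergekit is installed (e.g. `pip install mergekit`) and the configuration is saved as `config.yaml`; option names can vary between mergekit versions, so treat this as a starting point rather than a canonical recipe.

```python
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load the merge configuration shown earlier in this card
with open("config.yaml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

# Run the merge; the output directory name is arbitrary
run_merge(
    merge_config,
    out_path="./Qwen-2.5-Aether-SlerpFusion-7B",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # merge on GPU when one is available
        copy_tokenizer=True,             # copy the base model's tokenizer into the output
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```

Equivalently, the command-line entry point `mergekit-yaml config.yaml ./output-dir` performs the same merge without any Python scripting.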