---
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
tags:
- merge
- mergekit
- qwen2
- chat
- conversational
language:
- en
- zh
library_name: transformers
---
# Qwen1.5-124B-Chat-Merge

**This is a 124B frankenmerge of [Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat), created by interleaving layers of the model with itself using [mergekit](https://github.com/arcee-ai/mergekit).**

*Inspired by other frankenmerge models like [**goliath-120b**](https://huggingface.co/alpindale/goliath-120b) and [**miqu-1-120b**](https://huggingface.co/wolfram/miqu-1-120b)*
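
As a rough sanity check on the advertised size: the configuration below stacks seven overlapping 20-layer slices of the 80-layer base model into a 140-layer stack. Here is a back-of-the-envelope estimate in Python, assuming ~72.3B total parameters for Qwen1.5-72B-Chat of which roughly 2.5B sit in the embedding and output head (both figures are my approximations, not values from this card):

```python
# Rough size estimate for the passthrough merge; all constants are
# approximations, not values taken from this model card.
TOTAL_PARAMS_72B = 72.3e9  # assumed total parameters of Qwen1.5-72B-Chat
EMBED_PARAMS = 2.5e9       # assumed embedding + LM-head parameters (not per-layer)
BASE_LAYERS = 80           # assumed transformer layer count of the base model

# Slice ranges from the merge configuration below.
slices = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60), (50, 70), (60, 80)]
merged_layers = sum(hi - lo for lo, hi in slices)  # 7 slices * 20 layers = 140

per_layer = (TOTAL_PARAMS_72B - EMBED_PARAMS) / BASE_LAYERS
estimate = merged_layers * per_layer + EMBED_PARAMS
print(f"{merged_layers} layers -> ~{estimate / 1e9:.0f}B parameters")
# prints: 140 layers -> ~125B parameters, in line with the ~124B in the name
```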

**-Quantization**

*Coming soon...*
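
In the meantime, the full-precision weights can be quantized to 4-bit on the fly with bitsandbytes. A minimal sketch using the standard transformers API; the repo id is a placeholder for wherever this merge is hosted, and even in 4-bit a ~124B model still needs on the order of 65 GB of VRAM:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder id: replace with the actual repository hosting this merge.
model_id = "your-namespace/Qwen1.5-124B-Chat-Merge"

# On-the-fly NF4 quantization; device_map="auto" spreads the layers
# across all available GPUs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Standard Qwen1.5 chat usage via the tokenizer's chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello! Who are you?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```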

**-Merge Configuration**

This model was merged using the YAML configuration below:

```yaml
dtype: float16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 20]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [10, 30]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [20, 40]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [30, 50]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [40, 60]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [50, 70]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [60, 80]
    model: Qwen/Qwen1.5-72B-Chat
```
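
To reproduce the merge, save the configuration above as e.g. `config.yml` and run it through mergekit (the `mergekit-yaml config.yml ./output-dir` CLI is the usual route). Below is a sketch of the same thing via the Python API shown in mergekit's README; if the API has shifted since, defer to the current mergekit docs:

```python
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Parse the passthrough configuration shown above.
with open("config.yml", "r", encoding="utf-8") as f:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(f))

# Downloads the base model shards, interleaves the slices, and writes
# the merged 140-layer model to the output directory.
run_merge(
    merge_config,
    "./Qwen1.5-124B-Chat-Merge",
    options=MergeOptions(
        cuda=True,            # run tensor ops on GPU where available
        copy_tokenizer=True,  # carry the base model's tokenizer along
        lazy_unpickle=True,   # reduce RAM usage while reading shards
    ),
)
```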

**-Performance**

* Note: I don't have the resources to run formal benchmarks, nor have I been able to use the model extensively, so my impressions may not be entirely accurate.

In most of my own (subjective) tests, it performs better than the 72B version in comprehension, reasoning, and coherence.

**-Thanks**

* 1. [mergekit](https://github.com/arcee-ai/mergekit), the tool used to merge this model.
* 2. The Qwen team, for the excellent base models.