---
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
tags:
- merge
- mergekit
- qwen2
- chat
- conversational
language:
- en
- zh
library_name: transformers
---
# Qwen1.5-124B-Chat-Merge
**This is a 124B frankenmerge of [Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat), created by interleaving layers of the model with itself using mergekit.**

*Inspired by other frankenmerge models such as [**goliath-120b**](https://huggingface.co/alpindale/goliath-120b) and [**miqu-1-120b**](https://huggingface.co/wolfram/miqu-1-120b).*
**-Quantization**

*Coming soon...*
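Until quantized weights are released, one stopgap is to load the full-precision merge with on-the-fly 4-bit quantization via `bitsandbytes`. A minimal sketch using the standard `transformers` API (the repo id below is a placeholder; substitute the actual model path):

```python
# Minimal sketch: on-the-fly 4-bit loading with transformers + bitsandbytes.
# The repo id is an assumption; substitute the actual model path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "DisOOM/Qwen1.5-124B-Chat-Merge"  # placeholder repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard layers across available GPUs
)

messages = [{"role": "user", "content": "Hello, who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that even at 4-bit, a ~124B model occupies roughly 65-70 GB of GPU memory, so it will generally need to be sharded across multiple GPUs.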
**-Merge Configuration**

The following YAML configuration was used to produce this model:
```yaml
dtype: float16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 20]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [10, 30]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [20, 40]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [30, 50]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [40, 60]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [50, 70]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [60, 80]
    model: Qwen/Qwen1.5-72B-Chat
```
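This config can be run with mergekit's `mergekit-yaml` CLI. Alternatively, here is a minimal sketch using mergekit's Python API (assuming a recent mergekit release; the file paths are placeholders):

```python
# Minimal sketch: running the passthrough merge through mergekit's Python API.
# Paths are placeholders; adjust to your environment (pip install mergekit).
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "qwen1.5-124b.yaml"            # the config shown above
OUTPUT_PATH = "./Qwen1.5-124B-Chat-Merge"   # where the merged weights land

with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    OUTPUT_PATH,
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,   # copy the base model's tokenizer to the output
        lazy_unpickle=True,    # reduce peak RAM while reading weight shards
    ),
)
```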
**-Performance**

* Note: I don't have the resources to run benchmark tests, nor have I been able to use the model extensively, so my impressions may not be entirely accurate.

In my own (subjective) tests it performs better than the 72B version in most areas, including comprehension, reasoning, and coherence.
**-Thanks**

* [mergekit](https://github.com/arcee-ai/mergekit), the tool used to create this merge.
* The Qwen team, for the excellent base models.