|
---
tags:
- moe
- llama
- '3'
- llama 3
- 4x8b
---
|
# Llama-3-Peach-Instruct-4x8B-MoE |
|
|
|
<img src="https://i.imgur.com/MlnauLb.jpeg" width="640"/> |
|
|
|
## GGUF files are available here: [RDson/Llama-3-Peach-Instruct-4x8B-MoE-GGUF](https://huggingface.co/RDson/Llama-3-Peach-Instruct-4x8B-MoE-GGUF). |
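A minimal sketch of chatting with one of the GGUF quants locally via llama-cpp-python; the file name and sampling settings below are placeholders, not taken from the card, so point `model_path` at whichever quant you actually downloaded from the GGUF repo.

```python
# Sketch: run a downloaded GGUF quant with llama-cpp-python.
# The model_path is a placeholder; use the file you pulled from
# RDson/Llama-3-Peach-Instruct-4x8B-MoE-GGUF (e.g. the Q4_K_M quant).
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/Llama-3-Peach-Instruct-4x8B-MoE.Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,        # Llama 3 context window
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a mixture-of-experts model is in two sentences."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```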
|
|
|
This is an experimental MoE created using Mergekit from:
|
* [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |
|
* [Salesforce/SFR-Iterative-DPO-LLaMA-3-8B-R](https://huggingface.co/Salesforce/SFR-Iterative-DPO-LLaMA-3-8B-R) |
|
* [NousResearch/Hermes-2-Theta-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B) |
|
* [rombodawg/Llama-3-8B-Instruct-Coder](https://huggingface.co/rombodawg/Llama-3-8B-Instruct-Coder) |
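A minimal sketch of loading the merged model with transformers, assuming the full-precision weights live at `RDson/Llama-3-Peach-Instruct-4x8B-MoE` (repo id inferred from the card title, not stated explicitly); the chat content and generation settings are purely illustrative.

```python
# Sketch: load the merged MoE with transformers and run one chat turn.
# The repo id is an assumption based on the card title; adjust if the
# weights are hosted elsewhere.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RDson/Llama-3-Peach-Instruct-4x8B-MoE"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # matches the dtype used in the merge config below
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a Python function that reverses a string."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```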
|
|
|
Evaluation (Q4_K_M quant):

* GSM8K (5-shot): 0.6983 ± 0.0126

* GSM8K (8-shot, CoT): 0.674 ± 0.0129
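For reference, a rough sketch of how a comparable GSM8K score could be obtained with lm-evaluation-harness; the repo id is assumed, and because the card's numbers were measured on the Q4_K_M GGUF, evaluating the unquantized weights as below will not match them exactly.

```python
# Sketch: GSM8K 5-shot with EleutherAI's lm-evaluation-harness.
# Evaluates the assumed full-precision repo; the card's figures were
# measured on the Q4_K_M GGUF, so expect somewhat different numbers.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=RDson/Llama-3-Peach-Instruct-4x8B-MoE,dtype=float16",  # assumed repo id
    tasks=["gsm8k"],      # use "gsm8k_cot" for the chain-of-thought variant
    num_fewshot=5,
    batch_size="auto",
)
print(results["results"]["gsm8k"])
```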
|
|
|
Mergekit YAML file:
|
```yaml
base_model: Meta-Llama-3-8B-Instruct
experts:
  - source_model: Meta-Llama-3-8B-Instruct
    positive_prompts:
      - "explain"
      - "chat"
      - "assistant"
      - "think"
      - "roleplay"
      - "versatile"
      - "helpful"
      - "factual"
      - "integrated"
      - "adaptive"
      - "comprehensive"
      - "balanced"
    negative_prompts:
      - "specialized"
      - "narrow"
      - "focused"
      - "limited"
      - "specific"
  - source_model: Llama-3-8B-Instruct-Coder
    positive_prompts:
      - "python"
      - "math"
      - "solve"
      - "code"
      - "programming"
      - "javascript"
      - "algorithm"
      - "factual"
    negative_prompts:
      - "sorry"
      - "cannot"
      - "concise"
      - "imaginative"
      - "creative"
  - source_model: SFR-Iterative-DPO-LLaMA-3-8B-R
    positive_prompts:
      - "AI"
      - "instructive"
      - "chat"
      - "assistant"
      - "clear"
      - "directive"
      - "helpful"
      - "informative"
  - source_model: Hermes-2-Theta-Llama-3-8B
    positive_prompts:
      - "chat"
      - "assistant"
      - "analytical"
      - "accurate"
      - "code"
      - "logical"
      - "knowledgeable"
      - "precise"
      - "calculate"
      - "compute"
      - "solve"
      - "work"
      - "python"
      - "javascript"
      - "programming"
      - "algorithm"
      - "tell me"
      - "assistant"
      - "factual"
    negative_prompts:
      - "abstract"
      - "artistic"
      - "emotional"
      - "mistake"
      - "inaccurate"
gate_mode: hidden
dtype: float16
```
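To build a merge like this, a config of this shape would presumably be passed to mergekit's MoE script, e.g. `mergekit-moe config.yaml ./Llama-3-Peach-Instruct-4x8B-MoE`; the command is shown for illustration only, as the exact invocation used for this model is not documented here.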
|
The Mergekit YAML file takes some inspiration from [LoneStriker/Umbra-MoE-4x10.7-2.4bpw-h6-exl2](https://huggingface.co/LoneStriker/Umbra-MoE-4x10.7-2.4bpw-h6-exl2).