Marcel Bischoff committed on
Commit
b3c4e71
1 Parent(s): 7730b56
Files changed (1)
  1. README.md +8 -2
README.md CHANGED
@@ -17,10 +17,16 @@ tags:
 
 ![](https://i.imgur.com/UOb2fvh.jpg)
 
-# phixtral-4x2_8
+# phixtral-4x2_8-gates-poc
+phixtral-4x2_8-gates-poc is [phixtral-4x2_8](https://huggingface.co/mlabonne/phixtral-4x2_8)
+with fine-tuned gates for better expert selection and to break the symmetry.
+As a POC, we used only 400 shorter samples
+from [openhermes](https://huggingface.co/datasets/teknium/openhermes).
 
 phixtral-4x2_8 is the first Mixture of Experts (MoE) made with four [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) models, inspired by the [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) architecture. It performs better than each individual expert.
 
+
+
 ## 🏆 Evaluation
 
 | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
@@ -109,4 +115,4 @@ A special thanks to [vince62s](https://huggingface.co/vince62s) for the inferenc
 
 Thanks to [Charles Goddard](https://github.com/cg123) for the [mergekit](https://github.com/cg123/mergekit) library and the implementation of the [MoE for clowns](https://goddard.blog/posts/clown-moe/).
 
-Thanks to [ehartford](https://huggingface.co/ehartford), [lxuechen](https://huggingface.co/lxuechen), [Yhyu13](https://huggingface.co/Yhyu13), and [mrm8488](https://huggingface.co/mrm8488) for their fine-tuned phi-2 models.
+Thanks to [ehartford](https://huggingface.co/ehartford), [lxuechen](https://huggingface.co/lxuechen), [Yhyu13](https://huggingface.co/Yhyu13), and [mrm8488](https://huggingface.co/mrm8488) for their fine-tuned phi-2 models.
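
The added README text describes fine-tuning only the MoE gates of phixtral-4x2_8 on roughly 400 short openhermes samples. The commit does not include the training script, so the following is a minimal sketch of gate-only fine-tuning under stated assumptions: the router parameters are identified by "gate" in their names, the openhermes split exposes `instruction`/`output` columns, and the hyperparameters are illustrative rather than the ones actually used.

```python
# Minimal sketch of gate-only fine-tuning (not the exact script behind this commit).
# Assumptions: MoE router weights contain "gate" in their parameter names, and the
# teknium/openhermes dataset provides "instruction" and "output" columns.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_name = "mlabonne/phixtral-4x2_8"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Freeze everything except the gate/router weights so only expert selection is trained.
for name, param in model.named_parameters():
    param.requires_grad = "gate" in name

# Keep a few hundred short samples, mirroring the "400 shorter samples" described above.
dataset = load_dataset("teknium/openhermes", split="train")
dataset = dataset.filter(lambda x: len(x["output"]) < 500).select(range(400))

def tokenize(sample):
    # Simple concatenation of prompt and response; the real prompt format may differ.
    text = sample["instruction"] + "\n" + sample["output"]
    return tokenizer(text, truncation=True, max_length=512)

dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="phixtral-gates-poc",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        learning_rate=1e-4,
        bf16=True,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Training only the router keeps the experts untouched and is cheap, while giving the initially identical gates distinct weights, which is what "breaking the symmetry" refers to.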