---
license: llama2
language:
- en
pipeline_tag: conversational
tags:
- merge
---
# Giant Macaroni 120b

![img](./giantmacaroni_small.png)

An auto-regressive causal LM created by combining three finetuned models into one via a passthrough merge, stacking slices of their layers in order.

I'll add more detail later, but it is a combination of:

- AIDC-ai-business/Marcoroni-70B-v1
- garage-bAInd/Platypus2-70B-instruct
- fangloveskari/ORCA_LLaMA_70B_QLoRA

This is an attempt to create a larger model capable of handling logic and reasoning well.  All three of these models score well in those categories, and I used bertviz to find good splice points.
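
As a sketch of what a stacked passthrough merge looks like, here is a minimal mergekit config in that style. The layer ranges are placeholders for illustration only, not the actual splice points used for this model.

```yaml
# Illustrative passthrough merge; layer ranges are placeholders, not the real splice points.
slices:
  - sources:
      - model: AIDC-ai-business/Marcoroni-70B-v1
        layer_range: [0, 30]
  - sources:
      - model: garage-bAInd/Platypus2-70B-instruct
        layer_range: [10, 50]
  - sources:
      - model: fangloveskari/ORCA_LLaMA_70B_QLoRA
        layer_range: [30, 70]
  - sources:
      - model: AIDC-ai-business/Marcoroni-70B-v1
        layer_range: [50, 80]
merge_method: passthrough
dtype: float16
```

With mergekit installed, a config like this is run with `mergekit-yaml config.yml ./merged-output`.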

I hope to find out how much $$$ it is going to take to settle this merge with some fine-tuning.  I think it will do very well once it is settled.  These three models could not be mixed with the others I tried because of changes in their layers involving the addition of a rotary embedding.  That addition seemed to interfere with the tokens, so they could not be merged with models that lacked it.  However, all of the models that do well in logic and reasoning seem to have it.


# Prompting Format

Both Vicuna and Alpaca formats will work; Alpaca is likely the better fit, since the final layers belong primarily to Marcoroni.
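
For reference, a minimal Alpaca-style prompt looks like the following; the `{instruction}` placeholder stands in for your actual request, and this is the generic Alpaca template rather than anything tuned specifically to this merge.

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
```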




# Benchmarks
Coming soon.

# Acknowledgements
- [@chargoddard](https://huggingface.co/chargoddard) - [mergekit](https://github.com/cg123/mergekit).
- [@migtissera](https://huggingface.co/migtissera) - for [Tess-XL](https://huggingface.co/migtissera/Tess-XL-v1.0), which inspired me to believe that open models can compete on logic tasks with the big commercial models.
- [@alpindale](https://huggingface.co/alpindale) - for [Goliath-120B](https://huggingface.co/alpindale/goliath-120b), which started this crazy endeavor for us all.
- [@nsfwthrowitaway69](https://huggingface.co/nsfwthrowitaway69) - for sharing the merge config for [Venus-120B](https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.1) and getting me off the starting block with some questions on mergekit and tokenizers.

Keep it open and keep sharing, everyone!  With Mixtral and the MoE changes coming to mergekit, coupled with these larger merged models, I think the sky is the limit for us all.  I can only imagine what would happen if we took a group of these 120B models, fine-tuned each of them a bit, and applied the Mixtral MoE merge method to them.  I would also point out that if a clever VC came along and funded that work, the people you need are right here on Hugging Face, and all they need is the equipment to do it on.