---
license: llama2
language:
- en
tags:
- not-for-all-audiences
---

4.5 bpw (bits per weight) exl2 quantization of [Venus-120b](https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.0), made using the measurement.json and calibration dataset posted on the model page.
|
|
|
This size lets it fit in 72 GB of VRAM without using the FP8 cache (on 3x24 GB GPUs it uses about 69-70 GB of VRAM with context loaded).
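
For reference, below is a minimal loading sketch using the exllamav2 Python API. The model directory, context length, and sampler values are assumptions for a 3x24 GB setup, and API details may differ slightly between exllamav2 versions.

```python
# Minimal sketch: loading a 4.5 bpw exl2 quant across several GPUs with exllamav2.
# The model path and context length below are illustrative assumptions.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "models/Venus-120b-v1.0-4.5bpw-exl2"  # hypothetical local path
config.prepare()
config.max_seq_len = 4096

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # regular FP16 cache; FP8 cache not needed at this size
model.load_autosplit(cache)               # spreads the layers across all visible GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7                # per the temperature advice in the original card
settings.top_p = 0.9

print(generator.generate_simple("Once upon a time,", settings, 200))
```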
|
|
|
# Original model card |
|
|
|
# Venus 120b - version 1.0 |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/655febd724e0d359c1f21096/BSKlxWQSbh-liU8kGz4fF.png) |
|
|
|
## Overview |
|
|
|
The goal was to create a large model that's highly capable in RP/ERP scenarios. Goliath-120b is excellent for roleplay, and Venus-120b was created to see how well the same approach works when more than two models are mixed together.
|
|
|
## Model Details |
|
|
|
- A result of interleaving layers of [Sao10K/Euryale-1.3-L2-70B](https://huggingface.co/Sao10K/Euryale-1.3-L2-70B), [NousResearch/Nous-Hermes-Llama2-70b](https://huggingface.co/NousResearch/Nous-Hermes-Llama2-70b), and [migtissera/SynthIA-70B-v1.5](https://huggingface.co/migtissera/SynthIA-70B-v1.5) using [mergekit](https://github.com/cg123/mergekit). |
|
- The resulting model has 140 layers and approximately 122 billion parameters. |
|
- See mergekit-config.yml for details on the merge method used (an illustrative sketch of this style of config appears after this list).
|
- See the `exl2-*` branches for exllama2 quantizations. The 4.85 bpw quant should fit in 80GB VRAM, and the 3.0 bpw quant should (just barely) fit in 48GB VRAM with 4k context. |
|
- Inspired by [Goliath-120b](https://huggingface.co/alpindale/goliath-120b) |
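
For readers unfamiliar with mergekit, a layer-interleave ("frankenmerge") like this is expressed as a passthrough merge over layer slices. The config below is purely illustrative: the slice boundaries are made up and do not reproduce the actual mergekit-config.yml from the original repository.

```yaml
# Illustrative passthrough (layer-interleave) mergekit config.
# Slice boundaries are placeholders, not the real Venus-120b recipe.
merge_method: passthrough
dtype: float16
slices:
  - sources:
      - model: Sao10K/Euryale-1.3-L2-70B
        layer_range: [0, 20]
  - sources:
      - model: NousResearch/Nous-Hermes-Llama2-70b
        layer_range: [10, 30]
  - sources:
      - model: migtissera/SynthIA-70B-v1.5
        layer_range: [20, 40]
  # ...further alternating slices until the merged stack reaches 140 layers
```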
|
|
|
**Warning: This model will produce NSFW content!** |
|
|
|
## Results |
|
|
|
Initial tests show that Venus-120b functions fine; overall, it seems comparable to Goliath-120b. Some differences I noticed:
|
1. Venus needs lower temperature settings than Goliath. I recommend a temp of around 0.7, and no higher than 1.0. |
|
2. Venus tends, on average, to produce longer responses than Goliath, probably due to the inclusion of SynthIA in the merge, which is trained to produce long chain-of-thought responses.
|
3. Venus seems to be a bit less creative than Goliath in the prose it generates, probably due to the lack of Xwin and the inclusion of Nous-Hermes.
|
|
|
Keep in mind that this is all anecdotal, based on some basic tests. The key takeaway is that Venus shows Goliath is not a fluke.