---
license: llama2
language:
- en
tags:
- not-for-all-audiences
---

4.5 bpw (bits per weight) exl2 quantization of [Venus-120b](https://huggingface.co/nsfwthrowitaway69/Venus-120b-v1.0), made using the measurement.json and calibration dataset posted on the model page.
|
|
|
This size lets it fit in 72 GB of VRAM without using the FP8 cache (on 3x24 GB GPUs it uses about 69-70 GB of VRAM with context loaded).
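
For reference, below is a minimal loading sketch using the exllamav2 Python API. The model directory, context length, and sampler values are assumptions for a 3x24 GB setup, and API details may differ slightly between exllamav2 versions.

```python
# Minimal sketch: loading a 4.5 bpw exl2 quant across several GPUs with exllamav2.
# The model path and context length below are illustrative assumptions.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "models/Venus-120b-v1.0-4.5bpw-exl2"  # hypothetical local path
config.prepare()
config.max_seq_len = 4096

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # regular FP16 cache; FP8 cache not needed at this size
model.load_autosplit(cache)               # spreads the layers across all visible GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7                # per the temperature advice in the original card
settings.top_p = 0.9

print(generator.generate_simple("Once upon a time,", settings, 200))
```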
|
|
|
# Original model card |
|
|
|
# Venus 120b - version 1.0 |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/655febd724e0d359c1f21096/BSKlxWQSbh-liU8kGz4fF.png) |
|
|
|
## Overview |
|
|
|
The goal was to create a large model that's highly capable in RP/ERP scenarios. Goliath-120b is excellent for roleplay, and Venus-120b was created to see how well the same approach works when more than two models are mixed together.
|
|
|
## Model Details |
|
|
|
- A result of interleaving layers of [Sao10K/Euryale-1.3-L2-70B](https://huggingface.co/Sao10K/Euryale-1.3-L2-70B), [NousResearch/Nous-Hermes-Llama2-70b](https://huggingface.co/NousResearch/Nous-Hermes-Llama2-70b), and [migtissera/SynthIA-70B-v1.5](https://huggingface.co/migtissera/SynthIA-70B-v1.5) using [mergekit](https://github.com/cg123/mergekit). |
|
- The resulting model has 140 layers and approximately 122 billion parameters. |
|
- See mergekit-config.yml for details on the merge method used (an illustrative sketch of this style of config appears after this list).
|
- See the `exl2-*` branches for exllama2 quantizations. The 4.85 bpw quant should fit in 80GB VRAM, and the 3.0 bpw quant should (just barely) fit in 48GB VRAM with 4k context. |
|
- Inspired by [Goliath-120b](https://huggingface.co/alpindale/goliath-120b) |
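
For readers unfamiliar with mergekit, a layer-interleave ("frankenmerge") like this is expressed as a passthrough merge over layer slices. The config below is purely illustrative: the slice boundaries are made up and do not reproduce the actual mergekit-config.yml from the original repository.

```yaml
# Illustrative passthrough (layer-interleave) mergekit config.
# Slice boundaries are placeholders, not the real Venus-120b recipe.
merge_method: passthrough
dtype: float16
slices:
  - sources:
      - model: Sao10K/Euryale-1.3-L2-70B
        layer_range: [0, 20]
  - sources:
      - model: NousResearch/Nous-Hermes-Llama2-70b
        layer_range: [10, 30]
  - sources:
      - model: migtissera/SynthIA-70B-v1.5
        layer_range: [20, 40]
  # ...further alternating slices until the merged stack reaches 140 layers
```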
|
|
|
**Warning: This model will produce NSFW content!** |
|
|
|
## Results |
|
|
|
Initial tests show that Venus-120b functions fine; overall, it seems comparable to Goliath-120b. Some differences I noticed:
|
1. Venus needs lower temperature settings than Goliath. I recommend a temp of around 0.7, and no higher than 1.0. |
|
2. Venus tends, on average, to produce longer responses than Goliath, probably due to the inclusion of SynthIA in the merge, which is trained to produce long chain-of-thought responses.
|
3. Venus seems to be a bit less creative than Goliath in the prose it generates, probably due to the lack of Xwin and the inclusion of Nous-Hermes.
|
|
|
Keep in mind that this is all anecdotal, based on some basic tests. The key takeaway is that Venus shows Goliath is not a fluke.