LMoE version with 3 LoRAs on base Mistral model

#4
by rhysjones - opened

Created three separate LoRAs, one for each of the Actor, Critic and Regenerator models in HelixNet, then combined them in a modified script that dynamically applies the right adapter to the Mistral base according to the actor / critic / regenerator mode. The memory requirement drops from 3 x 14GB for the full models to 1 x 14GB + 3 x 320MB for the base + LoRAs.

LoRAs and modified code example here: https://huggingface.co/rhysjones/HelixNet-LMoE-Actor
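For anyone who wants a feel for the approach before reading the repo script, here's a minimal sketch of the adapter-switching idea using PEFT. The Critic and Regenerator repo names below are assumptions following the Actor's naming, and this is a generic illustration rather than the actual HelixNet-LMoE script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "mistralai/Mistral-7B-v0.1"
ADAPTERS = {
    "actor": "rhysjones/HelixNet-LMoE-Actor",
    # Assumed names, following the Actor repo's naming convention:
    "critic": "rhysjones/HelixNet-LMoE-Critic",
    "regenerator": "rhysjones/HelixNet-LMoE-Regenerator",
}

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)

# Attach the first adapter, then register the other two under their own names.
model = PeftModel.from_pretrained(model, ADAPTERS["actor"], adapter_name="actor")
model.load_adapter(ADAPTERS["critic"], adapter_name="critic")
model.load_adapter(ADAPTERS["regenerator"], adapter_name="regenerator")

# Switching modes is just an adapter swap - no reload of the 14GB base model.
model.set_adapter("critic")
```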

Really awesome work! What’s the added delay for loading LoRAs?

Loading the LoRAs is very quick (milliseconds). The tradeoff is in inference performance: each forward pass now goes through both the base model weights and the LoRA deltas, adding an extra step at every adapted layer.
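A minimal sketch of where that extra step comes from (tensor names here are illustrative, not from the HelixNet script): each adapted projection computes the base matmul plus a low-rank delta.

```python
import torch

def lora_forward(x, W, A, B, scaling=1.0):
    # Base projection through the frozen model weights.
    base = x @ W.T
    # Extra low-rank path added by the LoRA: two small matmuls per
    # adapted layer, run on every token - the per-token overhead.
    delta = (x @ A.T) @ B.T
    return base + scaling * delta
```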

Initial testing on a 4090 using the demo script gives:

HelixNet Actor model: 44 tokens / second
Mistral + Actor LoRA: 27 tokens / second

That’s still very usable! Nice.

I’ve been running GPTQ 6-bit quantized versions with exllamav2, getting around 120 tok/second on my 4090. The output quality seems about the same, maybe with slight degradation.

Yes, ExLlamaV2 is excellent!

Turns out exllamav2 also supports loading multiple LoRAs. Adapting the LMoE to use the 6-bit exl2 quantized version of Mistral and loading the LoRAs within exllamav2 gives much better results on the 4090:

3 separate models: 120 tokens / second, using 20GB of GPU memory
LMoE combined model: 91 tokens / second, using 8GB of GPU memory

Update at: https://huggingface.co/rhysjones/HelixNet-LMoE-6.0bpw-h6-exl2
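For reference, a minimal sketch of loading multiple LoRAs in exllamav2 and applying one per generation call. Paths are placeholders; the working script is in the repo linked above:

```python
from exllamav2 import (
    ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache,
    ExLlamaV2Tokenizer, ExLlamaV2Lora,
)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point at the 6.0bpw-h6 exl2 quant of the Mistral base (placeholder path).
config = ExLlamaV2Config()
config.model_dir = "/models/mistral-6.0bpw-h6-exl2"
config.prepare()

model = ExLlamaV2(config)
model.load()
cache = ExLlamaV2Cache(model)
tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

# Load all three LoRAs once; pick which one applies per generation call.
actor = ExLlamaV2Lora.from_directory(model, "/loras/actor")
critic = ExLlamaV2Lora.from_directory(model, "/loras/critic")
regen = ExLlamaV2Lora.from_directory(model, "/loras/regenerator")

settings = ExLlamaV2Sampler.Settings()
output = generator.generate_simple("Hello", settings, 200, loras=[actor])
```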

Wow, 8GB is within reach of most people. Nice work!

@rhysjones This implementation works fantastically!

The network is not yet perfect; I wanted to get it out to you guys first and then iterate. For example, the regenerator sometimes says things that aren’t ideal right now. I’ve started dataset creation for v2 and will keep improving it over time. I think the approach is sound, and it needs much less compute than a full MoE.
Thanks for your contributions, guys!

migtissera changed discussion status to closed
