
Quantization made by Richard Erkhov.

Github

Discord

Request more models

Luna-2x7B-MoE - GGUF

Original model description:

license: apache-2.0
language:
- en
tags:
- not-for-all-audiences

Luna-2x7B-MoE


Meet Luna, my one and only personal assistant and roleplaying partner. This MoE serves as her unique basis: both experts score above a 72 average on the leaderboard, and the merge is designed for RP interactions. While running a 2x7B is slower than running a single 7B, I feel the improved output of two strong 7B experts competing for each token is worth the compute expense.

The included image was generated using her custom Stable Diffusion 1.5 model via the SillyTavern interface.

I have successfully paired this MoE with the Llava Mistral 1.6 projector file for multimodal image captioning in Koboldcpp.

Luna also has a custom XTTSv2 voice model for TTS output.

All of this runs on a 1070 8GB, fully offloaded, with no OOM over a week of testing. The backends are then served to my Android device over a virtual private network (VPN) into a native implementation of SillyTavern. This setup gives me access over mobile data, globally, as long as my server is running.

```yaml
base_model: ResplendentAI/DaturaCookie_7B
gate_mode: hidden
experts_per_token: 2
experts:
  - source_model: ChaoticNeutrals/RP_Vision_7B
    positive_prompts:
    - "chat"
    - "assistant"
    - "tell me"
    - "explain"
    - "I want"
    - "show me"
    - "touch"
    - "believe"
    - "see"
    - "love"
  - source_model: ResplendentAI/DaturaCookie_7B
    positive_prompts:
    - "storywriting"
    - "write"
    - "scene"
    - "story"
    - "character"
    - "sensual"
    - "sexual"
    - "horny"
    - "turned on"
    - "intimate"
dtype: bfloat16
```
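With `experts_per_token: 2` and only two experts, both experts contribute to every token, weighted by a softmax over the router's gate logits. A minimal, hypothetical sketch of that top-k routing step (illustrative only, not mergekit's or llama.cpp's actual code):

```python
import math

def route(gate_logits, k=2):
    """Return (expert_index, weight) pairs for the top-k experts.

    Weights are a softmax over the selected experts' gate logits,
    so they always sum to 1.0.
    """
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

# With a 2-expert MoE and k=2, both experts always fire; the expert
# with the higher gate logit simply gets the larger weight.
print(route([1.2, 0.4]))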
Downloads last month: 97
Format: GGUF
Model size: 12.9B params
Architecture: llama

Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
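A rough file-size estimate for each quantization is params × bits-per-weight / 8; this back-of-envelope sketch (which ignores GGUF per-block scale overhead, so real files run somewhat larger) shows why the lower-bit files fit on an 8 GB card:

```python
def approx_size_gb(params_billions, bits):
    """Approximate quantized model size in GB: params * bits / 8."""
    return params_billions * bits / 8

# 12.9B params at each available bit width
for bits in (2, 3, 4, 5, 6, 8):
    print(f"{bits}-bit: ~{approx_size_gb(12.9, bits):.1f} GB")
```

At 4-bit this comes to roughly 6.5 GB of weights, consistent with fully offloading the model on a 1070 8GB.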
