(Maybe I'll change the icon picture later.)

Experimental MoE. The idea is to have more active parameters than a 7xX model would have while keeping its size below 20B.

This model has ~19.2B parameters.
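The ~19.2B figure can be reproduced with a rough parameter count. This is a minimal sketch, not the author's calculation. It assumes the standard Mistral-7B dimensions (hidden size 4096, MLP size 14336, 32 attention heads, 8 KV heads, vocabulary 32000), the 48 layers produced by the merge configs in this card, and a Mixtral-style layout in which attention is shared and each expert contributes only its MLP plus a small router. None of these numbers are stated in the card.

```python
# Assumed Mistral-7B dimensions (not stated in this card).
HIDDEN = 4096
INTERMEDIATE = 14336
N_HEADS = 32
N_KV_HEADS = 8
VOCAB = 32000


def attention_params():
    """Parameters in one attention block with grouped-query attention."""
    head_dim = HIDDEN // N_HEADS
    kv_dim = head_dim * N_KV_HEADS
    # q_proj and o_proj are HIDDEN x HIDDEN; k_proj and v_proj are HIDDEN x kv_dim.
    return 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim


def mlp_params():
    """Parameters in one MLP (gate, up and down projections)."""
    return 3 * HIDDEN * INTERMEDIATE


def total_params(n_layers, n_experts=1):
    """Total parameters, assuming shared attention and one MLP per expert."""
    router = HIDDEN * n_experts if n_experts > 1 else 0
    norms = 2 * HIDDEN  # two RMSNorm weight vectors per layer
    per_layer = attention_params() + n_experts * mlp_params() + router + norms
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings plus untied output head
    final_norm = HIDDEN
    return n_layers * per_layer + embeddings + final_norm


if __name__ == "__main__":
    for label, layers, experts in [
        ("7B dense (32 layers)", 32, 1),
        ("10.7B dense (48 layers)", 48, 1),
        ("2x7B MoE (32 layers)", 32, 2),
        ("2x10.7B MoE (48 layers)", 48, 2),
    ]:
        print(f"{label}: {total_params(layers, experts) / 1e9:.1f}B")
```

Under these assumptions the 48-layer, two-expert configuration comes out at about 19.2B, against roughly 12.9B for a two-expert model built on unmodified 32-layer 7B experts.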

Exl2, 4.0 bpw (fits in 12GB VRAM with 16k context and a 4-bit cache)

Exl2, 6.0 bpw

GGUF

Base model (self merge)

slices:
  - sources:
    - model: MistralInstruct-v0.2-128k
      layer_range: [0, 24]
  - sources:
    - model: MistralInstruct-v0.2-128k
      layer_range: [8, 24]
  - sources:
    - model: MistralInstruct-v0.2-128k
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
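To see how a passthrough config like the one above turns a 32-layer model into a deeper one, the slices can be expanded into an explicit layer list. This is a minimal sketch, assuming that layer_range is half-open (start included, end excluded) and that slices are concatenated in the order listed. The helper name stacked_layers is made up for illustration and is not part of mergekit.

```python
def stacked_layers(slices):
    """Expand (model, (start, end)) slices into the output layer order.

    Assumes half-open ranges: (0, 24) means source layers 0..23.
    """
    layers = []
    for model, (start, end) in slices:
        layers.extend((model, index) for index in range(start, end))
    return layers


# The base-model self merge from this card.
base_slices = [
    ("MistralInstruct-v0.2-128k", (0, 24)),
    ("MistralInstruct-v0.2-128k", (8, 24)),
    ("MistralInstruct-v0.2-128k", (24, 32)),
]

if __name__ == "__main__":
    layers = stacked_layers(base_slices)
    print(len(layers), "layers in the merged model")
    # Output layers 24..39 repeat source layers 8..23.
    print(layers[23], layers[24], layers[39], layers[40])
```

With these assumptions the result has 24 + 16 + 8 = 48 layers, with source layers 8 to 23 appearing twice. The two "sandwich" experts use the same layout but take the repeated middle block from a different model.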

First expert ("sandwich" merge)

xxx777xxxASD/PrimaSumika-10.7B-128k

slices:
  - sources:
    - model: EroSumika-128k
      layer_range: [0, 24]
  - sources:
    - model: Prima-Lelantacles-128k
      layer_range: [8, 24]
  - sources:
    - model: EroSumika-128k
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16

Second expert ("sandwich" merge)

slices:
  - sources:
    - model: AlphaMonarch-7B-128k
      layer_range: [0, 24]
  - sources:
    - model: NeuralHuman-128k
      layer_range: [8, 24]
  - sources:
    - model: AlphaMonarch-7B-128k
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16

Each 128k model is a slerp merge with Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context.
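For readers unfamiliar with slerp (spherical linear interpolation), here is a generic sketch of the operation on two flat weight vectors. It is not mergekit's actual code, and the interpolation factor t used for these merges is not given in the card. The function name and the eps fallback threshold are illustrative choices.

```python
import math


def slerp(t, a, b, eps=1e-8):
    """Interpolate between vectors a and b along the arc between them.

    t=0 returns a, t=1 returns b. Falls back to plain linear
    interpolation when the vectors are (nearly) parallel or zero.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a < eps or norm_b < eps:
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    # Angle between the two vectors, clamped against rounding error.
    cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    weight_a = math.sin((1 - t) * theta) / sin_theta
    weight_b = math.sin(t * theta) / sin_theta
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


if __name__ == "__main__":
    print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

Unlike a plain average, the midpoint of two orthogonal unit vectors keeps unit length instead of shrinking, which is the usual motivation for using slerp when blending weights.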

Models used
