Lumina-RP

Lumina-4 is a Mixture of Experts (MoE) model made with LazyMergekit. It improves on Lumina-3.5 in roleplaying and storytelling while retaining that model's existing strengths, and it shows a slight improvement on the Open LLM Leaderboard. It supports a context window of up to 32k tokens.
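
For readers new to the term, the following is a minimal sketch of how a sparse Mixture of Experts layer can route an input, assuming Mixtral-style top-2 gating. It is not Lumina-4's actual implementation: the number of experts, the router weights, and the toy "experts" (simple scaling functions) are all hypothetical and chosen only for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_weights, experts, top_k=2):
    """Route vector x to the top_k highest-scoring experts and mix their outputs."""
    # One router logit per expert: dot product of x with that expert's router row.
    logits = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    # Indices of the top_k experts, highest logit first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Gate weights are renormalized over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, top):
        y = experts[idx](x)
        out = [o + gate * y_j for o, y_j in zip(out, y)]
    return out, top, gates

# Hypothetical toy setup: 4 experts that each scale the input by a constant.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]

out, top, gates = moe_forward([0.5, 2.0], router_weights, experts)
print("selected experts:", top)
print("gate weights:", gates)
print("mixed output:", out)
```

Only the selected experts run for a given input, which is why an MoE model can have a large total parameter count while using a fraction of it per token.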

Response Examples

Input: 25 - 4 * 2 + 3 = ?
------------------------------------
Output: The given mathematical expression is 25 - 4 * 2 + 3. To solve it, we should follow the order of operations (PEMDAS):

1. Parentheses: There are none in this equation.
2. Exponents: None present.
3. Multiplication and Division: We perform multiplications before divisions from left to right. In this case, 4 * 2 = 8. Now our equation becomes 25 - 8 + 3.
4. Addition and Subtraction: We perform additions before subtractions from left to right. Here, we have a subtraction then an addition: 25 - 8 = 17, then 17 + 3 = 20.

So the result of the expression 25 - 4 * 2 + 3 is 20.
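
The result above can be checked with a short sketch that evaluates the expression step by step. Python follows the standard convention in which multiplication and division share one precedence level (evaluated left to right), as do addition and subtraction; the variable names here are illustrative only.

```python
# Step-by-step evaluation of 25 - 4 * 2 + 3
product = 4 * 2             # multiplication binds tighter than + and -
difference = 25 - product   # then left to right: subtraction comes first here
result = difference + 3     # then the addition

# Python's own operator precedence gives the same value.
assert result == 25 - 4 * 2 + 3
print(result)  # prints 20
```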

🏆 Open LLM Leaderboard Evaluation Results

Metric | Value
Avg. | 75.59
AI2 Reasoning Challenge (25-shot) | 72.61
HellaSwag (10-shot) | 88.45
MMLU (5-shot) | 64.73
TruthfulQA (0-shot) | 74.24
Winogrande (5-shot) | 83.90
GSM8k (5-shot) | 69.60
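
As a quick sanity check, the reported average appears to be the unweighted mean of the six benchmark scores. The sketch below recomputes it from the values in the table; the dictionary name is illustrative only.

```python
# Benchmark scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge": 72.61,
    "HellaSwag": 88.45,
    "MMLU": 64.73,
    "TruthfulQA": 74.24,
    "Winogrande": 83.90,
    "GSM8k": 69.60,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 75.59
```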

💻 Usage

!pip install -qU transformers bitsandbytes accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "Ppoyaa/Lumina-4"

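# Load the tokenizer, then build a text-generation pipeline that uses it.
# load_in_4bit quantizes the weights via bitsandbytes to reduce GPU memory use.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)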

# Format the conversation with the model's chat template, then sample a reply.
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
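
The usage snippet loads the model in 4-bit, which matters for a model of this size. As a rough, hedged estimate of why, the sketch below computes weight storage alone at a few precisions, assuming the 18.5B parameter count reported for this model. It ignores activations, the KV cache, and quantization overhead, so real memory use will be higher; the helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (10**9 bytes); weights only."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 18.5e9  # assumption: parameter count reported on the model page

for label, bits in [("16-bit (BF16/FP16)", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.2f} GB")
```

Under these assumptions, 4-bit loading cuts the weight footprint to about a quarter of the 16-bit figure.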

Model size: 18.5B params (Safetensors, tensor type BF16)