# Minueza-2-96M

## Summary
Minueza-2-96M is a compact language model based on the Llama architecture. It was trained from scratch on English and Portuguese datasets with a context length of 4096 tokens, processing 185 billion tokens in total. With only 96 million parameters, it serves as a lightweight foundation that can be fine-tuned for specific applications.
Due to its compact size, the model has significant limitations in reasoning, factual knowledge, and general capabilities compared to larger models. It may generate incorrect, irrelevant, or nonsensical outputs. Furthermore, as it was trained on internet text data, it may harbour biases and produce inappropriate content.
## Usage
```sh
pip install transformers==4.50.0 torch==2.6.0
```
```python
from transformers import pipeline, TextStreamer
import torch

prompt = "This book tells the story"

generate_text = pipeline(
    "text-generation",
    model="Felladrin/Minueza-2-96M",
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

generate_text(
    prompt,
    streamer=TextStreamer(generate_text.tokenizer, skip_special_tokens=True),
    do_sample=True,
    max_new_tokens=512,
    temperature=0.8,
    top_p=0.95,
    top_k=0,
    min_p=0.05,
    repetition_penalty=1.1,
)
```
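The same checkpoint can also be loaded without the pipeline helper when you want explicit control over the tokenizer and the `generate` call. The sketch below uses the standard `AutoModelForCausalLM`/`AutoTokenizer` API with the same sampling settings as above; it is an illustrative alternative, not part of the original instructions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the tokenizer and weights directly (same checkpoint as above).
tokenizer = AutoTokenizer.from_pretrained("Felladrin/Minueza-2-96M")
model = AutoModelForCausalLM.from_pretrained("Felladrin/Minueza-2-96M")
model.to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

# Tokenize the prompt and generate with the same sampling settings as the pipeline call.
inputs = tokenizer("This book tells the story", return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        max_new_tokens=512,
        temperature=0.8,
        top_p=0.95,
        top_k=0,
        min_p=0.05,
        repetition_penalty=1.1,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```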
## Intended Uses
This model was created with the following objectives in mind:
- Run on mobile web browsers via Wllama and Transformers.js.
- Run fast on machines without a GPU.
- Serve as a base for fine-tunes using the ChatML format (see the sketch below).
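Since the base model is not chat-tuned, the ChatML layout mentioned above only becomes relevant for fine-tunes. The snippet below is purely illustrative: the helper function and the message contents are hypothetical, and it simply shows how a ChatML prompt for such a fine-tune would be assembled.

```python
# Illustrative only: the base model has no chat template.
# A fine-tune of Minueza-2-96M using ChatML would expect prompts like this.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # hypothetical content
    {"role": "user", "content": "Tell me a short story about a lighthouse."},
]

def to_chatml(messages):
    # ChatML wraps each turn in <|im_start|>role ... <|im_end|> markers
    # and leaves an open assistant turn for the model to complete.
    text = ""
    for message in messages:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    return text + "<|im_start|>assistant\n"

print(to_chatml(messages))
```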
## Model Architecture
This is a transformer model based on the Llama architecture, trained with a context window of 4096 tokens.
| Configuration | Value |
| --- | --- |
| max_position_embeddings | 4096 |
| hidden_size | 672 |
| intermediate_size | 2688 |
| num_hidden_layers | 8 |
| num_attention_heads | 12 |
| num_key_value_heads | 4 |
| head_dim | 56 |
| attention_dropout | 0.1 |
| vocab_size | 32000 |
| rope_theta | 500000 |
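For reference, the table above corresponds roughly to the `LlamaConfig` below. This is a sketch rather than a copy of the repository's config.json: any field not listed in the table (such as `tie_word_embeddings`) is left at its library default, which is an assumption. With untied input and output embeddings, the count printed at the end works out to roughly 96 million parameters.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Architecture sketch built from the table above; fields not listed there
# (e.g. tie_word_embeddings) are left at their defaults, which is an assumption.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=672,
    intermediate_size=2688,
    num_hidden_layers=8,
    num_attention_heads=12,
    num_key_value_heads=4,
    head_dim=56,
    attention_dropout=0.1,
    max_position_embeddings=4096,
    rope_theta=500000,
)

model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```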
Pretraining was performed with the following hyperparameters:
| Hyperparameter | Value |
| --- | --- |
| learning_rate | 0.0003 |
| warmup_steps | 2000 |
| weight_decay | 0.1 |
| max_grad_norm | 2.0 |
| total_train_batch_size | 512 (2M tokens per batch) |
| seed | 42 |
| optimizer | Adam with betas=(0.9, 0.95) and epsilon=1e-08 |
| lr_scheduler_type | linear |
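These settings map directly onto transformers' `TrainingArguments`. The sketch below is illustrative: the split of the 512-sequence global batch into a per-device batch size and gradient-accumulation steps is an assumption (only the total is documented), as is the single-device setup it implies.

```python
from transformers import TrainingArguments

# Illustrative mapping of the pretraining hyperparameters above.
# The per-device batch size / gradient accumulation split is assumed;
# only the global batch of 512 sequences (~2M tokens) is documented.
training_args = TrainingArguments(
    output_dir="minueza-2-96m-pretraining",
    learning_rate=3e-4,
    warmup_steps=2000,
    weight_decay=0.1,
    max_grad_norm=2.0,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=32,  # 16 * 32 = 512 sequences of 4096 tokens per optimizer step
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
)
```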
## License
This model is licensed under the Apache License 2.0.