Minueza-2-96M

Summary

Minueza-2-96M is a compact language model based on the Llama architecture. It was trained from scratch on English and Portuguese datasets with a context length of 4096 tokens, processing 185 billion tokens in total. With only 96 million parameters, it serves as a lightweight foundation that can be fine-tuned for specific applications.

Due to its compact size, the model has significant limitations in reasoning, factual knowledge, and general capabilities compared to larger models. It may generate incorrect, irrelevant, or nonsensical outputs. Furthermore, as it was trained on internet text, it may harbour biases and produce inappropriate content.

Usage

pip install transformers==4.50.0 torch==2.6.0

from transformers import pipeline, TextStreamer
import torch

prompt = "This book tells the story"

generate_text = pipeline(
    "text-generation",
    model="Felladrin/Minueza-2-96M",
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),  # use a GPU if available, otherwise fall back to CPU
)

generate_text(
    prompt,
    streamer=TextStreamer(generate_text.tokenizer, skip_special_tokens=True),
    do_sample=True,
    max_new_tokens=512,
    temperature=0.8,
    top_p=0.95,
    top_k=0,
    min_p=0.05,
    repetition_penalty=1.1,
)

Intended Uses

This model was created with the following objectives in mind:

  • Run in mobile web browsers via Wllama and Transformers.js.
  • Run fast on machines without a GPU.
  • Serve as a base for fine-tunes using the ChatML format (see the formatting sketch below).
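
As a quick illustration of the ChatML format mentioned in the last point, the sketch below renders a short conversation as a ChatML training string. The to_chatml helper and the example messages are illustrative assumptions, not part of the model's tooling; verify how your fine-tuning setup handles the <|im_start|> and <|im_end|> tokens before relying on it.

def to_chatml(messages):
    # Render a list of {"role": ..., "content": ...} dicts as a single ChatML string.
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

sample = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarise this book in one sentence."},
    {"role": "assistant", "content": "It follows one family across three generations."},
])
print(sample)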

Model Architecture

This is a transformer model based on the Llama architecture, trained with a context window of 4096 tokens.

Configuration              Value
max_position_embeddings    4096
hidden_size                672
intermediate_size          2688
num_hidden_layers          8
num_attention_heads        12
num_key_value_heads        4
head_dim                   56
attention_dropout          0.1
vocab_size                 32000
rope_theta                 500000
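
For reference, this configuration can be approximated programmatically, as in the sketch below. Fields not listed in the table fall back to LlamaConfig defaults, so for exact reproduction load the published configuration with AutoConfig.from_pretrained("Felladrin/Minueza-2-96M") instead.

from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    max_position_embeddings=4096,
    hidden_size=672,
    intermediate_size=2688,
    num_hidden_layers=8,
    num_attention_heads=12,
    num_key_value_heads=4,
    head_dim=56,
    attention_dropout=0.1,
    vocab_size=32000,
    rope_theta=500000,
)

model = LlamaForCausalLM(config)  # randomly initialised model with the same shapes
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # lands near the advertised 96M with untied embeddings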

Pretraining used the following hyperparameters:

Hyperparameter            Value
learning_rate             0.0003
warmup_steps              2000
weight_decay              0.1
max_grad_norm             2.0
total_train_batch_size    512 sequences (~2M tokens per batch)
seed                      42
optimizer                 Adam with betas=(0.9, 0.95) and epsilon=1e-08
lr_scheduler_type         linear
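
As a rough sketch, these hyperparameters map onto the transformers TrainingArguments below. The output directory, the per-device batch size and gradient accumulation split (only their product of 512 sequences is stated above), and the bf16 flag are assumptions.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="minueza-2-96m-pretraining",  # hypothetical output path
    learning_rate=3e-4,
    warmup_steps=2000,
    weight_decay=0.1,
    max_grad_norm=2.0,
    per_device_train_batch_size=32,  # assumption: 32 sequences x 16 accumulation steps = 512 total
    gradient_accumulation_steps=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    bf16=True,  # assumption, consistent with the bfloat16 weights of the release
)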

License

This model is licensed under the Apache License 2.0.
