
Mambarim-110M



Model Summary

Mambarim-110M is the first Portuguese language model based on a state-space model architecture (Mamba), not a transformer.

WIP

Details

  • Architecture: a Mamba model pre-trained via causal language modeling
  • Size: 119,930,880 parameters
  • Context length: 2048 tokens
  • Dataset: Pt-Corpus Instruct (6.2B tokens)
  • Language: Portuguese
  • Number of steps: 758,423
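A transformer attends over every previous token, while a state-space layer carries a fixed-size hidden state from one token to the next. The pure-Python sketch below is for illustration only. It shows the selective-scan recurrence described for Mamba, for a single channel with a diagonal state matrix, where the step size delta and the projections B and C change from token to token. It leaves out the skip term, the gating, the convolution, and the hardware-aware parallel scan, and it is not the mamba-ssm kernel this model uses. The function name and its arguments are hypothetical.

```python
import math


def selective_scan(xs, deltas, A, Bs, Cs):
    """Sequential scan of a diagonal selective SSM for one input channel.

    xs:     input sequence, one float per time step
    deltas: per-step step sizes (input-dependent in Mamba)
    A:      diagonal of the state matrix (negative values give decay), length N
    Bs, Cs: per-step input/output projections, each a length-N list
    """
    state = [0.0] * len(A)  # fixed-size hidden state, independent of sequence length
    ys = []
    for x, delta, B, C in zip(xs, deltas, Bs, Cs):
        # A is discretized as exp(delta * A); B is approximated as delta * B,
        # the simplification used by Mamba's reference scan.
        state = [math.exp(delta * a) * h + delta * b * x for a, h, b in zip(A, state, B)]
        ys.append(sum(c * h for c, h in zip(C, state)))
    return ys


# Impulse response of one state dimension: the input decays by exp(delta * A) per step.
print(selective_scan([1.0, 0.0, 0.0], [1.0] * 3, [-1.0], [[1.0]] * 3, [[1.0]] * 3))
```

The memory needed per generated token stays constant here, whereas a transformer's key/value cache grows with the context.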

This repository has the source code used to train this model.

Intended Uses

WIP

Out-of-scope Use

WIP

Basic usage

Until transformers 4.39.0 is released, you need to install transformers from the main branch:

pip install git+https://github.com/huggingface/transformers@main

We also recommend installing both causal-conv1d and mamba-ssm. Quote the version specifier so the shell does not treat >= as an output redirection:

pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
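You can check that these packages are installed at the expected versions with the following sketch. It relies only on the standard-library importlib.metadata module. The helper names and the REQUIREMENTS table are hypothetical. The minimum versions come from the instructions above, and mamba-ssm accepts any version because the card gives no minimum for it.

```python
import re
from importlib import metadata


def parse_release(version):
    """Leading numeric release part, e.g. "4.39.0.dev0" -> (4, 39, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def version_at_least(version, minimum):
    """Compare release numbers only, so a build from main such as
    "4.39.0.dev0" counts as satisfying 4.39.0 (intentionally lenient)."""
    release = parse_release(version)
    width = max(len(release), len(minimum))
    padded_release = release + (0,) * (width - len(release))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_release >= padded_minimum


# Hypothetical requirements table; None means "any installed version".
REQUIREMENTS = {
    "transformers": (4, 39, 0),
    "causal-conv1d": (1, 2, 0),
    "mamba-ssm": None,
}


def check_environment(requirements=REQUIREMENTS):
    """Map each package name to "missing" or "<version> (ok|too old)"."""
    report = {}
    for package, minimum in requirements.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        ok = minimum is None or version_at_least(installed, minimum)
        report[package] = f"{installed} ({'ok' if ok else 'too old'})"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```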

You can use the classic generate API:

>>> from transformers import AutoTokenizer, MambaForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
>>> model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")
>>> # Prompt: "O Natal é uma" ("Christmas is a")
>>> input_ids = tokenizer("O Natal é uma", return_tensors="pt")["input_ids"]
>>> out = model.generate(
...     input_ids,
...     repetition_penalty=1.2,
...     temperature=0.8,
...     top_k=50,
...     top_p=0.85,
...     do_sample=True,  # sampling is random, so the continuation varies between runs
...     max_new_tokens=10,
... )
>>> # Example output ("Christmas is a date when people spend hours of leisure and"):
>>> print(tokenizer.batch_decode(out))
['<s> O Natal é uma data em que as pessoas passam horas de lazer e']
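The generate call above sets four sampling controls. The pure-Python sketch below illustrates what they do to the next-token scores. It is a simplified re-implementation, not the transformers code. It assumes the controls are applied in the order repetition penalty, temperature, top-k, top-p, and it uses the standard nucleus rule for top-p. All function names are hypothetical.

```python
import math
import random


def apply_repetition_penalty(logits, previous_ids, penalty):
    """Lower the scores of tokens that already appear in the sequence (CTRL-style)."""
    out = list(logits)
    for tok in set(previous_ids):
        # Multiplying a negative score or dividing a positive one both push it down.
        out[tok] = out[tok] * penalty if out[tok] < 0 else out[tok] / penalty
    return out


def softmax(scores):
    """Numerically stable softmax; -inf entries get probability 0."""
    finite_max = max(s for s in scores if s != -math.inf)
    exps = [0.0 if s == -math.inf else math.exp(s - finite_max) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sampling_distribution(logits, previous_ids, repetition_penalty=1.2,
                          temperature=0.8, top_k=50, top_p=0.85):
    """Next-token probabilities after penalty -> temperature -> top-k -> top-p."""
    scores = apply_repetition_penalty(logits, previous_ids, repetition_penalty)
    scores = [s / temperature for s in scores]  # temperature < 1 sharpens the distribution

    # Top-k: only the k highest-scoring tokens stay eligible.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:top_k]
    top_set = set(top)
    probs = softmax([s if i in top_set else -math.inf for i, s in enumerate(scores)])

    # Top-p (nucleus): keep the smallest high-probability prefix whose mass reaches top_p.
    nucleus, cumulative = set(), 0.0
    for i in top:
        nucleus.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    return softmax([s if i in nucleus else -math.inf for i, s in enumerate(scores)])


def sample_next_token(logits, previous_ids, rng=random, **kwargs):
    """Draw one token id from the filtered distribution (roughly what do_sample=True does)."""
    probs = sampling_distribution(logits, previous_ids, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy vocabulary of 5 tokens; token 0 was already generated, so it is penalized.
logits = [2.0, 1.5, 0.3, -1.0, -2.0]
print(sampling_distribution(logits, previous_ids=[0], top_k=3))
print(sample_next_token(logits, previous_ids=[0], rng=random.Random(0), top_k=3))
```

In the toy example, top-k keeps three candidates and top-p then trims the set to the two most likely tokens. This is why sampling with these settings stays varied without drifting to very unlikely tokens.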

Benchmarks

Evaluations on Brazilian Portuguese benchmarks were performed using a Portuguese implementation of the EleutherAI LM Evaluation Harness (created by Eduardo Garcia).

Model          | ASSIN2 RTE | ASSIN2 STS | BLUEX | ENEM  | FAQUAD NLI | HateBR | OAB Exams | Average
---------------|------------|------------|-------|-------|------------|--------|-----------|--------
Qwen-1.8B      | 64.83      | 19.53      | 26.15 | 30.23 | 43.97      | 33.33  | 27.20     | 35.03
TinyLlama-1.1B | 58.93      | 13.57      | 22.81 | 22.25 | 43.97      | 36.92  | 23.64     | 31.72
TTL-460m       | 53.93      | 12.66      | 22.81 | 19.87 | 49.01      | 33.59  | 27.06     | 31.27
XGLM-564m      | 49.61      | 22.91      | 19.61 | 19.38 | 43.97      | 33.99  | 23.42     | 30.41
Bloom-1b7      | 53.60      | 4.81       | 21.42 | 18.96 | 43.97      | 34.89  | 23.05     | 28.67
TTL-160m       | 53.36      | 2.58       | 21.84 | 18.75 | 43.97      | 36.88  | 22.60     | 28.56
OPT-125m       | 39.77      | 2.00       | 21.84 | 17.42 | 43.97      | 47.04  | 22.78     | 27.83
Pythia-160     | 33.33      | 12.81      | 16.13 | 16.66 | 50.36      | 41.09  | 22.82     | 27.60
OLMo-1b        | 34.12      | 9.28       | 18.92 | 20.29 | 43.97      | 41.33  | 22.96     | 27.26
Bloom-560m     | 33.33      | 8.48       | 18.92 | 19.03 | 43.97      | 37.07  | 23.05     | 26.26
Pythia-410m    | 33.33      | 4.80       | 19.47 | 19.45 | 43.97      | 33.33  | 23.01     | 25.33
OPT-350m       | 33.33      | 3.65       | 20.72 | 17.35 | 44.71      | 33.33  | 23.01     | 25.15
GPT-2 small    | 33.26      | 0.00       | 10.43 | 11.20 | 43.52      | 33.68  | 13.12     | 20.74
GPorTuguese    | 33.33      | 3.85       | 14.74 | 3.01  | 28.81      | 33.33  | 21.23     | 19.75
Mambarim-110M  | 40.64      | 3.11       | 13.90 | 14.76 | 0.15       | 49.00  | 20.27     | 17.72
Samba-1.1B     | 33.33      | 1.30       | 8.07  | 10.22 | 17.72      | 35.79  | 15.03     | 17.35
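The card does not say how the Average column is computed. The sketch below assumes it is the unweighted mean of the seven task scores. It recomputes that mean for three rows copied from the table, and the names TASKS, ROWS, and macro_average are hypothetical.

```python
TASKS = ["ASSIN2 RTE", "ASSIN2 STS", "BLUEX", "ENEM", "FAQUAD NLI", "HateBR", "OAB Exams"]

# Rows copied from the table above: (per-task scores in TASKS order, reported Average).
ROWS = {
    "Qwen-1.8B": ([64.83, 19.53, 26.15, 30.23, 43.97, 33.33, 27.20], 35.03),
    "TTL-160m": ([53.36, 2.58, 21.84, 18.75, 43.97, 36.88, 22.60], 28.56),
    "Samba-1.1B": ([33.33, 1.30, 8.07, 10.22, 17.72, 35.79, 15.03], 17.35),
}


def macro_average(scores):
    """Unweighted mean over tasks: every benchmark counts equally."""
    if len(scores) != len(TASKS):
        raise ValueError(f"expected {len(TASKS)} scores, got {len(scores)}")
    return sum(scores) / len(scores)


for model, (scores, reported) in ROWS.items():
    print(f"{model}: computed {macro_average(scores):.2f}, reported {reported:.2f}")
```

A difference in the second decimal, such as 28.57 computed against 28.56 reported for TTL-160m, is plausible if the reported averages were computed from unrounded per-task scores.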