
Mambarim-110M



Model Summary

Mambarim-110M is the first Portuguese language model based on a state-space model architecture (Mamba), not a transformer.

WIP

Details

  • Architecture: a Mamba model pre-trained via causal language modeling
  • Size: 119,930,880 parameters
  • Context length: 2048 tokens
  • Dataset: Pt-Corpus Instruct (6.2B tokens)
  • Language: Portuguese
  • Number of steps: 758,423

This repository contains the source code used to train this model.
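
As a quick sanity check of the figures above, the published configuration and parameter count can be inspected directly through the Transformers API. This is a minimal sketch, not part of the original card; the field names follow the Hugging Face MambaConfig.

from transformers import AutoConfig, MambaForCausalLM

# Load the published checkpoint's configuration and weights.
config = AutoConfig.from_pretrained("dominguesm/mambarim-110m")
model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")

print(config.num_hidden_layers, config.hidden_size)  # number of Mamba blocks and model width
print(f"{model.num_parameters():,}")                  # total parameters (the card reports 119,930,880)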

Intended Uses

WIP

Out-of-scope Use

WIP

Basic usage

You need to install transformers from the main branch until transformers 4.39.0 is released:

pip install git+https://github.com/huggingface/transformers@main

We also recommend installing both causal-conv1d and mamba-ssm using:

pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
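
If these optional packages are missing, Transformers falls back to a slower pure-PyTorch implementation of Mamba. A quick import check (a minimal sketch, not from the original card) confirms that the optimized kernels are actually available:

# Verify the optional CUDA kernels are importable; if either import fails,
# Mamba still runs, but through the slower pure-PyTorch path.
try:
    import causal_conv1d  # noqa: F401
    import mamba_ssm  # noqa: F401
    print("Optimized Mamba kernels are available.")
except ImportError as err:
    print(f"Optimized kernels not found, the slow path will be used: {err}")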

You can use the classic generate API:

>>> from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
>>> model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")
>>> input_ids = tokenizer("O Natal é uma", return_tensors="pt")["input_ids"]
>>> out = model.generate(
    input_ids,
    repetition_penalty=1.2,
    temperature=0.8,
    top_k=50,
    top_p=0.85,
    do_sample=True,
    max_new_tokens=10
)
>>> print(tokenizer.batch_decode(out))
["<s> O Natal é uma data em que as pessoas passam horas de lazer e"]

Benchmarks

Evaluations on Brazilian Portuguese benchmarks were performed using a Portuguese implementation of the EleutherAI LM Evaluation Harness (created by Eduardo Garcia).

Detailed results can be found here

| Model                    | Average | ENEM  | BLUEX | OAB Exams | ASSIN2 RTE | ASSIN2 STS | FaQuAD NLI | HateBR | PT Hate Speech | tweetSentBR | Architecture       |
| ------------------------ | ------- | ----- | ----- | --------- | ---------- | ---------- | ---------- | ------ | -------------- | ----------- | ------------------ |
| TeenyTinyLlama-460m      | 28.86   | 20.15 | 25.73 | 27.02     | 53.61      | 13         | 46.41      | 33.59  | 22.99          | 17.28       | LlamaForCausalLM   |
| TeenyTinyLlama-160m      | 28.2    | 19.24 | 23.09 | 22.37     | 53.97      | 0.24       | 43.97      | 36.92  | 42.63          | 11.39       | LlamaForCausalLM   |
| MulaBR/Mula-4x160-v0.1   | 26.24   | 21.34 | 25.17 | 25.06     | 33.57      | 11.35      | 43.97      | 41.5   | 22.99          | 11.24       | MixtralForCausalLM |
| TeenyTinyLlama-460m-Chat | 25.49   | 20.29 | 25.45 | 26.74     | 43.77      | 4.52       | 34         | 33.49  | 22.99          | 18.13       | LlamaForCausalLM   |
| Mambarim-110M            | 14.16   | 18.4  | 10.57 | 21.87     | 16.09      | 1.89       | 9.29       | 15.75  | 17.77          | 15.79       | MambaForCausalLM   |
| GloriaTA-3B              | 4.09    | 1.89  | 3.2   | 5.19      | 0          | 2.32       | 0.26       | 0.28   | 23.52          | 0.19        | GPTNeoForCausalLM  |
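
For reference, a run against this checkpoint with the harness's Python API might look like the sketch below. The task names are assumptions about the Portuguese benchmark suite and may not match the fork exactly, so treat this as illustrative rather than the exact command used to produce the table above.

import lm_eval

# Task identifiers below are assumed names for the Portuguese benchmarks
# and may differ in the Portuguese fork of the LM Evaluation Harness.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=dominguesm/mambarim-110m",
    tasks=["assin2_rte", "assin2_sts", "faquad_nli"],
    batch_size=8,
)
print(results["results"])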