
Mambarim-110M



Model Summary

Mambarim-110M is the first Portuguese language model based on a state-space model architecture (Mamba), not a transformer.

WIP

Details

  • Architecture: a Mamba model pre-trained via causal language modeling
  • Size: 119,930,880 parameters
  • Context length: 2048 tokens
  • Dataset: Pt-Corpus Instruct (6.2B tokens)
  • Language: Portuguese
  • Number of steps: 758,423

This repository contains the source code used to train this model.
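
As a quick sanity check of the figures above, the published configuration and parameter count can be inspected directly through the Transformers API. This is a minimal sketch, not part of the original card; the field names follow the Hugging Face MambaConfig.

from transformers import AutoConfig, MambaForCausalLM

# Load the published checkpoint's configuration and weights.
config = AutoConfig.from_pretrained("dominguesm/mambarim-110m")
model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")

print(config.num_hidden_layers, config.hidden_size)  # number of Mamba blocks and model width
print(f"{model.num_parameters():,}")                  # total parameters (the card reports 119,930,880)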

Intended Uses

WIP

Out-of-scope Use

WIP

Basic usage

You need to install transformers from the main branch until transformers 4.39.0 is released:

pip install git+https://github.com/huggingface/transformers@main

We also recommend installing both causal-conv1d and mamba-ssm using:

pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
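
If these optional packages are missing, Transformers falls back to a slower pure-PyTorch implementation of Mamba. A quick import check (a minimal sketch, not from the original card) confirms that the optimized kernels are actually available:

# Verify the optional CUDA kernels are importable; if either import fails,
# Mamba still runs, but through the slower pure-PyTorch path.
try:
    import causal_conv1d  # noqa: F401
    import mamba_ssm  # noqa: F401
    print("Optimized Mamba kernels are available.")
except ImportError as err:
    print(f"Optimized kernels not found, the slow path will be used: {err}")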

You can use the classic generate API:

>>> from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("dominguesm/mambarim-110m")
>>> model = MambaForCausalLM.from_pretrained("dominguesm/mambarim-110m")
>>> input_ids = tokenizer("O Natal é uma", return_tensors="pt")["input_ids"]
>>> out = model.generate(
    input_ids,
    repetition_penalty=1.2,
    temperature=0.8,
    top_k=50,
    top_p=0.85,
    do_sample=True,
    max_new_tokens=10
)
>>> print(tokenizer.batch_decode(out))
["<s> O Natal é uma data em que as pessoas passam horas de lazer e"]

Benchmarks

Evaluations on Brazilian Portuguese benchmarks were performed using a Portuguese implementation of the EleutherAI LM Evaluation Harness (created by Eduardo Garcia).

Detailed results can be found here

| Model                    | Average | ENEM  | BLUEX | OAB Exams | ASSIN2 RTE | ASSIN2 STS | FaQuAD NLI | HateBR | PT Hate Speech | tweetSentBR | Architecture       |
| ------------------------ | ------- | ----- | ----- | --------- | ---------- | ---------- | ---------- | ------ | -------------- | ----------- | ------------------ |
| TeenyTinyLlama-460m      | 28.86   | 20.15 | 25.73 | 27.02     | 53.61      | 13         | 46.41      | 33.59  | 22.99          | 17.28       | LlamaForCausalLM   |
| TeenyTinyLlama-160m      | 28.2    | 19.24 | 23.09 | 22.37     | 53.97      | 0.24       | 43.97      | 36.92  | 42.63          | 11.39       | LlamaForCausalLM   |
| MulaBR/Mula-4x160-v0.1   | 26.24   | 21.34 | 25.17 | 25.06     | 33.57      | 11.35      | 43.97      | 41.5   | 22.99          | 11.24       | MixtralForCausalLM |
| TeenyTinyLlama-460m-Chat | 25.49   | 20.29 | 25.45 | 26.74     | 43.77      | 4.52       | 34         | 33.49  | 22.99          | 18.13       | LlamaForCausalLM   |
| Mambarim-110M            | 14.16   | 18.4  | 10.57 | 21.87     | 16.09      | 1.89       | 9.29       | 15.75  | 17.77          | 15.79       | MambaForCausalLM   |
| GloriaTA-3B              | 4.09    | 1.89  | 3.2   | 5.19      | 0          | 2.32       | 0.26       | 0.28   | 23.52          | 0.19        | GPTNeoForCausalLM  |
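
For reference, a run against this checkpoint with the harness's Python API might look like the sketch below. The task names are assumptions about the Portuguese benchmark suite and may not match the fork exactly, so treat this as illustrative rather than the exact command used to produce the table above.

import lm_eval

# Task identifiers below are assumed names for the Portuguese benchmarks
# and may differ in the Portuguese fork of the LM Evaluation Harness.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=dominguesm/mambarim-110m",
    tasks=["assin2_rte", "assin2_sts", "faquad_nli"],
    batch_size=8,
)
print(results["results"])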