This is a mirror of mamba2-780m that is compatible with mamba2-torch, a Hugging Face compatible Mamba2 library that does not depend on the CUDA wheels of the original Mamba repository. Credit goes to the original authors of Mamba2 and to the transformers library by Hugging Face; without their work, this would not be possible.
NOTE: mamba2-torch offers several optimisation paths. See the instructions in the mamba2-torch repo for a detailed explanation.
First, install the mamba2-torch library:
git clone https://github.com/vasqu/mamba2-torch.git
cd mamba2-torch
pip install .
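To confirm the install before loading a model, one option is to check that Python can find the module. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and `mamba2_torch` is the module name taken from the import statement in the usage example.

```python
import importlib.util


def is_installed(module_name="mamba2_torch"):
    """Return True if a top-level module with this name can be located.

    Hypothetical helper: it only looks the module up and does not import it,
    so it says nothing about whether optional dependencies are usable.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed():
        print("mamba2_torch found")
    else:
        print("mamba2_torch not found; re-run `pip install .` in the clone")
```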
Then download this repository via git lfs and, with mamba2-torch installed, load the files locally as follows:
from transformers import AutoTokenizer
from mamba2_torch import Mamba2Model, Mamba2ForCausalLM, Mamba2Config
device = "cuda"
mamba2_hf_path = "<path-to-converted-model>"
model = Mamba2ForCausalLM.from_pretrained(mamba2_hf_path, local_files_only=True).to(device)
tokenizer = AutoTokenizer.from_pretrained(mamba2_hf_path, local_files_only=True)
input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"].to(device)
# expected output (780m): `["Hey how are you doing?\n\nI'm doing great. I'm"]`
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
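If the repository was cloned without Git LFS set up, the large files arrive as small text pointer stubs rather than real weights, and loading with `local_files_only=True` fails. The sketch below is one way to detect that case. It assumes only the standard Git LFS pointer format, in which a stub begins with the line `version https://git-lfs.github.com/spec/v1`. The helper name `find_lfs_pointers` is hypothetical and not part of mamba2-torch.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return the names of files in model_dir that are still LFS pointer stubs.

    Hypothetical helper: only the top level of the directory is inspected.
    """
    stubs = []
    for path in sorted(Path(model_dir).iterdir()):
        if not path.is_file():
            continue
        # Read just enough bytes to compare against the pointer prefix.
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            stubs.append(path.name)
    return stubs
```

Calling `find_lfs_pointers(mamba2_hf_path)` before `from_pretrained` returns an empty list when all files were fetched. Any names it does return can usually be resolved by running `git lfs pull` inside the clone.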
BibTeX:
@inproceedings{mamba2,
title={Transformers are {SSM}s: Generalized Models and Efficient Algorithms Through Structured State Space Duality},
author={Dao, Tri and Gu, Albert},
booktitle={International Conference on Machine Learning (ICML)},
year={2024}
}