Quantization made by Richard Erkhov.

mamba-370m-hf - GGUF

Model creator: https://huggingface.co/state-spaces/
Original model: https://huggingface.co/state-spaces/mamba-370m-hf/

Name	Quant method	Size
mamba-370m-hf.Q2_K.gguf	Q2_K	0.2GB
mamba-370m-hf.IQ3_XS.gguf	IQ3_XS	0.23GB
mamba-370m-hf.IQ3_S.gguf	IQ3_S	0.23GB
mamba-370m-hf.Q3_K_S.gguf	Q3_K_S	0.23GB
mamba-370m-hf.IQ3_M.gguf	IQ3_M	0.23GB
mamba-370m-hf.Q3_K.gguf	Q3_K	0.23GB
mamba-370m-hf.Q3_K_M.gguf	Q3_K_M	0.23GB
mamba-370m-hf.Q3_K_L.gguf	Q3_K_L	0.23GB
mamba-370m-hf.IQ4_XS.gguf	IQ4_XS	0.26GB
mamba-370m-hf.Q4_0.gguf	Q4_0	0.27GB
mamba-370m-hf.IQ4_NL.gguf	IQ4_NL	0.27GB
mamba-370m-hf.Q4_K_S.gguf	Q4_K_S	0.27GB
mamba-370m-hf.Q4_K.gguf	Q4_K	0.27GB
mamba-370m-hf.Q4_K_M.gguf	Q4_K_M	0.27GB
mamba-370m-hf.Q4_1.gguf	Q4_1	0.28GB
mamba-370m-hf.Q5_0.gguf	Q5_0	0.3GB
mamba-370m-hf.Q5_K_S.gguf	Q5_K_S	0.3GB
mamba-370m-hf.Q5_K.gguf	Q5_K	0.3GB
mamba-370m-hf.Q5_K_M.gguf	Q5_K_M	0.3GB
mamba-370m-hf.Q5_1.gguf	Q5_1	0.32GB
mamba-370m-hf.Q6_K.gguf	Q6_K	0.34GB
mamba-370m-hf.Q8_0.gguf	Q8_0	0.42GB
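
As a rough aid for choosing among the files above, the sketch below lists the quantizations with the largest file size that still fits a given budget. The sizes are copied from the table; the helper names `pick_quants` and `gguf_filename` and the "largest file that fits" heuristic are assumptions made for illustration, not a recommendation from the quantizer.

```python
# File sizes in GB, copied from the table above (table order is preserved).
QUANT_SIZES_GB = {
    "Q2_K": 0.2, "IQ3_XS": 0.23, "IQ3_S": 0.23, "Q3_K_S": 0.23,
    "IQ3_M": 0.23, "Q3_K": 0.23, "Q3_K_M": 0.23, "Q3_K_L": 0.23,
    "IQ4_XS": 0.26, "Q4_0": 0.27, "IQ4_NL": 0.27, "Q4_K_S": 0.27,
    "Q4_K": 0.27, "Q4_K_M": 0.27, "Q4_1": 0.28, "Q5_0": 0.3,
    "Q5_K_S": 0.3, "Q5_K": 0.3, "Q5_K_M": 0.3, "Q5_1": 0.32,
    "Q6_K": 0.34, "Q8_0": 0.42,
}


def pick_quants(budget_gb):
    """Return the quant names tied for the largest size <= budget_gb."""
    fitting = {name: size for name, size in QUANT_SIZES_GB.items()
               if size <= budget_gb}
    if not fitting:
        return []  # nothing in the table fits the budget
    best = max(fitting.values())
    return [name for name, size in fitting.items() if size == best]


def gguf_filename(quant):
    """Build the file name used in the table for a given quant method."""
    return f"mamba-370m-hf.{quant}.gguf"


if __name__ == "__main__":
    for quant in pick_quants(0.25):
        print(gguf_filename(quant))
```

File size is only a proxy for memory use and says nothing about output quality, so treat the result as a starting point.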

Original model description:

library_name: transformers tags: []

Mamba

This repository contains the transformers-compatible mamba-370m. The checkpoints are untouched, but the full config.json and tokenizer are pushed to this repo.

Usage

Until transformers 4.39.0 is released, you need to install transformers from main.

pip install git+https://github.com/huggingface/transformers@main
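
To confirm that an environment meets the version requirement, a small check along these lines may help. It is a minimal sketch: the helpers `release_tuple` and `supports_mamba` are hypothetical names, and the parsing deliberately treats development and pre-release builds such as `4.39.0.dev0` (what an install from main typically reports) as satisfying the requirement.

```python
import re

# Minimum release named in the text above.
MIN_VERSION = (4, 39, 0)


def release_tuple(version):
    """Return the first three numeric components of a version string.

    Suffixes such as '.dev0' or 'rc1' are ignored, and missing
    components are padded with zeros.
    """
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break  # non-numeric component such as 'dev0'
        numbers.append(int(match.group()))
    numbers += [0] * (3 - len(numbers))
    return tuple(numbers)


def supports_mamba(version):
    """True if the version string is at least MIN_VERSION."""
    return release_tuple(version) >= MIN_VERSION


if __name__ == "__main__":
    print(supports_mamba("4.39.0.dev0"))
    print(supports_mamba("4.38.2"))
```

In practice the argument would be `transformers.__version__`, read after `import transformers`.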

We also recommend installing both causal-conv1d and mamba-ssm:

pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm

If either package is not installed, the "eager" implementation will be used. Otherwise, the more optimised CUDA kernels will be used.
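
To see which of the two code paths to expect, one option is to check whether both packages are importable. The sketch below assumes the import names `mamba_ssm` and `causal_conv1d`; the helper names are hypothetical. It only tests that the packages can be found, so it does not prove that the kernels will actually run (that also depends on having a CUDA device).

```python
import importlib.util

# Assumed import names for the mamba-ssm and causal-conv1d packages.
FAST_PATH_MODULES = ("mamba_ssm", "causal_conv1d")


def is_installed(module_name):
    """True if the module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def fast_path_status(modules=FAST_PATH_MODULES):
    """Map each required module name to whether it is installed."""
    return {name: is_installed(name) for name in modules}


def expected_implementation(status):
    """'eager' unless every required module is present."""
    return "cuda kernels" if all(status.values()) else "eager"


if __name__ == "__main__":
    status = fast_path_status()
    print(status)
    print(expected_implementation(status))
```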

Generation

You can use the classic generate API:

>>> from transformers import MambaForCausalLM, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-370m-hf")
>>> model = MambaForCausalLM.from_pretrained("state-spaces/mamba-370m-hf")
>>> input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"]

>>> out = model.generate(input_ids, max_new_tokens=10)
>>> print(tokenizer.batch_decode(out))
["Hey how are you doing?\n\nI'm doing great.\n\nI"]

PEFT finetuning example

To fine-tune with the peft library, we recommend keeping the model in float32.

from datasets import load_dataset
from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments

# Load the tokenizer and model (the default dtype keeps the weights in float32).
tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-370m-hf")
model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba-370m-hf")

# Small example dataset; the "quote" column is used as the training text.
dataset = load_dataset("Abirate/english_quotes", split="train")
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    logging_dir='./logs',
    logging_steps=10,
    learning_rate=2e-3
)
lora_config = LoraConfig(
    r=8,
    target_modules=["x_proj", "embeddings", "in_proj", "out_proj"],
    task_type="CAUSAL_LM",
    bias="none"
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    peft_config=lora_config,
    train_dataset=dataset,
    dataset_text_field="quote",
)
trainer.train()