Update README.md
Browse files
README.md
CHANGED
@@ -5,20 +5,20 @@ This is the state-spaces mamba-2.8b model, fine-tuned using Supervised Fine-tu
|
|
5 |
|
6 |
To run inference on this model, run the following code:
|
7 |
|
8 |
-
```
|
9 |
import torch
|
10 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
11 |
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
|
12 |
|
|
|
|
|
|
|
13 |
device = "cuda"
|
14 |
messages = []
|
15 |
|
16 |
user_message = "[INST] what is a language model? [/INST]"
|
17 |
-
|
18 |
input_ids = tokenizer(user_message, return_tensors="pt").input_ids.to("cuda")
|
19 |
-
|
20 |
out = model.generate(input_ids=input_ids, max_length=500, temperature=0.9, top_p=0.7, eos_token_id=tokenizer.eos_token_id)
|
21 |
-
|
22 |
decoded = tokenizer.batch_decode(out)
|
23 |
|
24 |
|
|
|
5 |
|
6 |
To run inference on this model, run the following code:
|
7 |
|
8 |
+
```python
|
9 |
import torch
|
10 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
11 |
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
|
12 |
|
13 |
+
#Load the model
|
14 |
+
model = MambaLMHeadModel.from_pretrained("walebadr/mamba-2.8b-SFT", dtype=torch.bfloat16, device="cuda")
|
15 |
+
|
16 |
device = "cuda"
|
17 |
messages = []
|
18 |
|
19 |
user_message = "[INST] what is a language model? [/INST]"
|
|
|
20 |
input_ids = tokenizer(user_message, return_tensors="pt").input_ids.to("cuda")
|
|
|
21 |
out = model.generate(input_ids=input_ids, max_length=500, temperature=0.9, top_p=0.7, eos_token_id=tokenizer.eos_token_id)
|
|
|
22 |
decoded = tokenizer.batch_decode(out)
|
23 |
|
24 |
|