lightonai/mambaoutai

staghado committed on Apr 10, 2024

Commit 46c9330 · verified · 1 Parent(s): 286ff8d

Add inference llama.cpp example

Files changed (1)

README.md +28 -0

@@ -70,6 +70,34 @@ out = model.generate(input_ids, max_new_tokens=10)
 print(tokenizer.batch_decode(out))
 ```
+### On-device Inference
+Since Mambaoutai has only 1.6B parameters, it can run at a fast speed on a CPU.
+Here is an example of how to run it with llama.cpp:
+```bash
+# Clone llama.cpp repository and compile it from source
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp
+make
+# Create a conda environment and install dependencies
+conda create -n mamba-cpp python=3.10
+conda activate mamba-cpp
+pip install -r requirements/requirements-convert-hf-to-gguf.txt
+# Create a directory 'Mambaoutai/', then download the weights, tokenizer, config,
+# tokenizer_config and special_tokens_map from this repo into it
+mkdir Mambaoutai
+# Convert the weights to GGUF format
+python convert-hf-to-gguf.py Mambaoutai
+# Run inference with a prompt
+./main -m Mambaoutai/ggml-model-f16.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 1
+```
 ### Model hyperparameters
 More details about the model hyperparameters are given in the table below :
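
The added section describes the download step only in a comment. A minimal sketch of one way to fetch the files, assuming the `huggingface_hub` package is installed, is given here. The file patterns are assumptions about how the weights and tokenizer files are named in this repo, and `download_model` and `is_needed` are hypothetical helper names, not part of the commit.

```python
from fnmatch import fnmatch

# Repository shown at the top of this commit page.
REPO_ID = "lightonai/mambaoutai"

# Assumed file name patterns for the weights, config and tokenizer files.
# Check the repo's file listing and adjust if the names differ.
ALLOW_PATTERNS = [
    "*.safetensors",
    "*.bin",
    "config.json",
    "tokenizer*",
    "special_tokens_map.json",
]


def is_needed(filename: str) -> bool:
    """Return True if a repo file name matches one of the assumed patterns."""
    return any(fnmatch(filename, pattern) for pattern in ALLOW_PATTERNS)


def download_model(local_dir: str = "Mambaoutai") -> str:
    """Download the matching files into local_dir and return its path."""
    # Imported lazily so the pattern helpers above work without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=ALLOW_PATTERNS,
    )
```

Calling `download_model()` from the `llama.cpp` directory should leave the files in `Mambaoutai/`, which is the directory passed to the conversion script in the README example.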