Instructions to use kaisser/LLM-Maroc with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use kaisser/LLM-Maroc with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kaisser/LLM-Maroc")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kaisser/LLM-Maroc")
model = AutoModelForCausalLM.from_pretrained("kaisser/LLM-Maroc")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
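
The slice in the final print call works because, for a causal language model, generate returns the prompt tokens followed by the newly generated ones. A minimal sketch with plain Python lists shows the same indexing; the token ids are made up for illustration and are not real vocabulary entries:

```python
# Hypothetical token ids, for illustration only.
prompt_ids = [101, 7592, 2040, 2024]        # stands in for inputs["input_ids"][0]
generated = prompt_ids + [2017, 1029, 102]  # stands in for outputs[0]: prompt, then new tokens

# Same idea as outputs[0][inputs["input_ids"].shape[-1]:]
new_tokens = generated[len(prompt_ids):]
print(new_tokens)
```

Decoding only new_tokens keeps the echoed prompt out of the printed reply.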

llama-cpp-python

How to use kaisser/LLM-Maroc with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="kaisser/LLM-Maroc",
	filename="llama.cpp/models/ggml-vocab-aquila.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use kaisser/LLM-Maroc with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf kaisser/LLM-Maroc:BF16
# Run inference directly in the terminal:
llama-cli -hf kaisser/LLM-Maroc:BF16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf kaisser/LLM-Maroc:BF16
# Run inference directly in the terminal:
llama-cli -hf kaisser/LLM-Maroc:BF16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf kaisser/LLM-Maroc:BF16
# Run inference directly in the terminal:
./llama-cli -hf kaisser/LLM-Maroc:BF16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf kaisser/LLM-Maroc:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf kaisser/LLM-Maroc:BF16

Use Docker

docker model run hf.co/kaisser/LLM-Maroc:BF16

vLLM

How to use kaisser/LLM-Maroc with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "kaisser/LLM-Maroc"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kaisser/LLM-Maroc",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/kaisser/LLM-Maroc:BF16

SGLang

How to use kaisser/LLM-Maroc with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "kaisser/LLM-Maroc" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kaisser/LLM-Maroc",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "kaisser/LLM-Maroc" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kaisser/LLM-Maroc",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
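
The curl calls above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names build_chat_request and send_chat_request are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style chat completion body. Port 30000 matches the SGLang commands, while the vLLM example listens on port 8000.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for the /v1/chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response: choices[0].message.content
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:30000",  # use http://localhost:8000 for the vLLM server
    "kaisser/LLM-Maroc",
    "What is the capital of France?",
)
# print(send_chat_request(request))  # uncomment once a server is running
```

Building the request separately from sending it makes the payload easy to inspect before any server is started.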

Ollama
How to use kaisser/LLM-Maroc with Ollama:
```
ollama run hf.co/kaisser/LLM-Maroc:BF16
```

Unsloth Studio

How to use kaisser/LLM-Maroc with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kaisser/LLM-Maroc to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kaisser/LLM-Maroc to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for kaisser/LLM-Maroc to start chatting

Docker Model Runner
How to use kaisser/LLM-Maroc with Docker Model Runner:
```
docker model run hf.co/kaisser/LLM-Maroc:BF16
```

Lemonade

How to use kaisser/LLM-Maroc with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull kaisser/LLM-Maroc:BF16

Run and chat with the model

lemonade run user.LLM-Maroc-BF16

List all available models

lemonade list

LLM-Maroc/llama.cpp/docs/multimodal/llava.md

kaisser

Upload folder using huggingface_hub

305a42c verified 10 months ago

LLaVA

Currently this implementation supports llava-v1.5 variants as well as llava-v1.6 variants.

Pre-converted 7b and 13b models are available. For llava-1.6, a variety of prepared GGUF models (7b to 34b) are available as well.

After the API is confirmed, more models will be supported and uploaded.

Usage

Build the llama-mtmd-cli binary.

After building, run ./llama-mtmd-cli to see the usage. For example:

./llama-mtmd-cli -m ../llava-v1.5-7b/ggml-model-f16.gguf \
    --mmproj ../llava-v1.5-7b/mmproj-model-f16.gguf \
    --chat-template vicuna

Note: A lower temperature such as 0.1 is recommended for better quality. Add --temp 0.1 to the command to set it.

Note: For GPU offloading, use the -ngl flag as usual.
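
When the command is launched from a script, the flags mentioned in this document (-m, --mmproj, --chat-template, --temp, -ngl, and the -c context flag used for llava-1.6) can be assembled programmatically. The sketch below is only an illustration: build_mtmd_command is a hypothetical helper, and the -ngl value of 99 is an arbitrary placeholder, not a recommendation.

```python
import shlex


def build_mtmd_command(model, mmproj, chat_template=None, temp=None,
                       gpu_layers=None, ctx_size=None,
                       binary="./llama-mtmd-cli"):
    """Return the argv list for llama-mtmd-cli, adding only the flags requested."""
    argv = [binary, "-m", model, "--mmproj", mmproj]
    if chat_template is not None:
        argv += ["--chat-template", chat_template]
    if temp is not None:
        argv += ["--temp", str(temp)]
    if gpu_layers is not None:
        argv += ["-ngl", str(gpu_layers)]
    if ctx_size is not None:
        argv += ["-c", str(ctx_size)]
    return argv


argv = build_mtmd_command(
    "../llava-v1.5-7b/ggml-model-f16.gguf",
    "../llava-v1.5-7b/mmproj-model-f16.gguf",
    chat_template="vicuna",
    temp=0.1,
    gpu_layers=99,  # hypothetical value; choose what fits your GPU
)
print(shlex.join(argv))
```

The resulting list can be passed directly to subprocess.run(argv) once the binary and model files exist.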

LLaVA 1.5

Clone a LLaVA and a CLIP model (available options). For example:

git clone https://huggingface.co/liuhaotian/llava-v1.5-7b

git clone https://huggingface.co/openai/clip-vit-large-patch14-336

Install the required Python packages:

pip install -r tools/mtmd/requirements.txt

Use llava_surgery.py to split the LLaVA model into its LLaMA and multimodal projector constituents:

python ./tools/mtmd/llava_surgery.py -m ../llava-v1.5-7b

Use convert_image_encoder_to_gguf.py to convert the LLaVA image encoder to GGUF:

python ./tools/mtmd/convert_image_encoder_to_gguf.py -m ../clip-vit-large-patch14-336 --llava-projector ../llava-v1.5-7b/llava.projector --output-dir ../llava-v1.5-7b

Use examples/convert_legacy_llama.py to convert the LLaMA part of LLaVA to GGUF:

python ./examples/convert_legacy_llama.py ../llava-v1.5-7b --skip-unknown

Now both the LLaMA part and the image encoder are in the llava-v1.5-7b directory.
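
A quick way to confirm the conversion finished is to check that both files exist. This sketch assumes the output file names match those used in the llama-mtmd-cli example in the Usage section (ggml-model-f16.gguf and mmproj-model-f16.gguf); missing_gguf_files is a hypothetical helper name.

```python
from pathlib import Path

# File names taken from the llama-mtmd-cli example in the Usage section.
EXPECTED_FILES = ("ggml-model-f16.gguf", "mmproj-model-f16.gguf")


def missing_gguf_files(model_dir):
    """Return the expected GGUF file names that are absent from model_dir."""
    directory = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


missing = missing_gguf_files("../llava-v1.5-7b")
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("Both GGUF files are present.")
```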

LLaVA 1.6 gguf conversion

First clone a LLaVA 1.6 model:

git clone https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b

Install the required Python packages:

pip install -r tools/mtmd/requirements.txt

Use llava_surgery_v2.py, which also supports llava-1.5 variants and handles both PyTorch and safetensors models:

python tools/mtmd/llava_surgery_v2.py -C -m ../llava-v1.6-vicuna-7b/

You will find a llava.projector and a llava.clip file in your model directory.

Copy the llava.clip file into a subdirectory (like vit), rename it to pytorch_model.bin and add a fitting vit configuration to the directory:

mkdir vit
cp ../llava-v1.6-vicuna-7b/llava.clip vit/pytorch_model.bin
cp ../llava-v1.6-vicuna-7b/llava.projector vit/
curl -s -q https://huggingface.co/cmp-nct/llava-1.6-gguf/raw/main/config_vit.json -o vit/config.json
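
The copy steps above can be scripted in Python as well, for example on a system without cp. This is a minimal sketch: prepare_vit_dir is a hypothetical helper, it mirrors only the mkdir and cp commands, and it leaves the config.json download (the curl step) to be done separately.

```python
import shutil
from pathlib import Path


def prepare_vit_dir(model_dir, vit_dir="vit"):
    """Copy llava.clip and llava.projector into vit_dir, mirroring the shell steps.

    Returns True if vit_dir already contains config.json, False otherwise.
    """
    source = Path(model_dir)
    target = Path(vit_dir)
    target.mkdir(parents=True, exist_ok=True)
    # llava.clip is renamed to pytorch_model.bin, as described in the step above.
    shutil.copyfile(source / "llava.clip", target / "pytorch_model.bin")
    shutil.copyfile(source / "llava.projector", target / "llava.projector")
    # config.json is not created here; it comes from the separate download.
    return (target / "config.json").is_file()


model_dir = Path("../llava-v1.6-vicuna-7b")
if (model_dir / "llava.clip").is_file():
    if not prepare_vit_dir(model_dir):
        print("Remember to download config.json into the vit directory.")
```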

Create the visual gguf model:

python ./tools/mtmd/convert_image_encoder_to_gguf.py -m vit --llava-projector vit/llava.projector --output-dir vit --clip-model-is-vision

This is similar to llava-1.5; the difference is that --clip-model-is-vision tells the encoder that we are working with the pure vision part of CLIP.

Then convert the model to gguf format:

python ./examples/convert_legacy_llama.py ../llava-v1.6-vicuna-7b/ --skip-unknown

And finally we can run the llava cli using the 1.6 model version:

./llama-mtmd-cli -m ../llava-v1.6-vicuna-7b/ggml-model-f16.gguf --mmproj vit/mmproj-model-f16.gguf

Note: llava-1.6 needs more context than llava-1.5; at least 3000 tokens are needed (just run it with -c 4096).

Note: llava-1.6 greatly benefits from batched prompt processing (the defaults work).

Note: if the language model in step 6 (converting the model to GGUF) is incompatible with the legacy conversion script, the easiest way to handle the LLM conversion is to load the model in transformers and export only the LLM from the LLaVA-NeXT model.

import transformers

# Path to the LLaVA-NeXT checkpoint and the directory to export the LLM into
model_path = ...
llm_export_path = ...

tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)
model = transformers.AutoModelForImageTextToText.from_pretrained(model_path)

# Save only the tokenizer and the language model, not the vision tower
tokenizer.save_pretrained(llm_export_path)
model.language_model.save_pretrained(llm_export_path)

Then, you can convert the LLM using the convert_hf_to_gguf.py script, which handles more LLM architectures.

Chat template

For llava-1.5 and llava-1.6, you need to use the vicuna chat template. Add --chat-template vicuna to the command to activate it.

How to know if you are running in llava-1.5 or llava-1.6 mode

When running llava-cli, you will see a line of visual information right before the prompt is processed:

Llava-1.5: encode_image_with_clip: image embedding created: 576 tokens

Llava-1.6 (anything above 576): encode_image_with_clip: image embedding created: 2880 tokens

Alternatively, note how many "tokens" were used for your prompt; llava-1.6 will show 1000+ tokens.
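
The check described above can be automated by parsing the log line. The sketch below is illustrative only: detect_llava_mode and patch_count are hypothetical helpers, and the regular expression assumes the log format quoted above. The 576 figure matches the patch grid implied by the name clip-vit-large-patch14-336 (336 / 14 = 24 patches per side, 24 × 24 = 576). Note that 2880 = 5 × 576, which would be consistent with several 576-token tiles per image, though that interpretation is an assumption here.

```python
import re

# Matches the log line format quoted above.
LOG_PATTERN = re.compile(r"image embedding created: (\d+) tokens")


def patch_count(image_size, patch_size):
    """Number of patch tokens for a square image split into square patches."""
    side = image_size // patch_size
    return side * side


def detect_llava_mode(log_line):
    """Return 'llava-1.5', 'llava-1.6', or None if the line is not recognized."""
    match = LOG_PATTERN.search(log_line)
    if match is None:
        return None
    tokens = int(match.group(1))
    # Rule from the text: exactly 576 means 1.5, anything above 576 means 1.6.
    if tokens == 576:
        return "llava-1.5"
    if tokens > 576:
        return "llava-1.6"
    return None


print(patch_count(336, 14))
print(detect_llava_mode("encode_image_with_clip: image embedding created: 2880 tokens"))
```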