Muse-3B
Muse-3B is a compact 3B chat language model from Muse Research Lab. It is built for helpful everyday conversation, writing, simple coding help, English/German/French assistance, and safe general-purpose responses.
Model Details
Model Developer: Muse Research Lab
Model Architecture: Muse-3B is an auto-regressive, Llama-style decoder-only transformer optimized for compact chat and general assistance.
| Model | Params | Input modalities | Output modalities | Context Length | GQA | Shared Embeddings | Knowledge cutoff |
|---|---|---|---|---|---|---|---|
| Muse-3B | ~3B | Multilingual text | Multilingual text and code | 8,192 tokens | Yes | Yes | Not specified |
Supported Languages: English, German, and French.
Status: This is an early compact chat model intended for lightweight assistant-style use and experimentation.
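The 8,192-token context length in the table can be turned into a simple generation budget. The sketch below assumes, as is usual for decoder-only models, that the limit covers the prompt and the generated tokens together. The helper name `max_new_tokens_for` and its parameters are hypothetical and not part of any Muse or Transformers API.

```python
CONTEXT_LENGTH = 8192  # context length from the Model Details table


def max_new_tokens_for(prompt_tokens: int, requested: int = 256,
                       context_length: int = CONTEXT_LENGTH) -> int:
    """Cap a requested generation length so prompt + output fits the window.

    Hypothetical helper: assumes the context limit counts prompt tokens
    and generated tokens together.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested, remaining)
```

The prompt length would typically be taken from the tokenized input (for example, the size of the last dimension of `input_ids`), and the result passed as `max_new_tokens` to `generate`.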
Capabilities
- General chat and question answering
- Writing, brainstorming, and rewriting
- Simple coding help and explanations
- Multilingual responses in English, German, and French
- Safe refusal behavior for harmful requests
Quickstart
pip install "transformers>=4.43.0" accelerate torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Muse-research/Muse-3B"

# Load the tokenizer and the model weights in bfloat16, placing them on the available device(s).
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Muse-3B, a helpful chat assistant from Muse Research Lab."},
    {"role": "user", "content": "Hi, who are you?"},
]

# Render the conversation with the model's chat template, then tokenize the resulting string.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a reply without tracking gradients.
with torch.inference_mode():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
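As a rough guide to the hardware the Quickstart needs, the weight footprint can be estimated from the parameter count and the dtype. This is a back-of-the-envelope sketch that assumes exactly 3 billion parameters (the card only says ~3B) and counts weights only; activations, the KV cache, and framework overhead come on top. The function name `weight_memory_gib` is hypothetical.

```python
# Bytes used to store one parameter in common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Estimate memory for model weights alone, in GiB (2**30 bytes)."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Assumed parameter count; the model card states only "~3B".
    for dtype in ("float32", "bfloat16"):
        print(f"{dtype}: {weight_memory_gib(3e9, dtype):.2f} GiB")
```

Under these assumptions, loading in bfloat16 as the Quickstart does needs about half the weight memory of float32.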
Intended Use
Muse-3B is intended for lightweight assistant-style use, including chat, drafting, summarization, simple programming support, and everyday help in English, German, and French.
Limitations
- May produce incorrect or incomplete answers.
- May struggle with advanced reasoning, long coding tasks, or highly specialized domains.
- German and French responses may be less reliable than English ones.
- Should not be used as the only source for medical, legal, financial, or safety-critical decisions.
- Applications should add their own safeguards when deployed to users.
Safety
Muse-3B is designed to be helpful while refusing clearly harmful requests. For production use, pair the model with application-level safety checks, monitoring, and domain-specific policies.
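One way to picture an application-level safety check is a thin wrapper that screens input before it reaches the model. The sketch below is illustrative only: `guarded_reply`, `REFUSAL`, the blocked-term list, and the length cap are all hypothetical, and a keyword match is a crude stand-in for a real moderation policy. The `generate` argument is any callable that maps user text to a reply, such as a function wrapping the Quickstart code.

```python
from typing import Callable, Iterable

# Hypothetical canned response returned when a request is screened out.
REFUSAL = "Sorry, I can't help with that request."


def guarded_reply(user_text: str,
                  generate: Callable[[str], str],
                  blocked_terms: Iterable[str],
                  max_chars: int = 4000) -> str:
    """Screen the input, then delegate to the model only if it passes.

    A placeholder policy: rejects over-long input and input containing
    any blocked term (case-insensitive substring match).
    """
    if len(user_text) > max_chars:
        return REFUSAL
    lowered = user_text.lower()
    if any(term.lower() in lowered for term in blocked_terms):
        return REFUSAL
    return generate(user_text)
```

Keeping the check outside the model call means the policy can be updated, logged, and tested independently of the model itself.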