Instructions for using ttrpg/mosslight-4b with libraries and local apps.

Libraries

How to use ttrpg/mosslight-4b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ttrpg/mosslight-4b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# The pipeline returns a list of results; print it to see the generated text.
print(pipe(text=messages))

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("ttrpg/mosslight-4b")
model = AutoModelForImageTextToText.from_pretrained("ttrpg/mosslight-4b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use ttrpg/mosslight-4b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ttrpg/mosslight-4b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ttrpg/mosslight-4b",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/ttrpg/mosslight-4b

SGLang

How to use ttrpg/mosslight-4b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ttrpg/mosslight-4b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ttrpg/mosslight-4b",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ttrpg/mosslight-4b" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use ttrpg/mosslight-4b with Docker Model Runner:
```
docker model run hf.co/ttrpg/mosslight-4b
```

Mosslight 4B

Mosslight 4B is a fine-tuned, merged derivative of Qwen3.5-4B, packaged in Hugging Face Transformers format for local inference, serving, and downstream experimentation.

This repository contains the model weights, tokenizer, chat template, and multimodal preprocessor files needed to load the model with compatible Qwen3.5 tooling.

Model Details

Model name: Mosslight 4B
Model ID: ttrpg/mosslight-4b
Base model: Qwen/Qwen3.5-4B
Derivative type: fine-tuned and merged full-weight release
Architecture: Qwen3_5ForConditionalGeneration
Model type: vision-language causal generation
Parameters: approximately 4B
Native context length: 262,144 tokens, as inherited from the base config
License: Apache 2.0, inherited from the base model

Lineage

This model is a fine-tuned, merged derivative of Qwen3.5-4B from Alibaba Cloud/Qwen. The original Apache 2.0 license is preserved in LICENSE, and derivative attribution is documented in NOTICE.

Training and merge details are not yet documented. The TODO fields below should be completed before a final public version is published.

Training Details

Base checkpoint: Qwen/Qwen3.5-4B
Fine-tuning method: TODO
Training data: TODO
Merge method: TODO
Output format: merged full weights in sharded Safetensors format
Post-training evaluation: TODO

Files

config.json: model architecture and multimodal configuration.
model.safetensors-00001-of-00002.safetensors
model.safetensors-00002-of-00002.safetensors
model.safetensors.index.json
tokenizer.json, tokenizer_config.json, vocab.json, merges.txt
chat_template.jinja
preprocessor_config.json, video_preprocessor_config.json
LICENSE, NOTICE
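
After downloading, the file list above can be compared against a local copy of the repository as a quick sanity check. The following is a minimal sketch rather than part of the release: the `missing_files` helper and the local directory path are hypothetical, and the expected names are copied verbatim from the list above.

```python
from pathlib import Path

# File names copied from the list above; adjust if the repository layout changes.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors-00001-of-00002.safetensors",
    "model.safetensors-00002-of-00002.safetensors",
    "model.safetensors.index.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
    "chat_template.jinja",
    "preprocessor_config.json",
    "video_preprocessor_config.json",
    "LICENSE",
    "NOTICE",
]


def missing_files(repo_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in repo_dir."""
    root = Path(repo_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # "./mosslight-4b" is a hypothetical path to a local copy of the repository.
    missing = missing_files("./mosslight-4b")
    print("All expected files present." if not missing else f"Missing: {missing}")
```

An empty result means every listed file was found. It does not verify file contents or checksums.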

Usage

Install a Transformers build that supports Qwen3.5, then load the model using the standard Hugging Face APIs.

from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "ttrpg/mosslight-4b"

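# The repository ships no custom modeling code (see Files), so
# trust_remote_code is not needed with a Qwen3.5-capable Transformers build.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",   # place weights on available devices (requires accelerate)
    torch_dtype="auto",  # use the dtype stored in the checkpoint
)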

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Briefly introduce yourself."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

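outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))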
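
If loading fails because the architecture is not recognized, one way to check the installed Transformers build is to look for the architecture class named in Model Details. This is a rough sketch: the `transformers_exposes` helper is hypothetical, and it assumes the class is exported at the top level of the `transformers` package under the name `Qwen3_5ForConditionalGeneration`.

```python
import importlib
import importlib.util


def transformers_exposes(class_name):
    """Return True if the installed transformers package exports class_name."""
    if importlib.util.find_spec("transformers") is None:
        return False  # transformers is not installed at all
    module = importlib.import_module("transformers")
    return hasattr(module, class_name)


if __name__ == "__main__":
    name = "Qwen3_5ForConditionalGeneration"  # architecture listed in Model Details
    if transformers_exposes(name):
        print(f"{name} is available.")
    else:
        print(f"{name} not found; a newer Transformers build may be needed.")
```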

Serving

Use serving frameworks only after confirming they support Qwen3.5 model classes and the required multimodal processor files.

Example model identifier:

ttrpg/mosslight-4b
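
Once a framework is confirmed to work, a served copy can be queried from Python through the OpenAI-compatible chat completions route, using only the standard library. The sketch below is illustrative: `build_payload` and `chat` are hypothetical helper names, and the default base URL assumes a server already running locally on port 8000.

```python
import json
import urllib.request

MODEL_ID = "ttrpg/mosslight-4b"


def build_payload(prompt, image_url=None, max_tokens=128):
    """Build an OpenAI-style chat completions request body."""
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        # Images are passed by URL, using the same schema as the curl examples.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }


def chat(prompt, image_url=None, base_url="http://localhost:8000"):
    """POST the request to a running server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt, image_url)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("Briefly introduce yourself.")` returns the reply text. Pass a different `base_url` if the server listens on another port, such as 30000 in the SGLang examples.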

Intended Use

Mosslight 4B is intended for experimentation with compact multimodal assistant workflows, text generation, visual question answering, and local model serving.

Limitations

No independent benchmark results are published for this custom release yet.
Behavior and safety characteristics should be evaluated for your target use case before deployment.
This model inherits limitations from the Qwen3.5-4B base model and from the fine-tuning and merge process used for this release.

Attribution

Mosslight 4B is a fine-tuned, merged derivative based on Qwen3.5-4B. Please retain the Apache 2.0 license and attribution notices when redistributing this model or derivatives of it.