Image-Text-to-Text
Transformers
Safetensors
gemma4
on-device
agentic
multimodal
vision
audio
speech-to-text
conversational
Ways to use Loke-60000/rin-mobile-preview with libraries, notebooks, and local apps:
- Libraries
- Transformers
How to use Loke-60000/rin-mobile-preview with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Loke-60000/rin-mobile-preview")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Loke-60000/rin-mobile-preview")
model = AutoModelForMultimodalLM.from_pretrained("Loke-60000/rin-mobile-preview")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate, then decode only the newly generated tokens (skip the prompt)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Loke-60000/rin-mobile-preview with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Loke-60000/rin-mobile-preview"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Loke-60000/rin-mobile-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/Loke-60000/rin-mobile-preview
- SGLang
How to use Loke-60000/rin-mobile-preview with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Loke-60000/rin-mobile-preview" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Loke-60000/rin-mobile-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Loke-60000/rin-mobile-preview" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Loke-60000/rin-mobile-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ]
  }'
- Docker Model Runner
How to use Loke-60000/rin-mobile-preview with Docker Model Runner:
docker model run hf.co/Loke-60000/rin-mobile-preview
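The curl calls in the vLLM and SGLang instructions can also be made from Python. The following is a minimal sketch using only the standard library, assuming a server is already running locally on one of the ports shown above. The helper names `build_chat_request` and `chat` are hypothetical and chosen for illustration; the request body mirrors the curl payload, and the response path follows the usual OpenAI-compatible layout.

```python
import json
import urllib.request

MODEL_ID = "Loke-60000/rin-mobile-preview"


def build_chat_request(prompt, image_url=None, model=MODEL_ID):
    """Build the JSON body for an OpenAI-compatible chat completions call.

    Hypothetical helper: it mirrors the payload used in the curl examples.
    """
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def chat(base_url, prompt, image_url=None, timeout=120):
    """POST the request to a running server and return the reply text."""
    body = json.dumps(build_chat_request(prompt, image_url)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    # Assumes the standard OpenAI-style response shape.
    return reply["choices"][0]["message"]["content"]


# Example call (requires a running server; port 8000 for vLLM, 30000 for SGLang):
# print(chat("http://localhost:8000", "Describe this image in one sentence.",
#            image_url="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"))
```

Keeping the payload builder separate from the network call makes it easy to inspect or log the request before sending it.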
Rin-mobile is a compact model designed to run agentic work directly on a phone or laptop, with no server and no cloud. It was trained on about 895,000 tokens to give it a single steady voice, named Rin, that is clear and composed.
What it does
- Text to text. Chat, coding, technical help, and long-horizon agentic tasks that hold together across many steps.
- Image to text. Look at a picture and describe it or reason about it.
- Speech to text. Take an audio clip and transcribe it or answer questions about it.
It also supports private step-by-step reasoning and tool calls.
Run it on device
Quantized for phone-class hardware (about 4.4 GB):
ollama pull Loke-60000/rin-mobile-preview
ollama run Loke-60000/rin-mobile-preview "what is in this photo? image.png"
ollama run Loke-60000/rin-mobile-preview "transcribe this clip.wav"
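The same on-device model can be reached from a script through Ollama's local REST API. The sketch below assumes the Ollama daemon is running on its default port (11434) and that this model's Ollama build accepts images through the `images` field of `/api/generate`, which is not confirmed here. The helper names `build_generate_payload` and `generate` are hypothetical.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_ID = "Loke-60000/rin-mobile-preview"


def build_generate_payload(prompt, image_bytes=None, model=MODEL_ID):
    """Build a non-streaming /api/generate request body.

    Hypothetical helper: images are sent as base64 strings, as the Ollama API expects.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return payload


def generate(prompt, image_path=None, url=OLLAMA_URL, timeout=300):
    """Send the prompt (and optional image file) to Ollama and return the text reply."""
    image_bytes = None
    if image_path is not None:
        with open(image_path, "rb") as image_file:
            image_bytes = image_file.read()
    body = json.dumps(build_generate_payload(prompt, image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["response"]


# Example call (requires a running Ollama daemon with the model pulled):
# print(generate("what is in this photo?", image_path="image.png"))
```

Audio input is left out of this sketch because the request format for audio clips is not described in this card.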
Run it with transformers
from transformers import AutoProcessor, AutoModelForImageTextToText
model = AutoModelForImageTextToText.from_pretrained("Loke-60000/rin-mobile-preview")
processor = AutoProcessor.from_pretrained("Loke-60000/rin-mobile-preview")
