Instructions to use protoLabsAI/Ornith-1.0-35B-FP8 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use protoLabsAI/Ornith-1.0-35B-FP8 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="protoLabsAI/Ornith-1.0-35B-FP8")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("protoLabsAI/Ornith-1.0-35B-FP8")
model = AutoModelForMultimodalLM.from_pretrained("protoLabsAI/Ornith-1.0-35B-FP8")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use protoLabsAI/Ornith-1.0-35B-FP8 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "protoLabsAI/Ornith-1.0-35B-FP8"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "protoLabsAI/Ornith-1.0-35B-FP8",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/protoLabsAI/Ornith-1.0-35B-FP8
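The curl call against the vLLM server can also be made from Python. The following is a minimal sketch using only the standard library, assuming the server is running locally on port 8000 as in the example above. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

# Base URL and model name taken from the curl example; adjust if the server runs elsewhere.
BASE_URL = "http://localhost:8000/v1"
MODEL = "protoLabsAI/Ornith-1.0-35B-FP8"


def build_chat_request(prompt, model=MODEL, base_url=BASE_URL):
    """Build (without sending) a POST request for the chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text (hypothetical helper)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` should return the reply as a string. Since the SGLang examples on this page expose the same OpenAI-compatible route on port 30000, passing `base_url="http://localhost:30000/v1"` should work there as well.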
- SGLang
How to use protoLabsAI/Ornith-1.0-35B-FP8 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "protoLabsAI/Ornith-1.0-35B-FP8" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "protoLabsAI/Ornith-1.0-35B-FP8",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "protoLabsAI/Ornith-1.0-35B-FP8" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "protoLabsAI/Ornith-1.0-35B-FP8",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use protoLabsAI/Ornith-1.0-35B-FP8 with Docker Model Runner:
docker model run hf.co/protoLabsAI/Ornith-1.0-35B-FP8
Dual RTX 5090 test
Running this model with vLLM on my dual RTX 5090 setup gives excellent generation speed. I asked it to generate a first-person shooter (FPS) game in HTML, and it produced approximately 5,400 lines of HTML, JavaScript, and CSS within five minutes. However, the resulting game was entirely non-functional, and the model spent the next hour trying to fix it without success. It kept falling into repetitive loops and made no progress.
I then asked it to rewrite the same game as a single HTML file. It generated approximately 2,800 lines of HTML, JavaScript, and CSS in about eight minutes. Unfortunately, the game contained numerous defects, and the model was unable to resolve them over the following 20 minutes, again cycling through the same unproductive patterns for the entire duration.
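For a rough sense of output rate, the figures from the two runs can be turned into lines per minute. This is only back-of-the-envelope arithmetic on the approximate numbers reported above (the upper bound of "within five minutes" is used for the first run), and it measures output volume, not whether the code works.

```python
def lines_per_minute(lines, minutes):
    """Average number of generated lines per minute."""
    return lines / minutes


# Approximate (lines, minutes) figures as reported for each run.
runs = {
    "first attempt": (5400, 5),
    "single-file rewrite": (2800, 8),
}

rates = {name: lines_per_minute(n, m) for name, (n, m) in runs.items()}
for name, rate in rates.items():
    print(f"{name}: ~{rate:.0f} lines/min")
# first attempt: ~1080 lines/min
# single-file rewrite: ~350 lines/min
```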
In comparison, Qwen3.6-27B remains the superior model for this type of task.