Instructions to use deepreinforce-ai/Ornith-1.0-397B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use deepreinforce-ai/Ornith-1.0-397B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# device_map="auto" shards the weights across available devices (requires accelerate)
pipe = pipeline("text-generation", model="deepreinforce-ai/Ornith-1.0-397B", device_map="auto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("deepreinforce-ai/Ornith-1.0-397B")
# device_map="auto" shards the weights across available devices (requires accelerate)
model = AutoModelForMultimodalLM.from_pretrained("deepreinforce-ai/Ornith-1.0-397B", device_map="auto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use deepreinforce-ai/Ornith-1.0-397B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "deepreinforce-ai/Ornith-1.0-397B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "deepreinforce-ai/Ornith-1.0-397B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```shell
docker model run hf.co/deepreinforce-ai/Ornith-1.0-397B
```
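Because the vLLM and SGLang servers in this section expose an OpenAI-compatible `/v1/chat/completions` route, the curl call can also be made from Python. A minimal sketch using only the standard library; `build_chat_request` and `chat` are hypothetical helper names, and the base URL assumes the port 8000 used by the vLLM curl example (the SGLang setup below uses port 30000).

```python
import json
import urllib.request

MODEL_ID = "deepreinforce-ai/Ornith-1.0-397B"


def build_chat_request(base_url: str, prompt: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, prompt: str) -> str:
    """Send the request to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(base_url, prompt)) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Usage against a running server: `chat("http://localhost:8000", "What is the capital of France?")`, or `"http://localhost:30000"` for the SGLang server.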
- SGLang
How to use deepreinforce-ai/Ornith-1.0-397B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "deepreinforce-ai/Ornith-1.0-397B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "deepreinforce-ai/Ornith-1.0-397B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "deepreinforce-ai/Ornith-1.0-397B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "deepreinforce-ai/Ornith-1.0-397B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use deepreinforce-ai/Ornith-1.0-397B with Docker Model Runner:
```shell
docker model run hf.co/deepreinforce-ai/Ornith-1.0-397B
```
I once again ask for your support (if possible)
Hello,
Thank you for releasing these post-trained models! I humbly ask whether @paragon-of-brah would be willing to create GGUFs of this model so we can test its claims. In my case, IQ2_M would be perfect, as this model seems to hold up very well under quantization.
Thank you.
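For a rough sense of why a low-bit type such as IQ2_M matters at this scale, a GGUF file's size is approximately parameters × bits per weight ÷ 8. A minimal sketch, assuming the approximate bits-per-weight values labeled in the code and that "397B" in the model name is the total parameter count; it ignores metadata and tensors kept at higher precision, so real files will differ.

```python
# Hypothetical helper: rough on-disk size of a GGUF file from the parameter count
# and an assumed average bits-per-weight (bpw) for the quantization type.
# These bpw figures are approximate nominal values, not measurements of this model.
APPROX_BPW = {
    "IQ2_M": 2.7,  # approximate nominal llama.cpp value
    "Q8_0": 8.5,   # 32 int8 weights + one fp16 scale per block = 34 bytes / 32 weights
}


def estimate_gguf_gb(n_params: float, quant: str) -> float:
    """Approximate file size in gigabytes (1 GB = 1e9 bytes)."""
    total_bits = n_params * APPROX_BPW[quant]
    return total_bits / 8 / 1e9


# Assumption: 397e9 is the total parameter count implied by the model name
print(f"IQ2_M: ~{estimate_gguf_gb(397e9, 'IQ2_M'):.0f} GB")
```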
All right, downloading rn
Thank you very much! Take all the time you need, no rush at all!
Looks like it's better than the Nex Pro?
That's what we will be testing soon. I'm very excited to see.
It looks like it's better than Nex Mini. I just tested it with Claude Code, and it started fixing the error that Nex Mini didn't fix. One thing is clear: a 35B model with 3B active parameters is going to give Sonnet a really hard time.
Edit: used Q8.
Fantastic! Hopefully they release the 31B dense version as well!
All right, so the MTP (multi-token prediction) graft strategy just doesn't work well for ik_llama. The grafted MTP head does run, but it was trained to predict the outputs of base Qwen 3.5, so with a different model such as Ornith it gets a low draft acceptance rate and, in turn, low TG (token-generation speed), at least on my setup.
So now I'm looking into DFLASH instead, a novel diffusion-based, MTP-like drafting approach that might give better TG. Of course, this means things are going to be slightly delayed. I'll keep you updated.
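Why a low acceptance rate hurts TG can be seen from the standard speculative-decoding analysis (Leviathan et al., "Fast Inference from Transformers via Speculative Decoding"). With per-token acceptance probability alpha and gamma drafted tokens per step, the expected tokens per target forward pass is (1 - alpha^(gamma+1)) / (1 - alpha). A minimal sketch, assuming each drafted token is accepted independently with the same probability (a simplification) and a hypothetical `draft_cost` ratio; it does not model ik_llama's actual MTP implementation.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target-model forward pass.

    alpha: probability that one drafted token is accepted (assumed i.i.d.).
    gamma: number of tokens drafted per step.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        # Every draft is accepted, plus one extra token from the target model
        return gamma + 1.0
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, draft_cost: float) -> float:
    """Estimated walltime improvement over plain decoding.

    draft_cost: hypothetical cost of one draft step relative to one target pass.
    A result below 1.0 means drafting makes generation slower.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * draft_cost + 1.0)


for alpha in (0.1, 0.5, 0.9):
    print(f"alpha={alpha}: {estimated_speedup(alpha, gamma=3, draft_cost=0.1):.2f}x")
```

Under these assumptions, a drafter with a very low acceptance rate can end up slower than no drafting at all, because every draft step still costs time.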
Not a problem, thank you for taking the time to do this!


