Instructions to use deepreinforce-ai/Ornith-1.0-397B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use deepreinforce-ai/Ornith-1.0-397B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

# Image inputs in the chat messages need the image-text-to-text task
pipe = pipeline("image-text-to-text", model="deepreinforce-ai/Ornith-1.0-397B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("deepreinforce-ai/Ornith-1.0-397B")
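# device_map="auto" (requires `accelerate`) spreads the weights across the available devices
model = AutoModelForMultimodalLM.from_pretrained("deepreinforce-ai/Ornith-1.0-397B", device_map="auto")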
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use deepreinforce-ai/Ornith-1.0-397B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "deepreinforce-ai/Ornith-1.0-397B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepreinforce-ai/Ornith-1.0-397B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
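
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "deepreinforce-ai/Ornith-1.0-397B"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the prompt to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` should return the reply text. Passing `base_url="http://localhost:30000"` would target a server on another port, such as the SGLang examples further down.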

Use Docker

docker model run hf.co/deepreinforce-ai/Ornith-1.0-397B

SGLang

How to use deepreinforce-ai/Ornith-1.0-397B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "deepreinforce-ai/Ornith-1.0-397B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepreinforce-ai/Ornith-1.0-397B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "deepreinforce-ai/Ornith-1.0-397B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepreinforce-ai/Ornith-1.0-397B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use deepreinforce-ai/Ornith-1.0-397B with Docker Model Runner:
```
docker model run hf.co/deepreinforce-ai/Ornith-1.0-397B
```

I once again ask for your support (if possible)

by InfernalDread - opened 2 days ago

Discussion

InfernalDread

2 days ago

Hello,

Thank you for the release of these post-trained models! I humbly ask if @paragon-of-brah is willing to create GGUFs for this model in order to test its claims. In my personal case, IQ2_M would be perfect, as this model seems to hold up very well under quantization.

Thank you.
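
For a rough sense of why a ~2-bit quant matters at this size, the weight footprint can be estimated as parameters × bits per weight ÷ 8. The sketch below assumes 397B parameters (read off the model name) and nominal bits-per-weight figures for each type (about 2.7 for IQ2_M, 8.5 for Q8_0). Real GGUF files mix tensor types and carry metadata, and the KV cache comes on top, so the numbers are only an approximation.

```python
# Nominal bits per weight; these are assumptions, not measured file sizes.
BITS_PER_WEIGHT = {"BF16": 16.0, "Q8_0": 8.5, "IQ2_M": 2.7}
PARAMS = 397e9  # assumed from the "397B" in the model name


def weight_gb(params, bits_per_weight):
    """Approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name:6s} ~{weight_gb(PARAMS, bpw):.0f} GB")
```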

paragon-of-brah

2 days ago

All right, downloading rn

InfernalDread

2 days ago

Thank you very much! Take all the time you need, no rush at all!

DrRos

2 days ago

@paragon-of-brah, some type of Q4 too, pretty please (Q4_K_L would be great). Thank you!

gopi87

1 day ago

looks like it's better than the nex pro?

InfernalDread

1 day ago

that's what we will be testing soon. I am very excited to see.

gopi87

1 day ago (edited 1 day ago)

It looks like it is better than the nex mini. I just tested it with Claude Code and it started fixing the error that nex mini didn't fix. One thing is clear: a 35B with 3B active will give Sonnet a really hard time.

Edit: used Q8.

InfernalDread

about 24 hours ago

fantastic! hopefully they release the 31B dense version as well!

Hunterx

about 16 hours ago

Testing it now. Will be adding SVGs below. Just off the bat it seems okay; it hasn't looped yet. On the hard tests it's not on par with Kimi or GLM, but here is what I got for the SVG tests:
Q8 MLX

InfernalDread

about 16 hours ago

could you share the hard tests where it underperforms?

gopi87

about 14 hours ago

try with BF16, something is not right with Q8

paragon-of-brah

about 8 hours ago

All right, so the MTP graft strategy just doesn't really work for ik_llama. While the MTP works, it's trained to predict the output of base Qwen 3.5, so it gives a low acceptance rate and low TG when used with other models such as Ornith, at least on my setup.

So now I'm looking into DFLASH instead, a novel diffusion-based, MTP-like approach that might give better TG. Ofc, this means that things are going to be slightly delayed. I'll keep you updated.
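
To see why a low acceptance rate translates into low TG, a simplified model of draft-and-verify decoding helps. Assuming each drafted token is accepted independently with probability `a` and `k` tokens are drafted per step, the expected number of tokens produced per verification pass is (1 - a^(k+1)) / (1 - a). This is the textbook speculative-decoding estimate, not a measurement of ik_llama or of the MTP graft; the function name and the sample acceptance rates below are illustrative.

```python
def expected_tokens_per_step(acceptance_rate, draft_len):
    """Expected tokens emitted per verification pass.

    Assumes each drafted token is accepted independently with the same
    probability; the verifying model always contributes one token itself.
    """
    a = acceptance_rate
    if not 0.0 <= a <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if a == 1.0:
        return float(draft_len + 1)
    return (1.0 - a ** (draft_len + 1)) / (1.0 - a)


for a in (0.3, 0.6, 0.9):  # hypothetical acceptance rates
    print(f"a={a:.1f}: {expected_tokens_per_step(a, draft_len=3):.2f} tokens/step")
```

Since drafting adds compute to every step, a value close to one token per step can end up slower than plain decoding, which would be consistent with the low TG described above.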

InfernalDread

21 minutes ago

Not a problem, thank you for taking the time to do this!
