Normalize Nemotron Parse images in the processor for vLLM compatibility

by nvidia-oliver-holworthy - opened 24 days ago

base: refs/heads/main ← from: refs/pr/6

Files changed: +203 -15

nvidia-oliver-holworthy (NVIDIA org) - 24 days ago - edited 24 days ago

Summary

Move Nemotron Parse image normalization into the Hugging Face processor so the checkpoint matches vLLM’s RADIO input contract while preserving Transformers generation output.

Background context

vLLM has changed how Nemotron Parse image preprocessing is handled across recent PRs:

- Initial Nemotron Parse support landed in vLLM PR #30864.
- vLLM PR #37260 moved the custom Nemotron Parse processor into vllm/transformers_utils/processors/nemotron_parse.py. At this point, CLIP image normalization was still implemented in vLLM’s local processor.
  - Commit: f340324
  - Moved processor: nemotron_parse.py
- vLLM PR #37456 removed the local Nemotron Parse processor and its custom get_hf_processor() override, so vLLM now relies on the processor loaded from the HF checkpoint.
  - Commit: 7476d14
- vLLM’s RADIO implementation still skips input_conditioner.* weights because it expects image normalization to happen before tensors reach the model.
  - Current code: radio.py#L763-L766
- vLLM’s Nemotron Parse processing now goes through the generic HF processor path.
  - Current code: nemotron_parse.py#L429-L446
- vLLM PR #38748 addresses a Transformers v5 image_size tuple/scalar compatibility issue, but does not restore the removed image normalization behavior.

This PR makes the HF checkpoint’s processor perform the normalization that vLLM already expects, while updating the HF model path to avoid applying the RADIO input conditioner a second time. This keeps the final generated output unchanged in Transformers while restoring correct behavior in recent vLLM releases.
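
To make the double-normalization concern concrete, here is a minimal sketch of the contract, not the checkpoint's actual code. ToyEncoder, MEAN, and STD are hypothetical stand-ins: the real encoder is RADIO, and the real statistics are the image_mean and image_std values in preprocessor_config.json. The flag name mirrors encoder.processor_normalizes from the Changes list.

```python
import numpy as np

# Placeholder statistics for illustration only; the checkpoint's real values
# live in image_mean / image_std in preprocessor_config.json.
MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
STD = np.array([0.25, 0.25, 0.25], dtype=np.float32)


def normalize(pixels: np.ndarray) -> np.ndarray:
    """Channel-wise (x - mean) / std for a single CHW image."""
    return (pixels - MEAN[:, None, None]) / STD[:, None, None]


class ToyEncoder:
    """Hypothetical stand-in for an encoder with a built-in input conditioner."""

    def __init__(self, processor_normalizes: bool):
        self.processor_normalizes = processor_normalizes

    def __call__(self, pixels: np.ndarray) -> np.ndarray:
        if not self.processor_normalizes:
            # Old contract: the model normalizes raw [0, 1] input itself.
            pixels = normalize(pixels)
        # Placeholder for the real forward pass: per-channel mean.
        return pixels.mean(axis=(1, 2))


rng = np.random.default_rng(0)
raw = rng.random((3, 8, 8), dtype=np.float32)  # raw pixels in [0, 1)

old_path = ToyEncoder(processor_normalizes=False)(raw)
new_path = ToyEncoder(processor_normalizes=True)(normalize(raw))
double = ToyEncoder(processor_normalizes=False)(normalize(raw))

print(np.allclose(old_path, new_path))  # both contracts see the same input
print(np.allclose(old_path, double))    # normalizing twice changes the input
```

In this toy setup the first comparison holds and the second does not, which is the reason the processor change and the model change have to ship together.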

Changes

- Add CLIP mean/std normalization support to NemotronParseImageProcessor.
- Set do_normalize=true, image_mean, and image_std in preprocessor_config.json.
- Add encoder.processor_normalizes=true to signal that RADIO preprocessing is handled by the processor.
- Call RADIO’s existing make_preprocessor_external() in the HF model path to avoid double normalization in Transformers.
- Cast normalized pixels to the encoder dtype after externalizing RADIO preprocessing.
- Return BatchFeature from the combined processor output.
- Add a vLLM golden generation test.
- Add .dockerignore to avoid copying local model weights into Docker builds.

Compatibility

For normal Transformers usage with AutoProcessor + AutoModel, final generation output is unchanged.

The intentional contract change is that:

processor(...).pixel_values

now returns CLIP-normalized tensors rather than raw [0, 1] tensors.

Direct callers that bypass AutoProcessor and pass pixel_values manually should pass normalized tensors for this checkpoint.
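
For callers in that position, the conversion could look roughly like the sketch below. It assumes the standard OpenAI CLIP statistics and a (batch, 3, H, W) layout; the PR description does not list the exact values, so confirm them against image_mean and image_std in preprocessor_config.json. The helper name normalize_pixel_values is hypothetical and not part of the checkpoint.

```python
import numpy as np

# Assumption: the widely used OpenAI CLIP statistics. Check image_mean and
# image_std in the checkpoint's preprocessor_config.json before relying on them.
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)


def normalize_pixel_values(raw, mean=CLIP_MEAN, std=CLIP_STD, dtype=np.float32):
    """Convert raw [0, 1] pixels shaped (batch, 3, H, W) to normalized pixels."""
    raw = np.asarray(raw, dtype=np.float32)
    if raw.ndim != 4 or raw.shape[1] != len(mean):
        raise ValueError(f"expected (batch, {len(mean)}, H, W), got {raw.shape}")
    mean_arr = np.asarray(mean, dtype=np.float32).reshape(1, -1, 1, 1)
    std_arr = np.asarray(std, dtype=np.float32).reshape(1, -1, 1, 1)
    # Normalize in float32, then cast once to the dtype the encoder runs in.
    return ((raw - mean_arr) / std_arr).astype(dtype)


black = np.zeros((1, 3, 2, 2), dtype=np.float32)
white = np.ones((1, 3, 2, 2), dtype=np.float32)
print(normalize_pixel_values(black)[0, :, 0, 0])  # -mean / std per channel
print(normalize_pixel_values(white)[0, :, 0, 0])  # (1 - mean) / std per channel
```

Normalizing in float32 and casting at the end mirrors the order described in the Changes list (normalize, then cast to the encoder dtype); pass a lower-precision dtype such as np.float16 only if that matches how the model is loaded.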

Validation

Transformers golden tests:

- transformers==4.51.3: pass
- transformers==4.57.6: pass
- transformers==5.5.4: pass
- transformers==5.7.0: pass

vLLM golden tests:

- vllm==0.20.0: pass
- vllm==0.19.1: pass
- vllm==0.17.0: pass
- vllm==0.14.1: pass

Known vLLM failures unrelated to this normalization issue:

- vllm==0.18.0 and vllm==0.18.1 fail during weight loading with a decoder shard-id error before generation.
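
The golden tests themselves are not shown in this description. As a generic illustration of the pattern only, a golden check stores a known-good output once and compares later runs against it. The helper name check_golden, the JSON file layout, and the sample strings below are all hypothetical, and exact string matching assumes deterministic (for example, greedy) decoding.

```python
import json
import tempfile
from pathlib import Path


def check_golden(name: str, actual: str, golden_dir: Path, update: bool = False) -> bool:
    """Compare `actual` with the stored golden text for `name`.

    With update=True the golden file is (re)written and the check passes.
    Otherwise the stored text must match `actual` exactly.
    """
    path = golden_dir / f"{name}.json"
    if update:
        path.write_text(json.dumps({"output": actual}), encoding="utf-8")
        return True
    expected = json.loads(path.read_text(encoding="utf-8"))["output"]
    return actual == expected


with tempfile.TemporaryDirectory() as tmp:
    golden_dir = Path(tmp)
    # Record a golden output once, e.g. from a known-good library version.
    check_golden("page_1", "# Heading\n\nBody text", golden_dir, update=True)
    # Later runs (other library versions, other backends) compare against it.
    same = check_golden("page_1", "# Heading\n\nBody text", golden_dir)
    drifted = check_golden("page_1", "# Heading\n\nBody txt", golden_dir)
    print(same, drifted)
```

Running the same recorded golden against each library version is what lets a version matrix like the one above be reported as simple pass/fail entries.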

Normalize Nemotron Parse images in the processor (f200396a)

emelryan changed pull request status to open 20 days ago

emelryan (NVIDIA org) - 20 days ago

Looks good to me. Validated on OmniDocBench: results are unchanged for Transformers, and the discrepancy between Transformers and vLLM is reduced.

emelryan changed pull request status to merged 20 days ago
