Normalize Nemotron Parse images in the processor for vLLM compatibility
Summary
Move Nemotron Parse image normalization into the Hugging Face processor so the checkpoint matches vLLM’s RADIO input contract while preserving Transformers generation output.
Background context
vLLM has changed how Nemotron Parse image preprocessing is handled across recent PRs:
- Initial Nemotron Parse support landed in vLLM PR #30864.
- vLLM PR #37260 moved the custom Nemotron Parse processor into `vllm/transformers_utils/processors/nemotron_parse.py`. At this point, CLIP image normalization was still implemented in vLLM’s local processor.
  - Commit: `f340324`
  - Moved processor: `nemotron_parse.py`
- vLLM PR #37456 removed the local Nemotron Parse processor and its custom `get_hf_processor()` override, so vLLM now relies on the processor loaded from the HF checkpoint.
  - Commit: `7476d14`
- vLLM’s RADIO implementation still skips `input_conditioner.*` weights because it expects image normalization to happen before tensors reach the model.
  - Current code: `radio.py#L763-L766`
- Current vLLM Nemotron Parse processing now goes through the generic HF processor path.
  - Current code: `nemotron_parse.py#L429-L446`
- vLLM PR #38748 addresses a Transformers v5 `image_size` tuple/scalar compatibility issue, but does not restore the removed image normalization behavior.
This PR makes the HF checkpoint’s processor perform the normalization that vLLM already expects, while updating the HF model path to avoid applying the RADIO input conditioner a second time. This keeps the final generated output unchanged in Transformers while restoring correct behavior in recent vLLM releases.
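The requirement that normalization happens exactly once can be illustrated with a toy, single-pixel sketch. The function names below are hypothetical stand-ins, not the checkpoint's or vLLM's actual code, and the mean/std constants are the standard OpenAI CLIP statistics, assumed here for illustration; the authoritative values are the `image_mean` and `image_std` entries in `preprocessor_config.json`.

```python
# Standard OpenAI CLIP statistics (an assumption for this sketch; the values
# that actually apply are image_mean / image_std in preprocessor_config.json).
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)


def clip_normalize(rgb):
    """Per-channel (x - mean) / std for one RGB pixel."""
    return tuple((x - m) / s for x, m, s in zip(rgb, CLIP_MEAN, CLIP_STD))


def passthrough(rgb):
    """Hypothetical stand-in for a stage that leaves pixels untouched."""
    return tuple(rgb)


raw_pixel = (0.25, 0.5, 0.75)  # raw values in [0, 1]

# Old Transformers path: processor emits raw pixels, the model-side
# input conditioner normalizes them.
old_encoder_input = clip_normalize(passthrough(raw_pixel))

# New path: processor normalizes, the model-side conditioner is a no-op.
new_encoder_input = passthrough(clip_normalize(raw_pixel))

# Failure mode 1: a raw-emitting processor feeding a model that skips the
# conditioner, so the encoder sees pixels that were never normalized.
never_normalized = passthrough(passthrough(raw_pixel))

# Failure mode 2: a normalizing processor feeding a model that still
# applies its own conditioner, so pixels are normalized twice.
double_normalized = clip_normalize(clip_normalize(raw_pixel))

print("old == new:", old_encoder_input == new_encoder_input)
print("never normalized differs:", never_normalized != new_encoder_input)
print("double normalized differs:", double_normalized != new_encoder_input)
```

Only the two paths that normalize exactly once give the encoder the same input, which is why the processor change is paired with a model-side change.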
Changes
- Add CLIP mean/std normalization support to `NemotronParseImageProcessor`.
- Set `do_normalize=true`, `image_mean`, and `image_std` in `preprocessor_config.json`.
- Add `encoder.processor_normalizes=true` to signal that RADIO preprocessing is handled by the processor.
- Call RADIO’s existing `make_preprocessor_external()` in the HF model path to avoid double normalization in Transformers.
- Cast normalized pixels to the encoder dtype after externalizing RADIO preprocessing.
- Return `BatchFeature` from the combined processor output.
- Add a vLLM golden generation test.
- Add `.dockerignore` to avoid copying local model weights into Docker builds.
Compatibility
For normal Transformers usage with AutoProcessor + AutoModel, final generation output is unchanged.
The intentional contract change is that `processor(...).pixel_values` now returns CLIP-normalized tensors rather than raw [0, 1] tensors.
Direct callers that bypass AutoProcessor and pass pixel_values manually should pass normalized tensors for this checkpoint.
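For callers in that situation, a minimal NumPy sketch of the normalization might look like the following. It assumes channel-first `(batch, 3, height, width)` input and the standard OpenAI CLIP mean/std; both are assumptions to check against `preprocessor_config.json`, and `normalize_pixel_values` is a hypothetical helper, not part of the checkpoint's API.

```python
import numpy as np

# Standard OpenAI CLIP statistics (assumed; confirm against the image_mean and
# image_std values shipped in this checkpoint's preprocessor_config.json).
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)


def normalize_pixel_values(pixel_values):
    """Normalize raw [0, 1] pixels shaped (batch, 3, height, width)."""
    pixel_values = np.asarray(pixel_values, dtype=np.float32)
    if pixel_values.ndim != 4 or pixel_values.shape[1] != 3:
        raise ValueError("expected shape (batch, 3, height, width)")
    # Reshape the statistics so they broadcast over batch, height, and width.
    mean = CLIP_MEAN.reshape(1, 3, 1, 1)
    std = CLIP_STD.reshape(1, 3, 1, 1)
    return (pixel_values - mean) / std


# A uniform mid-gray batch of one 2x2 image, as raw [0, 1] values.
raw = np.full((1, 3, 2, 2), 0.5, dtype=np.float32)
normalized = normalize_pixel_values(raw)
print(normalized.shape, normalized.dtype)
```

The result would still need converting to the framework tensor type and dtype the model expects before being passed as `pixel_values`.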
Validation
Transformers golden tests:
- `transformers==4.51.3`: pass
- `transformers==4.57.6`: pass
- `transformers==5.5.4`: pass
- `transformers==5.7.0`: pass
vLLM golden tests:
- `vllm==0.20.0`: pass
- `vllm==0.19.1`: pass
- `vllm==0.17.0`: pass
- `vllm==0.14.1`: pass
Known vLLM failures unrelated to this normalization issue:
- `vllm==0.18.0` and `vllm==0.18.1` fail during weight loading with a decoder shard-id error before generation.
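A deployment script could guard against the two failing releases before loading the model. The sketch below uses only the standard library; the helper names are hypothetical, and the set of bad versions simply mirrors the list above rather than any official compatibility matrix.

```python
from importlib import metadata

# Releases reported above as failing during weight loading.
KNOWN_BAD_VLLM = {"0.18.0", "0.18.1"}


def is_known_bad_vllm(version):
    """Return True if `version` matches a release listed as failing above."""
    # Drop any local build suffix such as "+cu121" before comparing.
    return version.split("+")[0] in KNOWN_BAD_VLLM


def installed_vllm_version():
    """Return the installed vLLM version string, or None if vLLM is absent."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


version = installed_vllm_version()
if version is None:
    print("vLLM is not installed")
elif is_known_bad_vllm(version):
    print(f"vllm=={version} is reported to fail during weight loading")
else:
    print(f"vllm=={version} is not on the known-failure list")
```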
