Is that deepseek-ocr-2 compatible for H200

#29

by balajivasudevan - opened 27 days ago

Hi everyone,

I’m currently testing deployment of deepseek-ai/DeepSeek-OCR-2 using OpenShift AI on NVIDIA H200 GPUs with a vLLM-based serving stack.

The model loads successfully and /v1/models responds correctly. Text-only chat requests also work via:

/v1/chat/completions

However, when sending OCR/multimodal requests (image input using image_url or base64 payload), the inference server crashes and the engine core dies.

Environment:

OpenShift AI
NVIDIA H200
vLLM serving backend
Python 3.12
OpenAI-compatible API endpoint

Observed behavior:

Health endpoint remains OK initially
POST to /v1/chat/completions triggers engine crash
API returns HTTP 500
Server process shuts down immediately after request

Relevant logs:

(APIServer pid=1) ERROR [core_client.py:667] Engine core proc EngineCore died unexpectedly, shutting down client.
(APIServer pid=1) ERROR [async_llm.py:707]
vllm.v1.engine.exceptions.EngineDeadError:
EngineCore encountered an issue.
POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
resource_tracker:
There appear to be leaked semaphore objects to clean up at shutdown

Questions:

Has anyone successfully deployed DeepSeek-OCR-2 with vLLM?
Is multimodal/OCR inference currently supported for this model in vLLM?
Are there known compatibility issues with DeepSeek-OCR-2 + OpenAI chat endpoint?
Did anyone require a custom serving runtime instead of standard vLLM?

Would appreciate any guidance or confirmation from others who have tested this model in production or OpenShift/Kubernetes environments.

Thanks.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment