Is that deepseek-ocr-2 compatible for H200

#29
by balajivasudevan - opened

Hi everyone,

I’m currently testing deployment of deepseek-ai/DeepSeek-OCR-2 using OpenShift AI on NVIDIA H200 GPUs with a vLLM-based serving stack.

The model loads successfully and /v1/models responds correctly. Text-only chat requests also work via:

/v1/chat/completions

However, when sending OCR/multimodal requests (image input using image_url or base64 payload), the inference server crashes and the engine core dies.

Environment:

  • OpenShift AI
  • NVIDIA H200
  • vLLM serving backend
  • Python 3.12
  • OpenAI-compatible API endpoint

Observed behavior:

  • Health endpoint remains OK initially
  • POST to /v1/chat/completions triggers engine crash
  • API returns HTTP 500
  • Server process shuts down immediately after request

Relevant logs:

(APIServer pid=1) ERROR [core_client.py:667] Engine core proc EngineCore died unexpectedly, shutting down client.
(APIServer pid=1) ERROR [async_llm.py:707]
vllm.v1.engine.exceptions.EngineDeadError:
EngineCore encountered an issue.
POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
resource_tracker:
There appear to be leaked semaphore objects to clean up at shutdown

Questions:

  1. Has anyone successfully deployed DeepSeek-OCR-2 with vLLM?
  2. Is multimodal/OCR inference currently supported for this model in vLLM?
  3. Are there known compatibility issues with DeepSeek-OCR-2 + OpenAI chat endpoint?
  4. Did anyone require a custom serving runtime instead of standard vLLM?

Would appreciate any guidance or confirmation from others who have tested this model in production or OpenShift/Kubernetes environments.

Thanks.

Sign up or log in to comment