Instructions to use deepseek-ai/DeepSeek-OCR-2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use deepseek-ai/DeepSeek-OCR-2 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-text-to-text", model="deepseek-ai/DeepSeek-OCR-2", trust_remote_code=True)# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("deepseek-ai/DeepSeek-OCR-2", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use deepseek-ai/DeepSeek-OCR-2 with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "deepseek-ai/DeepSeek-OCR-2" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "deepseek-ai/DeepSeek-OCR-2", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/deepseek-ai/DeepSeek-OCR-2
- SGLang
How to use deepseek-ai/DeepSeek-OCR-2 with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "deepseek-ai/DeepSeek-OCR-2" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "deepseek-ai/DeepSeek-OCR-2", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "deepseek-ai/DeepSeek-OCR-2" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "deepseek-ai/DeepSeek-OCR-2", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use deepseek-ai/DeepSeek-OCR-2 with Docker Model Runner:
docker model run hf.co/deepseek-ai/DeepSeek-OCR-2
Is that deepseek-ocr-2 compatible for H200
Hi everyone,
I’m currently testing deployment of deepseek-ai/DeepSeek-OCR-2 using OpenShift AI on NVIDIA H200 GPUs with a vLLM-based serving stack.
The model loads successfully and /v1/models responds correctly. Text-only chat requests also work via:
/v1/chat/completions
However, when sending OCR/multimodal requests (image input using image_url or base64 payload), the inference server crashes and the engine core dies.
Environment:
- OpenShift AI
- NVIDIA H200
- vLLM serving backend
- Python 3.12
- OpenAI-compatible API endpoint
Observed behavior:
- Health endpoint remains OK initially
- POST to /v1/chat/completions triggers engine crash
- API returns HTTP 500
- Server process shuts down immediately after request
Relevant logs:
(APIServer pid=1) ERROR [core_client.py:667] Engine core proc EngineCore died unexpectedly, shutting down client.
(APIServer pid=1) ERROR [async_llm.py:707]
vllm.v1.engine.exceptions.EngineDeadError:
EngineCore encountered an issue.
POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
resource_tracker:
There appear to be leaked semaphore objects to clean up at shutdown
Questions:
- Has anyone successfully deployed DeepSeek-OCR-2 with vLLM?
- Is multimodal/OCR inference currently supported for this model in vLLM?
- Are there known compatibility issues with DeepSeek-OCR-2 + OpenAI chat endpoint?
- Did anyone require a custom serving runtime instead of standard vLLM?
Would appreciate any guidance or confirmation from others who have tested this model in production or OpenShift/Kubernetes environments.
Thanks.