Image-Text-to-Text
Transformers
Safetensors
hunyuan_vl
text-generation
Generated from Trainer
sft
trl
conversational
Instructions for using jaqja/HunYuanOCR-SFT with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use jaqja/HunYuanOCR-SFT with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="jaqja/HunYuanOCR-SFT")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)

# Load the model directly (same class as in the Quick start of this card)
from transformers import HunYuanVLForConditionalGeneration

model = HunYuanVLForConditionalGeneration.from_pretrained("jaqja/HunYuanOCR-SFT", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use jaqja/HunYuanOCR-SFT with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "jaqja/HunYuanOCR-SFT"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "jaqja/HunYuanOCR-SFT",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
Use Docker
docker model run hf.co/jaqja/HunYuanOCR-SFT
- SGLang
How to use jaqja/HunYuanOCR-SFT with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "jaqja/HunYuanOCR-SFT" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "jaqja/HunYuanOCR-SFT",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "jaqja/HunYuanOCR-SFT" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "jaqja/HunYuanOCR-SFT",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
- Docker Model Runner
How to use jaqja/HunYuanOCR-SFT with Docker Model Runner:
docker model run hf.co/jaqja/HunYuanOCR-SFT
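The curl calls above send a public image URL. For OCR, the document image usually sits on local disk, so a client has to embed it in the request instead. The following is a minimal sketch of one way to do that from Python, assuming the server accepts base64 `data:` URLs in the `image_url` field (vLLM and SGLang generally do for vision models, but this has not been verified for this checkpoint). The helper names `build_ocr_payload` and `post_chat_completion` are hypothetical and not part of any library.

```python
import base64
import json
import urllib.request

MODEL_ID = "jaqja/HunYuanOCR-SFT"


def build_ocr_payload(image_bytes, prompt, mime_type="image/png", model=MODEL_ID):
    """Build an OpenAI-style chat request that embeds the image as a data URL.

    Hypothetical helper: it mirrors the JSON body of the curl examples,
    replacing the remote image URL with base64-encoded local bytes.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
                    },
                ],
            }
        ],
    }


def post_chat_completion(payload, base_url="http://localhost:8000"):
    """POST the payload to a running server and return the reply text.

    Assumes the response follows the OpenAI chat completions schema.
    Use port 8000 for the vLLM example and 30000 for the SGLang example.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, a call would look like `post_chat_completion(build_ocr_payload(open("example.png", "rb").read(), PROMPT))`, where `PROMPT` is the extraction prompt from the Quick start.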
Model Card for HunYuanOCR-SFT
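This model is a fine-tuned checkpoint trained with TRL. The card metadata does not record a base model (it lists None); the HunyuanOCR citation below suggests Tencent's HunyuanOCR as the likely starting point.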
Quick start
from transformers import AutoProcessor
from transformers import HunYuanVLForConditionalGeneration
from PIL import Image
import torch
model_name_or_path = "jaqja/HunYuanOCR-SFT"
PROMPT = "Extract all information of the document image and represent it in markdown format. Ensure the parsing follows the logical reading order. Do not describe or extract any figures, signatures, or seals."
processor = AutoProcessor.from_pretrained(model_name_or_path, use_fast=False)
img_path = "example.png"
image_inputs = Image.open(img_path)
messages1 = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": img_path},
            {"type": "text", "text": PROMPT},
        ],
    }
]
messages = [messages1]
texts = [
    processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
    for msg in messages
]
inputs = processor(
    text=texts,
    images=image_inputs,
    padding=True,
    return_tensors="pt",
)
model = HunYuanVLForConditionalGeneration.from_pretrained(
    model_name_or_path,
    attn_implementation="eager",
    dtype=torch.bfloat16,
    device_map="auto",
)
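with torch.no_grad():
    # Move the inputs to the device that holds the model weights.
    device = next(model.parameters()).device
    inputs = inputs.to(device)
    generated_ids = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
# The prompt token ids are needed below to strip the prompt from the output.
if "input_ids" in inputs:
    input_ids = inputs.input_ids
else:
    # Fallback path: print the inputs for inspection.
    print("inputs: # fallback", inputs)
    input_ids = inputs.inputs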
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(input_ids, generated_ids)
]
output_texts = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_texts[0])
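The trimming step near the end of the Quick start relies on `generate` returning each row as the prompt tokens followed by the newly generated tokens, which is the usual behaviour for decoder-only models. The sketch below illustrates that slicing on plain Python lists with made-up token ids; the function name `trim_prompt_tokens` is hypothetical.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Drop the prompt prefix from each generated row.

    Each row of generated_ids is assumed to start with the matching row
    of input_ids, so slicing by the prompt length leaves only new tokens.
    """
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Made-up token ids: two prompts of equal (padded) length and their outputs.
prompts = [[101, 7, 8], [101, 9, 0]]
generated = [[101, 7, 8, 42, 43], [101, 9, 0, 55, 2]]

print(trim_prompt_tokens(prompts, generated))  # → [[42, 43], [55, 2]]
```

Without this step, `batch_decode` would return the prompt text followed by the model's answer rather than the answer alone.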
Training procedure
This model was trained with supervised fine-tuning (SFT).
Framework versions
- TRL: 0.29.0
- Transformers: 4.57.1.dev0
- PyTorch: 2.10.0+cu128
- Datasets: 4.0.0
- Tokenizers: 0.22.2
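To check a local environment against the versions listed above, a small comparison script can help. This is a sketch only: the mapping from the names in the list to pip distribution names (for example PyTorch to `torch`) is an assumption, and the function `compare_versions` is hypothetical. A mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; the keys are the assumed
# pip distribution names for each framework.
CARD_VERSIONS = {
    "trl": "0.29.0",
    "transformers": "4.57.1.dev0",
    "torch": "2.10.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.22.2",
}


def compare_versions(expected, lookup=version):
    """Return {package: (wanted, installed, matches)} for each package.

    `lookup` defaults to importlib.metadata.version; a missing package
    is reported with installed=None rather than raising.
    """
    report = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, matches) in compare_versions(CARD_VERSIONS).items():
    status = "ok" if matches else "differs"
    print(f"{package}: card={wanted} installed={installed} ({status})")
```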
Citations
Cite TRL as:
@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}
Cite HunyuanOCR as:
@software{tencenthunyuan,
  title   = {{HunyuanOCR}},
  author  = {ManaEstras manayang and memorywxy xingyuwan},
  license = {TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT},
  url     = {https://github.com/Tencent-Hunyuan/HunyuanOCR},
  year    = {2025}
}