Model Card for HunYuanOCR-SFT

This model is a fine-tuned version of an unspecified base model. It has been trained using TRL.

Quick start

# Requires a Transformers build that ships HunYuanVLForConditionalGeneration
# (this card lists 4.57.1.dev0); device_map="auto" also needs `accelerate`.
from transformers import AutoProcessor
from transformers import HunYuanVLForConditionalGeneration
from PIL import Image
import torch

model_name_or_path = "jaqja/HunYuanOCR-SFT"
PROMPT = "Extract all information of the document image and represent it in markdown format. Ensure the parsing follows the logical reading order. Do not describe or extract any figures, signatures, or seals."
processor = AutoProcessor.from_pretrained(model_name_or_path, use_fast=False)

img_path = "example.png"
# Convert to RGB so PNGs with an alpha channel or a palette load as 3-channel images.
image_inputs = Image.open(img_path).convert("RGB")

# One chat conversation: the page image followed by the OCR instruction.
messages1 = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": img_path},
            {"type": "text", "text": PROMPT},
        ],
    }
]
# A batch of conversations (here, just one); the chat template is applied per conversation.
messages = [messages1]
texts = [
    processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
    for msg in messages
]
inputs = processor(
    text=texts,
    images=image_inputs,
    padding=True,
    return_tensors="pt",
)

model = HunYuanVLForConditionalGeneration.from_pretrained(
    model_name_or_path,
    attn_implementation="eager",
    dtype=torch.bfloat16,
    device_map="auto",
)

with torch.no_grad():
    # Move the inputs to the device that holds the model's first parameters.
    device = next(model.parameters()).device
    inputs = inputs.to(device)
    # Greedy decoding; max_new_tokens caps the output length, so dense pages may need more.
    generated_ids = model.generate(**inputs, max_new_tokens=4096, do_sample=False)

# generate() returns the prompt followed by the new tokens; keep only the new tokens.
input_ids = inputs["input_ids"]
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(input_ids, generated_ids)
]
output_texts = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_texts[0])
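The trimming step above relies on generate() echoing each (padded) prompt before the new tokens, so slicing every output row at the length of its prompt row leaves only the continuation. A minimal pure-Python sketch of that logic on toy token lists follows; the helper name `trim_prompt` and the token ids are hypothetical, and the prefix check is an extra sanity test the quick start does not perform.

```python
from typing import List, Sequence


def trim_prompt(
    input_ids: Sequence[Sequence[int]],
    generated_ids: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Drop the echoed prompt from each generated sequence (hypothetical helper)."""
    if len(input_ids) != len(generated_ids):
        raise ValueError("batch sizes differ")
    trimmed = []
    for prompt, output in zip(input_ids, generated_ids):
        # Sanity check: each output row should start with its own prompt row.
        if list(output[: len(prompt)]) != list(prompt):
            raise ValueError("output does not start with the prompt")
        # Everything after the prompt is newly generated.
        trimmed.append(list(output[len(prompt):]))
    return trimmed


# Toy batch: 0 stands in for a pad id and 2 for an end-of-sequence id (both arbitrary).
# The first prompt is left-padded; the second output is padded after it finished early.
prompts = [[0, 5, 6], [7, 8, 9]]
outputs = [[0, 5, 6, 11, 12, 2], [7, 8, 9, 13, 2, 0]]
print(trim_prompt(prompts, outputs))  # [[11, 12, 2], [13, 2, 0]]
```

Trailing pad and end-of-sequence ids survive the slice; in the quick start, `skip_special_tokens=True` in `batch_decode` is what typically removes them from the decoded text.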

Training procedure

This model was trained with supervised fine-tuning (SFT).
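The card does not say how the SFT loss was computed. A common setup, assumed here and not confirmed for this checkpoint, trains only on the assistant's answer by setting the prompt positions of the labels to -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. Hugging Face causal-LM models shift labels internally, so the labels stay aligned with `input_ids`. The helper `mask_prompt_labels` below is hypothetical and works on plain token lists.

```python
from typing import List, Sequence

# Default ignore_index of torch.nn.CrossEntropyLoss: these positions add no loss.
IGNORE_INDEX = -100


def mask_prompt_labels(input_ids: Sequence[int], prompt_len: int) -> List[int]:
    """Build labels that hide the first prompt_len tokens from the loss (hypothetical helper).

    The prompt (image placeholder tokens plus the instruction) is masked, so only
    the target answer, e.g. the reference markdown, is learned.
    """
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len out of range")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


# Toy sequence: 4 prompt tokens followed by a 3-token answer (ids are arbitrary).
print(mask_prompt_labels([101, 7, 7, 42, 9, 10, 2], prompt_len=4))
# [-100, -100, -100, -100, 9, 10, 2]
```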


Framework versions

  • TRL: 0.29.0
  • Transformers: 4.57.1.dev0
  • PyTorch: 2.10.0+cu128
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2
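To compare a local environment against the versions listed above, a small check built on the standard-library `importlib.metadata` could look like this. The helper names are hypothetical, and PyTorch is looked up under its PyPI distribution name `torch`. The comparison is an exact string match, so a different local build tag (for example a CPU wheel without `+cu128`) or a released `transformers` instead of the `.dev0` build is reported as a mismatch; treat that as a prompt to look, not a hard failure.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Versions listed on this card, keyed by PyPI distribution name.
CARD_VERSIONS = {
    "trl": "0.29.0",
    "transformers": "4.57.1.dev0",
    "torch": "2.10.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.22.2",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each differing package to (expected, installed) using exact string comparison."""
    result = {}
    for dist, want in expected.items():
        have = lookup(dist)
        if have != want:
            result[dist] = (want, have)
    return result


if __name__ == "__main__":
    for dist, (want, have) in version_mismatches(CARD_VERSIONS).items():
        print(f"{dist}: card lists {want}, found {have or 'not installed'}")
```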

Citations

Cite TRL as:

@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}

Cite HunyuanOCR as:

@software{tencenthunyuan,
  title   = {{HunyuanOCR}},
  author  = {ManaEstras manayang and memorywxy xingyuwan},
  license = {TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT},
  url     = {https://github.com/Tencent-Hunyuan/HunyuanOCR},
  year    = {2025}
}
Model size: 1.0B params (Safetensors, tensor type BF16)
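As a rough sizing aid, the weights alone occupy parameters × bytes per parameter, and BF16 uses 2 bytes per value. The sketch below (helper name hypothetical) ignores activations and the KV cache used during generation, and "1.0B" is a rounded figure, so treat the result as a lower bound on memory, not a measured footprint.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


print(round(weight_memory_gib(1.0e9), 2))  # 1.86
print(round(weight_memory_gib(1.0e9, "fp32"), 2))  # 3.73
```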