Instructions to use PaddlePaddle/PaddleOCR-VL-1.6 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PaddleOCR
How to use PaddlePaddle/PaddleOCR-VL-1.6 with PaddleOCR:
    # See https://www.paddleocr.ai/latest/version3.x/pipeline_usage/PaddleOCR-VL.html for installation
    from paddleocr import PaddleOCRVL

    pipeline = PaddleOCRVL(pipeline_version="v1.6")
    output = pipeline.predict("path/to/document_image.png")
    for res in output:
        res.print()
        res.save_to_json(save_path="output")
        res.save_to_markdown(save_path="output")

- Notebooks
- Google Colab
- Kaggle
Optimized fine tuning config for RTX 4070 ti super
#5
by milad13731995 - opened
Hi, I have an RTX 4070 Ti SUPER. Which parameters should I change in the config file run_ocr_vl_sft_16k.yaml?
I want to fine-tune on a Bengali training dataset.
milad13731995 changed discussion title from optimized fine tuning config for RTX 4070 ti super to Optimized fine tuning config for RTX 4070 ti super
Does PaddleOCR-VL-1.6 have official fine-tuning support or a tutorial?
I ask because I have been fine-tuning with Hugging Face TRL, but the resulting model hallucinates a lot.
Is there a supported framework for fine-tuning this model?