How to use L2658183739/PaddleOCR-VL-1.5-OCSR with Transformers:

    # Use a pipeline as a high-level helper
    # Warning: Pipeline type "image-to-text" is no longer supported in transformers v5.
    # You must load the model directly (see below) or downgrade to v4.x with:
    #   pip install "transformers<5.0.0"
    from transformers import pipeline

    pipe = pipeline("image-to-text", model="L2658183739/PaddleOCR-VL-1.5-OCSR", trust_remote_code=True)

    # Load model directly
    from transformers import AutoProcessor, AutoModelForMultimodalLM

    processor = AutoProcessor.from_pretrained("L2658183739/PaddleOCR-VL-1.5-OCSR", trust_remote_code=True)
    model = AutoModelForMultimodalLM.from_pretrained("L2658183739/PaddleOCR-VL-1.5-OCSR", trust_remote_code=True)
PaddleOCR-VL-1.5-OCSR
This repository hosts a merged fine-tuned OCSR model based on PaddleOCR-VL-1.5.
The task is:
- input: molecular structure image
- output: canonical SMILES
Base Model
- Base model family: PaddleOCR-VL-1.5
- Fine-tuning method: LoRA SFT, then merged export
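For readers unfamiliar with what "merged export" means, the following is a minimal sketch of a LoRA merge on a single weight matrix. It assumes the standard LoRA formulation, W' = W + (alpha / r) * B @ A. It does not reproduce this repository's export script, and the names `matmul`, `merge_lora`, and the toy matrices are hypothetical.

```python
# Toy illustration of folding a LoRA update into a base weight (pure Python).
# Assumption: standard LoRA merge, W' = W + (alpha / r) * B @ A.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, a, b, alpha):
    """Return the merged weight; the adapter matrices are no longer needed afterwards."""
    r = len(a)                # LoRA rank = number of rows of A
    scale = alpha / r
    delta = matmul(b, a)      # same shape as w
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


w = [[1.0, 0.0], [0.0, 1.0]]  # base weight, 2x2
a = [[1.0, 2.0]]              # A: rank x in_features  (1x2)
b = [[0.5], [0.0]]            # B: out_features x rank (2x1)
merged = merge_lora(w, a, b, alpha=2.0)
```

After a merge of this kind the checkpoint can be loaded like an ordinary model, without a separate adapter file.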
Training Objective
The main fine-tuning line is restricted to a single output space:
- canonical SMILES only
Incompatible target spaces such as ssml_normed, chemfig, and LaTeX formula targets were excluded from the main line.
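The restriction to one output space can be sketched as a simple dataset filter. This is an illustration only: the record layout, the `target_type` field, and the label strings `canonical_smiles` and `latex_formula` are assumptions, not names taken from the project's data pipeline (only `ssml_normed` and `chemfig` appear in this card).

```python
# Hypothetical filter that keeps only canonical-SMILES training targets.
ALLOWED_TARGET = "canonical_smiles"  # assumed label for the retained target space
EXCLUDED_TARGETS = {"ssml_normed", "chemfig", "latex_formula"}  # last name is assumed


def keep_canonical_smiles(records):
    """Return the records whose target space is canonical SMILES."""
    return [r for r in records if r.get("target_type") == ALLOWED_TARGET]


records = [
    {"image": "a.png", "target_type": "canonical_smiles", "target": "CCO"},
    {"image": "b.png", "target_type": "chemfig", "target": "\\chemfig{C-C-O}"},
    {"image": "c.png", "target_type": "ssml_normed", "target": "..."},
]
train_set = keep_canonical_smiles(records)
```

Filtering on an allow-list rather than a block-list means any new, unlisted target space is excluded by default.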
Current Local Evaluation
canonical_smiles_main_v1
- canonical exact accuracy: 32.86%
- token micro F1: 70.35%
- valid SMILES rate: 71.84%
- mean fingerprint Tanimoto: 0.6992
ocsr_realworld_mixed_eval_v1p1
- canonical exact accuracy: 33.77%
- token micro F1: 70.18%
- valid SMILES rate: 75.84%
- mean fingerprint Tanimoto: 0.6849
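This card does not define how the metrics above are computed, so the following is only one plausible reading, sketched in pure Python. It assumes predictions and references have already been canonicalized by a cheminformatics toolkit (not shown), treats token F1 as bag-of-tokens overlap pooled over all samples, and represents a fingerprint as a set of on-bit indices. The tokenizer regex and all function names are hypothetical.

```python
import re
from collections import Counter

# Simplified SMILES tokenizer: bracket atoms, Br/Cl, %nn ring closures, else one character.
_TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles):
    return _TOKEN_RE.findall(smiles)


def exact_accuracy(preds, refs):
    """Fraction of predictions that equal their (already canonical) reference string."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def token_micro_f1(preds, refs):
    """Micro-averaged F1 over token multisets, pooled across all samples."""
    tp = n_pred = n_ref = 0
    for p, r in zip(preds, refs):
        cp, cr = Counter(tokenize_smiles(p)), Counter(tokenize_smiles(r))
        tp += sum((cp & cr).values())   # multiset intersection
        n_pred += sum(cp.values())
        n_ref += sum(cr.values())
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_ref
    return 2 * precision * recall / (precision + recall)


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0  # 0.0 for two empty sets (a choice)


preds = ["CCO", "c1ccccc1"]
refs = ["CCO", "c1ccncc1"]
acc = exact_accuracy(preds, refs)
f1 = token_micro_f1(preds, refs)
sim = tanimoto({1, 2, 3}, {2, 3, 4})
```

The toy example shows why token F1 can be much higher than exact accuracy: a prediction that gets one atom wrong still shares most tokens with the reference but counts as a complete miss for exact match.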
Source-level observations
- relatively strong on uob
- moderate on uspto
- weak on real_world
- very weak on decimer handwritten structures
- weak on edu_chemc
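A per-source breakdown like the one above can be produced by grouping evaluation rows by their data source. The sketch below assumes each row is a `(source, prediction, reference)` tuple of canonical SMILES strings; that layout and the sample rows are invented for illustration and are not results from this model.

```python
from collections import defaultdict


def accuracy_by_source(rows):
    """Exact-match accuracy per data source for (source, pred, ref) rows."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for source, pred, ref in rows:
        totals[source] += 1
        hits[source] += int(pred == ref)
    return {source: hits[source] / totals[source] for source in totals}


# Made-up rows, only to show the call shape.
rows = [
    ("uob", "CCO", "CCO"),
    ("uob", "CCN", "CCN"),
    ("real_world", "CCO", "CCC"),
    ("real_world", "c1ccccc1", "c1ccccc1"),
]
per_source = accuracy_by_source(rows)
```

Reporting the per-source sample count next to each accuracy helps avoid over-reading results from small subsets.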
Intended Use
This model is intended for:
- OCSR research
- benchmarking
- pipeline prototyping
- weak-domain error analysis
It is not yet a state-of-the-art production OCSR model.
Limitations
- performance is highly uneven across data domains
- handwritten chemical structures remain a major failure mode
- real-world photos/scans are still weak
- exact-match accuracy is below strong public OCSR baselines such as MolNexTR and MolScribe on the standard UOB and USPTO benchmarks
Files
This repository includes:
- merged model weights
- tokenizer files
- processor/preprocessor files
- remote-code files required by the custom model class
Notes
This model card reflects the current V2-1 baseline state in the associated project workspace.