- vision
- ocr
- segmentation
---

# VisualHeist - figure, scheme and table segmentation from PDFs (with captions, headers & footnotes)

## Model Summary

VisualHeist is an object detection model finetuned to extract tables and figures from PDFs. VisualHeist has two versions:
- visualheist-base[[HF]](https://huggingface.co/shixuanleong/visualheist-base) (0.23B)
- visualheist-large[[HF]](https://huggingface.co/shixuanleong/visualheist-large) (0.77B)

**The base model is recommended if you are running it on low-RAM systems.**

The models are finetuned from [microsoft/Florence-2](https://huggingface.co/microsoft/Florence-2-large-ft) checkpoints. VisualHeist is inspired by and adapted from [yifeihu/TF-ID](https://huggingface.co/yifeihu/TF-ID-large).

- The models were finetuned on 3435 figures and 1716 tables from 110 PDF articles across various publishers. All bounding boxes were manually annotated using [COCO Annotator](https://github.com/jsbroks/coco-annotator).
- VisualHeist models take an image of a single paper page as input and return image files for all figures, schemes, and tables on the given page.
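Under the hood, the Florence-2 detector reports regions in its standard `<OD>` dictionary format, `{'bboxes': [[x1, y1, x2, y2], ...], 'labels': [...]}`. A minimal sketch of pairing each label with its box — the dict shape follows the Florence-2 convention, and the example values are purely illustrative:

```python
def to_records(od: dict) -> list:
    """Pair each detected label with its [x1, y1, x2, y2] bounding box."""
    return [{"label": label, "bbox": bbox}
            for label, bbox in zip(od["labels"], od["bboxes"])]

# Illustrative output for a page with one detected figure:
example = {"bboxes": [[10.0, 20.0, 300.0, 400.0]], "labels": ["figure"]}
print(to_records(example))
# [{'label': 'figure', 'bbox': [10.0, 20.0, 300.0, 400.0]}]
```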

## Training Code and Dataset

- Dataset: [Zenodo repository](https://doi.org/10.5281/zenodo.14917752)
- Code: [github.com/aspuru-guzik-group/MERMaid](https://github.com/aspuru-guzik-group/MERMaid)

## Benchmarks

We manually curated a diverse evaluation dataset consisting of 121 literature articles covering a range of topics, including organic and inorganic chemistry, atmospheric science, batteries, materials science, metal-organic frameworks (MOFs), biology, and science education. These PDFs, published between 1949 and 2025, include both main articles and supplementary materials.

We additionally curated another collection of 98 literature articles (MERMaid-100) reporting novel reaction methodologies, spanning three distinct chemical domains: organic electrosynthesis, photocatalysis, and organic synthesis.

Additional performance discussion can be found in our [preprint article](XXXXXXX).

The full DOI lists can be downloaded from our [Zenodo repository](https://doi.org/10.5281/zenodo.14917752).

The evaluation results for visualheist-large are:

| Subset | Total Images | F1 score |
|-------------------------|--------------|----------|
| All | 1935 | 93% |
| Main | 423 | 96% |
| pre-2000 | 260 | 93% |
| Supplementary Materials | 1252 | 92% |
| MERMaid-100 | 100 | 99% |
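For reference, the F1 score is the harmonic mean of precision and recall over detected regions; the exact matching criterion between predicted and annotated boxes (e.g. an IoU threshold) is described in the preprint, not here. A sketch of the standard definition:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision (tp/(tp+fp)) and recall (tp/(tp+fn))."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: perfect precision but 10% of regions missed
print(round(f1_score(tp=90, fp=0, fn=10), 2))  # 0.95
```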

## Running the Model

Refer to our [GitHub repository](https://github.com/aspuru-guzik-group/MERMaid) for detailed instructions on how to run the model.

## BibTeX and citation info

```
<To be updated with our archive citation>
```