shixuanleong committed · verified
Commit 3743383 · 1 Parent(s): 4c1bf84

Update README.md

Files changed (1):
  1. README.md +29 -75
README.md CHANGED
@@ -6,103 +6,57 @@ tags:
  - vision
  - ocr
  - segmentation
- datasets:
- - yifeihu/TF-ID-arxiv-papers
  ---
-
- # TF-ID: Table/Figure IDentifier for academic papers

  ## Model Summary

- TF-ID (Table/Figure IDentifier), created by [Yifei Hu](https://x.com/hu_yifei), is a family of object detection models finetuned to extract tables and figures from academic papers. They come in four versions:
- | Model | Model size | Model Description |
- | ------- | ------------- | ------------- |
- | TF-ID-base[[HF]](https://huggingface.co/yifeihu/TF-ID-base) | 0.23B | Extract tables/figures and their caption text |
- | TF-ID-large[[HF]](https://huggingface.co/yifeihu/TF-ID-large) (Recommended) | 0.77B | Extract tables/figures and their caption text |
- | TF-ID-base-no-caption[[HF]](https://huggingface.co/yifeihu/TF-ID-base-no-caption) | 0.23B | Extract tables/figures without caption text |
- | TF-ID-large-no-caption[[HF]](https://huggingface.co/yifeihu/TF-ID-large-no-caption) (Recommended) | 0.77B | Extract tables/figures without caption text |
- All TF-ID models are finetuned from [microsoft/Florence-2](https://huggingface.co/microsoft/Florence-2-large-ft) checkpoints.

- - The models were finetuned with papers from Hugging Face Daily Papers. All bounding boxes are manually annotated and checked by humans.
- - TF-ID models take an image of a single paper page as the input, and return bounding boxes for all tables and figures in the given page.
- - TF-ID-base and TF-ID-large draw bounding boxes around tables/figures and their caption text.
- - TF-ID-base-no-caption and TF-ID-large-no-caption draw bounding boxes around tables/figures without their caption text.

- **Large models are always recommended!**

- ![image/png](https://huggingface.co/yifeihu/TF-ID-base/resolve/main/td-id-caption.png)
 

- Object Detection results format:
- {'\<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['label1', 'label2', ...]} }

  ## Training Code and Dataset
- - Dataset: [yifeihu/TF-ID-arxiv-papers](https://huggingface.co/datasets/yifeihu/TF-ID-arxiv-papers)
- - Code: [github.com/ai8hyf/TF-ID](https://github.com/ai8hyf/TF-ID)

  ## Benchmarks

- We tested the models on paper pages outside the training dataset. The papers are a subset of Hugging Face Daily Papers.
-
- Correct output - the model draws correct bounding boxes for every table/figure in the given page.
-
- | Model | Total Images | Correct Output | Success Rate |
- |---------------------------------------------------------------|--------------|----------------|--------------|
- | TF-ID-base[[HF]](https://huggingface.co/yifeihu/TF-ID-base) | 258 | 251 | 97.29% |
- | TF-ID-large[[HF]](https://huggingface.co/yifeihu/TF-ID-large) | 258 | 253 | 98.06% |
-
- | Model | Total Images | Correct Output | Success Rate |
- |---------------------------------------------------------------|--------------|----------------|--------------|
- | TF-ID-base-no-caption[[HF]](https://huggingface.co/yifeihu/TF-ID-base-no-caption) | 261 | 253 | 96.93% |
- | TF-ID-large-no-caption[[HF]](https://huggingface.co/yifeihu/TF-ID-large-no-caption) | 261 | 254 | 97.32% |
-
- Depending on the use case, some "incorrect" outputs can still be usable. For example, the model may draw two bounding boxes for one figure with two child components.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- ```python
- import requests
- from PIL import Image
- from transformers import AutoProcessor, AutoModelForCausalLM
-
- model = AutoModelForCausalLM.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)
- processor = AutoProcessor.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)
-
- prompt = "<OD>"
-
- url = "https://huggingface.co/yifeihu/TF-ID-base/resolve/main/arxiv_2305_10853_5.png?download=true"
- image = Image.open(requests.get(url, stream=True).raw)
-
- inputs = processor(text=prompt, images=image, return_tensors="pt")
-
- generated_ids = model.generate(
-     input_ids=inputs["input_ids"],
-     pixel_values=inputs["pixel_values"],
-     max_new_tokens=1024,
-     do_sample=False,
-     num_beams=3
- )
- generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
-
- parsed_answer = processor.post_process_generation(generated_text, task="<OD>", image_size=(image.width, image.height))
-
- print(parsed_answer)
- ```
-
- To visualize the results, see [this tutorial notebook](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-finetune-florence-2-on-detection-dataset.ipynb) for more details.

  ## BibTex and citation info

  ```
- @misc{TF-ID,
-   author = {Yifei Hu},
-   title = {TF-ID: Table/Figure IDentifier for academic papers},
-   year = {2024},
-   publisher = {GitHub},
-   journal = {GitHub repository},
-   howpublished = {\url{https://github.com/ai8hyf/TF-ID}},
- }
  ```
 
  - vision
  - ocr
  - segmentation
  ---
+ # VisualHeist - figure, scheme and table segmentation from PDFs (with captions, headers & footnotes)

  ## Model Summary

+ VisualHeist is an object detection model finetuned to extract tables and figures from PDFs. It comes in two versions:
+ - visualheist-base[[HF]](https://huggingface.co/shixuanleong/visualheist-base) (0.23B)
+ - visualheist-large[[HF]](https://huggingface.co/shixuanleong/visualheist-large) (0.77B)

+ **The base model is recommended if you are running it on low-RAM systems.**

+ The models are finetuned from [microsoft/Florence-2](https://huggingface.co/microsoft/Florence-2-large-ft) checkpoints. VisualHeist is inspired by and adapted from [yifeihu/TF-ID](https://huggingface.co/yifeihu/TF-ID-large).

+ - The models were finetuned with 3435 figures and 1716 tables from 110 PDF articles across various publishers. All bounding boxes were manually annotated using [CoCo Annotator](https://github.com/jsbroks/coco-annotator).
+ - VisualHeist models take an image of a single paper page as the input, and return image files for all figures, schemes and tables in the given page.
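Under the hood, Florence-2-based detectors like these return detections as a dict of the form `{'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': [...]}}` (the format shown in the TF-ID card this model is adapted from). A minimal sketch of pairing each label with its pixel box before cropping — the coordinate values below are invented for illustration:

```python
def pair_detections(parsed, task="<OD>"):
    """Pair each label with its (x1, y1, x2, y2) box from a
    Florence-2-style object-detection result dict."""
    result = parsed[task]
    return list(zip(result["labels"], (tuple(b) for b in result["bboxes"])))

# Illustrative result for one page (coordinates are made up):
example = {"<OD>": {"bboxes": [[12.0, 40.0, 580.0, 330.0],
                               [30.0, 350.0, 560.0, 700.0]],
                    "labels": ["figure", "table"]}}
for label, box in pair_detections(example):
    print(label, box)
```

Each `(label, box)` pair can then be cropped out of the page image to produce the per-figure image files described above.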

  ## Training Code and Dataset
+ - Dataset: [Zenodo repository](https://doi.org/10.5281/zenodo.14917752)
+ - Code: [github.com/aspuru-guzik-group/MERMaid](https://github.com/aspuru-guzik-group/MERMaid)

  ## Benchmarks

+ We manually curated a diverse evaluation dataset consisting of 121 literature articles covering a range of topics, including
+ organic and inorganic chemistry, atmospheric science, batteries, materials science, metal-organic frameworks (MOFs), biology,
+ and science education. These PDFs, published between 1949 and 2025, include both main articles and supplementary materials.

+ We also curated another collection of 98 literature articles (MERMaid-100) reporting novel reaction methodologies that span
+ three distinct chemical domains: organic electrosynthesis, photocatalysis, and organic synthesis.

+ Additional performance discussion can be found in our [preprint article](XXXXXXX).

+ The full DOI lists can be downloaded from our [Zenodo repository](https://doi.org/10.5281/zenodo.14917752).
+ The evaluation results for visualheist-large are:
+ | Subset | Total Images | F1 score |
+ |---------------------------------------------------------------|--------------|----------------|
+ | All | 1935 | 93% |
+ | Main | 423 | 96% |
+ | pre-2000 | 260 | 93% |
+ | Supplementary Materials | 1252 | 92% |
+ | MERMaid-100 | 100 | 99% |

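For reference, the F1 score used above is the standard harmonic mean of precision and recall over detected regions; a quick sketch of the definition (the counts are placeholders, not numbers from this evaluation):

```python
def f1_score(tp, fp, fn):
    """Standard F1: harmonic mean of precision (tp/(tp+fp))
    and recall (tp/(tp+fn))."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. 90 true positives, 10 false positives, 10 false negatives
print(round(f1_score(90, 10, 10), 2))
```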
+ ## Running the Model

+ Refer to our [github repository](https://github.com/aspuru-guzik-group/MERMaid) for detailed instructions on how to run the model.

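The MERMaid repository is the supported entry point, but since the checkpoints are Florence-2 finetunes, single-page inference presumably mirrors the snippet in the TF-ID card this model is adapted from. A sketch under that assumption — the model id default, the lazy imports, and the function name are illustrative, and `transformers` plus `Pillow` are required:

```python
PROMPT = "<OD>"  # Florence-2 object-detection task token

def detect_page(image, model_id="shixuanleong/visualheist-large"):
    """Run one page image through a VisualHeist checkpoint and return the
    parsed detection dict: {'<OD>': {'bboxes': [...], 'labels': [...]}}."""
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    inputs = processor(text=PROMPT, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3,
    )
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        text, task=PROMPT, image_size=(image.width, image.height)
    )
```

Cropping each returned box out of the page image and saving it would then yield the per-figure image files the model summary describes.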

  ## BibTex and citation info

  ```
+ <To be updated with our archive citation>
  ```