yifeihu/TF-ID-base-no-caption

@@ -15,18 +15,17 @@ TF-ID (Table/Figure IDentifier) is a family of object detection models finetuned
 | Model   | Model size | Model Description |
 | ------- | ------------- |   ------------- |
 | TF-ID-base[[HF]](https://huggingface.co/yifeihu/TF-ID-base) | 0.23B  | Extract tables/figures and their caption text
-| TF-ID-large[[HF]](https://huggingface.co/yifeihu/TF-ID-large) | 0.77B  | Extract tables/figures and their caption text
 | TF-ID-base-no-caption[[HF]](https://huggingface.co/yifeihu/TF-ID-base-no-caption) | 0.23B  | Extract tables/figures without caption text
-| TF-ID-large-no-caption[[HF]](https://huggingface.co/yifeihu/TF-ID-large-no-caption) | 0.77B  | Extract tables/figures without caption text
 All TF-ID models are finetuned from [microsoft/Florence-2](https://huggingface.co/microsoft/Florence-2-large-ft) checkpoints.
-The models were finetuned with papers from Hugging Face Daily Papers. All bounding boxes are manually annotated and checked by humans.
-TF-ID models take an image of a single paper page as the input, and return bounding boxes for all tables and figures in the given page.
-TF-ID-base and TF-ID-large draw bounding boxes around tables/figures and their caption text.
-TF-ID-base-no-caption and TF-ID-large-no-caption draw bounding boxes around tables/figures without their caption text.
 ![image/png](https://huggingface.co/yifeihu/TF-ID-base/resolve/main/td-id-caption.png)
@@ -34,6 +33,10 @@ Object Detection results format:
 {'\<OD>': {'bboxes': [[x1, y1, x2, y2], ...],
 'labels': ['label1', 'label2', ...]} }
 ## Benchmarks
 We tested the models on paper pages outside the training dataset. The test papers are a subset of Hugging Face Daily Papers.
@@ -59,18 +62,16 @@ Use the code below to get started with the model.
 ```python
 import requests
 from PIL import Image
-from transformers import AutoProcessor, AutoModelForCausalLM
-model = AutoModelForCausalLM.from_pretrained("yifeihu/TF-ID-base-no-caption", trust_remote_code=True)
-processor = AutoProcessor.from_pretrained("yifeihu/TF-ID-base-no-caption", trust_remote_code=True)
 prompt = "<OD>"
 url = "https://huggingface.co/yifeihu/TF-ID-base/resolve/main/arxiv_2305_10853_5.png?download=true"
 image = Image.open(requests.get(url, stream=True).raw)
 inputs = processor(text=prompt, images=image, return_tensors="pt")
 generated_ids = model.generate(
     input_ids=inputs["input_ids"],
     pixel_values=inputs["pixel_values"],
@@ -78,8 +79,8 @@ generated_ids = model.generate(
     do_sample=False,
     num_beams=3
 )
-generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
 parsed_answer = processor.post_process_generation(generated_text, task="<OD>", image_size=(image.width, image.height))
 print(parsed_answer)
@@ -87,16 +88,15 @@ print(parsed_answer)
 To visualize the results, see [this tutorial notebook](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-finetune-florence-2-on-detection-dataset.ipynb) for more details.
-## Finetuning Code and Dataset
-Coming soon!
 ## BibTeX and citation info
 ```
-@misc{TF-ID,
-      url={[https://huggingface.co/yifeihu/TF-ID-base](https://huggingface.co/yifeihu/TF-ID-base)},
-      title={TF-ID: Table/Figure IDentifier for academic papers},
-      author={"Yifei Hu"}
 }
 ```

 | Model   | Model size | Model Description |
 | ------- | ------------- |   ------------- |
 | TF-ID-base[[HF]](https://huggingface.co/yifeihu/TF-ID-base) | 0.23B  | Extract tables/figures and their caption text
+| TF-ID-large[[HF]](https://huggingface.co/yifeihu/TF-ID-large) (Recommended) | 0.77B  | Extract tables/figures and their caption text
 | TF-ID-base-no-caption[[HF]](https://huggingface.co/yifeihu/TF-ID-base-no-caption) | 0.23B  | Extract tables/figures without caption text
+| TF-ID-large-no-caption[[HF]](https://huggingface.co/yifeihu/TF-ID-large-no-caption) (Recommended) | 0.77B  | Extract tables/figures without caption text
 All TF-ID models are finetuned from [microsoft/Florence-2](https://huggingface.co/microsoft/Florence-2-large-ft) checkpoints.
+- The models were finetuned with papers from Hugging Face Daily Papers. All bounding boxes are manually annotated and checked by humans.
+- TF-ID models take an image of a single paper page as the input, and return bounding boxes for all tables and figures in the given page.
+- TF-ID-base and TF-ID-large draw bounding boxes around tables/figures and their caption text.
+- TF-ID-base-no-caption and TF-ID-large-no-caption draw bounding boxes around tables/figures without their caption text.
+**Large models are always recommended!**
 ![image/png](https://huggingface.co/yifeihu/TF-ID-base/resolve/main/td-id-caption.png)
 {'\<OD>': {'bboxes': [[x1, y1, x2, y2], ...],
 'labels': ['label1', 'label2', ...]} }
+## Training Code and Dataset
+- Dataset: [yifeihu/TF-ID-arxiv-papers](https://huggingface.co/datasets/yifeihu/TF-ID-arxiv-papers)
+- Code: [github.com/ai8hyf/TF-ID](https://github.com/ai8hyf/TF-ID)
 ## Benchmarks
 We tested the models on paper pages outside the training dataset. The test papers are a subset of Hugging Face Daily Papers.
 ```python
 import requests
 from PIL import Image
+from transformers import AutoProcessor, AutoModelForCausalLM
+model = AutoModelForCausalLM.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)
+processor = AutoProcessor.from_pretrained("yifeihu/TF-ID-base", trust_remote_code=True)
 prompt = "<OD>"
 url = "https://huggingface.co/yifeihu/TF-ID-base/resolve/main/arxiv_2305_10853_5.png?download=true"
 image = Image.open(requests.get(url, stream=True).raw)
 inputs = processor(text=prompt, images=image, return_tensors="pt")
 generated_ids = model.generate(
     input_ids=inputs["input_ids"],
     pixel_values=inputs["pixel_values"],
     do_sample=False,
     num_beams=3
 )
+generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
 parsed_answer = processor.post_process_generation(generated_text, task="<OD>", image_size=(image.width, image.height))
 print(parsed_answer)
 To visualize the results, see [this tutorial notebook](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-finetune-florence-2-on-detection-dataset.ipynb) for more details.
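Before visualizing, it can help to turn `parsed_answer` into integer pixel boxes grouped by label. The following is a minimal sketch, not part of the TF-ID repository: the helper name `boxes_by_label` is hypothetical, the label strings `"table"` and `"figure"` are assumed for illustration, and the example result is hand-written in the `{'<OD>': {'bboxes': ..., 'labels': ...}}` format shown above rather than produced by the model.

```python
def boxes_by_label(parsed_answer, image_size, task="<OD>"):
    """Group detections by label as integer (left, top, right, bottom) boxes.

    `parsed_answer` is assumed to follow the Object Detection results format:
    {task: {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}}.
    """
    width, height = image_size
    result = parsed_answer[task]
    grouped = {}
    for (x1, y1, x2, y2), label in zip(result["bboxes"], result["labels"]):
        # Round to whole pixels and clamp each box to the page bounds.
        left, top = max(0, round(x1)), max(0, round(y1))
        right, bottom = min(width, round(x2)), min(height, round(y2))
        # Skip boxes that collapse to zero area after clamping.
        if right > left and bottom > top:
            grouped.setdefault(label, []).append((left, top, right, bottom))
    return grouped


# Hand-written stand-in for a model result on an 800x1000 page (assumed values).
example_answer = {
    "<OD>": {
        "bboxes": [[100.4, 200.0, 500.6, 450.2], [50.0, 600.0, 900.0, 950.0]],
        "labels": ["table", "figure"],
    }
}
grouped = boxes_by_label(example_answer, image_size=(800, 1000))
for label, boxes in grouped.items():
    print(label, boxes)
```

With the real pipeline, `image_size=(image.width, image.height)` would be passed, and each resulting tuple can be handed to Pillow's `Image.crop`, which takes a `(left, upper, right, lower)` box.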
 ## BibTeX and citation info
 ```
+@misc{TF-ID,
+  author = {Yifei Hu},
+  title = {TF-ID: Table/Figure IDentifier for academic papers},
+  year = {2024},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  howpublished = {\url{https://github.com/ai8hyf/TF-ID}},
 }
 ```