google/deplot

nielsr (HF staff) committed on Jul 23, 2023

Commit 7f04d6c • 1 Parent(s): b46b9db

Update README.md

Files changed (1)

README.md +21 -20

@@ -11,7 +11,8 @@ license: apache-2.0
 ---
 # Model card for DePlot
-![pull_figure](https://s3.amazonaws.com/moonup/production/uploads/62441d1d9fdefb55a0b7d12c/u8rWTawSyUegF4jzwOpNO.png)
 #  Table of Contents
@@ -30,7 +31,25 @@ The abstract of the paper states that:
 # Using the model
-## Converting from T5x to huggingface
 You can use the [`convert_pix2struct_checkpoint_to_pytorch.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pix2struct/convert_pix2struct_original_pytorch_to_hf.py) script as follows:
 ```bash
@@ -51,24 +70,6 @@ model.push_to_hub("USERNAME/MODEL_NAME")
 processor.push_to_hub("USERNAME/MODEL_NAME")
 ```
-## Run a prediction
-You can run a prediction by querying an input image together with a question as follows:
-```python
-from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor
-import requests
-from PIL import Image
-model = Pix2StructForConditionalGeneration.from_pretrained('google/deplot')
-processor = Pix2StructProcessor.from_pretrained('google/deplot')
-url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/5090.png"
-image = Image.open(requests.get(url, stream=True).raw)
-inputs = processor(images=image, text="Generate underlying data table of the figure below:", return_tensors="pt")
-predictions = model.generate(**inputs, max_new_tokens=512)
-print(processor.decode(predictions[0], skip_special_tokens=True))
-```
 # Contribution
 This model was originally contributed by Fangyu Liu, Julian Martin Eisenschlos et al. and added to the Hugging Face ecosystem by [Younes Belkada](https://huggingface.co/ybelkada).

 ---
 # Model card for DePlot
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/deplot_architecture.png"
+alt="drawing" width="600"/>
 #  Table of Contents
 # Using the model
+You can run a prediction by querying an input image together with a question as follows:
+```python
+from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration
+import requests
+from PIL import Image
+processor = Pix2StructProcessor.from_pretrained('google/deplot')
+model = Pix2StructForConditionalGeneration.from_pretrained('google/deplot')
+url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/5090.png"
+image = Image.open(requests.get(url, stream=True).raw)
+inputs = processor(images=image, text="Generate underlying data table of the figure below:", return_tensors="pt")
+predictions = model.generate(**inputs, max_new_tokens=512)
+print(processor.decode(predictions[0], skip_special_tokens=True))
+```
+# Converting from T5x to huggingface
 You can use the [`convert_pix2struct_checkpoint_to_pytorch.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pix2struct/convert_pix2struct_original_pytorch_to_hf.py) script as follows:
 ```bash
 processor.push_to_hub("USERNAME/MODEL_NAME")
 ```
 # Contribution
 This model was originally contributed by Fangyu Liu, Julian Martin Eisenschlos et al. and added to the Hugging Face ecosystem by [Younes Belkada](https://huggingface.co/ybelkada).
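
The prediction snippet moved by this commit prints the decoded output as a single string. As a minimal sketch, assuming the decoded text is a linearized table in which rows are separated by the literal token `<0x0A>` and cells by `|` (an assumption about the output format, not something this diff states), the string could be split into rows as follows. The helper name `parse_linearized_table` and the sample string are hypothetical and made up for illustration; the sample is not a real model prediction.

```python
def parse_linearized_table(text, row_sep="<0x0A>", cell_sep="|"):
    """Split a linearized table string into a list of rows of stripped cells.

    The default separators are assumptions about the decoded output format.
    """
    rows = []
    for raw_row in text.split(row_sep):
        raw_row = raw_row.strip()
        if not raw_row:
            # Skip empty fragments, e.g. from a trailing row separator.
            continue
        rows.append([cell.strip() for cell in raw_row.split(cell_sep)])
    return rows


# Hypothetical decoded string, invented for illustration only.
sample = "TITLE | Example chart <0x0A> Year | Value <0x0A> 2020 | 1.5 <0x0A> 2021 | 2.0"
table = parse_linearized_table(sample)
for row in table:
    print(row)
```

In practice the `sample` string would be replaced by the result of `processor.decode(predictions[0], skip_special_tokens=True)` from the README snippet, and the separators adjusted if the actual output differs.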