ydshieh HF staff committed on
Commit
747dd4c
1 Parent(s): 4e37291

Update README.md

  1. README.md +4 -0
README.md CHANGED
@@ -16,4 +16,8 @@ The model is trained on 65000 images from the COCO dataset for about 1500 steps
 
  - The provided training script `run_summarization.py` is modified to send pixel values to the model instead of a sequence of input token ids, with a further change required because the ViT model does not accept an `attention_mask` argument.
 
+ - We first tried to use [WIT: Wikipedia-based Image Text Dataset](https://github.com/google-research-datasets/wit), but found it a very challenging task because, unlike traditional image captioning, it requires the model to generate different texts even when two images are similar (for example, two famous dogs might have completely different Wikipedia texts).
+
+ - We finally decided to use the [COCO image dataset](https://cocodataset.org/#home) on the final day of this Flax community event. We were able to translate only about 65000 examples to French for training, and the model was trained for only 5 epochs (beyond this, it started to overfit). This explains the model's poor performance.
+
  A HuggingFace Spaces demo for this model: [🖼️ French Image Captioning Demo 📝](https://huggingface.co/spaces/flax-community/image-caption-french)
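
The change to `run_summarization.py` described in the README (sending pixel values rather than input token ids, with no encoder `attention_mask`) can be sketched as a batch-building function. This is a minimal illustration under stated assumptions, not the project's actual script: the function name `collate_image_caption_batch`, the `caption_ids` field, the pad token id, the target length, and the image shape are all hypothetical.

```python
import numpy as np


def collate_image_caption_batch(examples, pad_token_id=1, max_target_length=8):
    """Build a batch keyed by pixel_values, with no input_ids and no encoder attention_mask.

    All names and default values here are hypothetical and chosen for illustration only.
    """
    # Stack per-image arrays of shape (channels, height, width) into one batch.
    pixel_values = np.stack([ex["pixel_values"] for ex in examples]).astype(np.float32)

    # Pad (or truncate) the tokenized captions to a fixed length.
    labels = np.full((len(examples), max_target_length), pad_token_id, dtype=np.int32)
    for i, ex in enumerate(examples):
        ids = ex["caption_ids"][:max_target_length]
        labels[i, : len(ids)] = ids

    # Only the decoder gets a mask (over caption padding); the image encoder gets none.
    decoder_attention_mask = (labels != pad_token_id).astype(np.int32)

    return {
        "pixel_values": pixel_values,
        "labels": labels,
        "decoder_attention_mask": decoder_attention_mask,
    }


examples = [
    {"pixel_values": np.zeros((3, 224, 224)), "caption_ids": [0, 42, 7, 2]},
    {"pixel_values": np.ones((3, 224, 224)), "caption_ids": [0, 13, 2]},
]
batch = collate_image_caption_batch(examples)
```

The batch deliberately has no `input_ids` or `attention_mask` keys, which mirrors the README's point that the encoder consumes images directly and that the ViT model does not take an `attention_mask`.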