Training data details

#8
by floschne - opened

Hi, and thanks for this amazing work!

Could you please elaborate on the training data? I have the following questions :-)

  • "558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP." --> What do you mean by captioned by BLIP? All of the mentioned datasets already have captions, no?
  • "40K ShareGPT data." --> ShareGPT is text-only. Does that mean, you trained on text-only CLM or do you actually mean ShareGPT4V, which is multi-modal?
  • I assume that most if not all of the textual data is in English, correct?
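My guess for "captioned by BLIP" is that the original alt-text captions were discarded and the images were re-captioned with a BLIP captioning model, roughly like the sketch below. The checkpoint name and generation settings are my own assumptions, not something stated in the card.

```python
# Hypothetical sketch of re-captioning an image with BLIP via Hugging Face
# transformers; checkpoint and parameters are assumptions, not from the card.
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Any image URL works here; this one is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Generate a fresh caption, ignoring whatever alt-text the source dataset had.
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```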
