Exact training data used?

by nlpguy - opened Apr 10

Thanks for this amazing model. Is there an exact breakdown by source of the 1T tokens used for training, or a list of the specific public corpora that were used?

psinger

H2O.ai org Apr 16

Please take a look at the updated section in the technical report: https://arxiv.org/abs/2401.16818

psinger changed discussion status to closed Apr 16
