
Did you use the full dataset for training?

#2
by Hoioi - opened

Did you use the full dolphin and RedPajama datasets to train this model? (If so, that would be more than 8 million rows.)

shahules786/orca-chat combines similar examples from the GPT-4 subset of ehartford/dolphin (i.e. only the GPT-4 entries are used).
25% of RedPajama was used; you can also find the numbers in the README:

Dataset Composition:
    Train (sampled):
       orca-chat: 188842
       fanfics: 47760
       red_pajama: 188262
    Valid:
       orca-chat: 5000
       fanfics: 1000
       red_pajama: 1000
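The mixing described above (a fractional sample of one corpus combined with others) can be sketched roughly like this. The corpus names and sizes below are toy stand-ins for illustration, not the actual training pipeline:

```python
import random

def sample_fraction(rows, fraction, seed=42):
    """Return a reproducible random sample of `fraction` of `rows`."""
    rng = random.Random(seed)
    k = int(len(rows) * fraction)
    return rng.sample(rows, k)

# Toy stand-in corpora (real ones have hundreds of thousands of rows).
red_pajama = [f"rp_{i}" for i in range(1000)]
orca_chat = [f"oc_{i}" for i in range(500)]

# Take 25% of the RedPajama stand-in and mix it with the full orca-chat stand-in.
train_mix = sample_fraction(red_pajama, 0.25) + orca_chat
print(len(train_mix))  # 250 + 500 = 750 rows in the mixed split
```

Fixing the random seed keeps the sampled subset reproducible across runs, which matters when the train/valid split must stay stable.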
OpenAssistant org

changed number formatting

Hoioi changed discussion status to closed
