Specify RLHF data for the Instruct and Chat versions in model card

#9
by markding - opened

The model card doesn't seem to detail how the Instruct and Chat versions were RLHF'd/instruction-tuned. This is what the release blog post says:

RedPajama-INCITE-Chat-7B-v0.1 is its chat counterpart trained over Dolly 2.0 and Open Assistant

RedPajama-INCITE-Instruct-7B-v0.1 is instruction tuned for few-shot applications. We follow the recipe for GPT-JT but eliminate all datasets that overlap with the HELM benchmark.

It would be useful to specify exactly which datasets were included/excluded, to spare interested users the trouble of figuring out what the HELM benchmark includes and how it does or does not overlap with GPT-JT.

Together org

Hi @markding , thanks for your question!

The instruction-tuned model was trained on instructions from the P3 and Natural Instructions datasets, which were decontaminated against HELM (you can find more details about the decontamination strategy in this blog post).
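(The exact decontamination strategy is described in the linked blog post; purely as an illustration of the general idea, a common approach is word-level n-gram overlap filtering. The window size and function names below are assumptions, not the actual RedPajama pipeline.)

```python
# Illustrative n-gram overlap decontamination -- NOT the exact strategy
# used for RedPajama-Data-Instruct (see the linked blog post for that).
# A training instance is dropped if it shares any n-gram with a
# benchmark (e.g. HELM) evaluation instance.

def ngrams(text, n=13):
    """Return the set of word-level n-grams in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def decontaminate(train_texts, benchmark_texts, n=13):
    """Keep only training texts that share no n-gram with the benchmark."""
    banned = set()
    for text in benchmark_texts:
        banned |= ngrams(text, n)
    return [t for t in train_texts if not (ngrams(t, n) & banned)]
```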

You can find the resulting decontaminated dataset used to train the instruction tuned model here: https://huggingface.co/datasets/togethercomputer/RedPajama-Data-Instruct -- the metadata here contains a source field pointing to the task / dataset where the instance comes from.
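If it helps, tallying that `source` field is one quick way to see exactly which tasks made it into the decontaminated set. A minimal sketch follows; the records are mocked to show the shape, and the split name, field layout, and source values are assumptions (check the dataset card):

```python
from collections import Counter

# Sketch: count how many instances come from each upstream task/dataset
# via the `source` metadata field. The records below are mocked; in
# practice you would load the real ones with, e.g.,
#   from datasets import load_dataset
#   ds = load_dataset("togethercomputer/RedPajama-Data-Instruct", split="train")
# (split name and exact schema are assumptions -- see the dataset card).

records = [
    {"text": "...", "meta": {"source": "p3"}},
    {"text": "...", "meta": {"source": "p3"}},
    {"text": "...", "meta": {"source": "natural_instructions"}},
]

by_source = Counter(r["meta"]["source"] for r in records)
print(by_source.most_common())
```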

Excellent! It probably won't surprise you, given the evident care taken in releasing and documenting this, but we find that RedPajama-INCITE makes it into the top 5 of our openness leaderboard: https://opening-up-chatgpt.github.io/

Thanks for the information @mauriceweber ! I just want to confirm that Chain of Thought and the Pile were not used in the fine-tuning of INCITE-7B-Instruct.

I ask because the model card says "We follow the recipe for GPT-JT", and the GPT-JT documentation says: "Specifically, we first conduct training for 2.62 billion tokens using the UL2 loss on the Pile, followed by 0.92 billion tokens with a mixture of the above datasets: 5% of COT, 20% of P3, 20% of NI, and 55% of the Pile." However, I have only seen P3 and Natural Instructions in https://huggingface.co/datasets/togethercomputer/RedPajama-Data-Instruct