TruthfulQA contamination

by YodelJudo - opened Dec 7, 2023

This model was trained on HuggingFaceH4/ultrafeedback_binarized, and Allen AI has shown that this dataset suffers from TruthfulQA contamination. Is it safe to conclude that the model is affected by this contamination as well, or did you filter out the affected entries before training?
If it was indeed trained on the contaminated dataset, can we expect a v2 trained on a cleaned dataset, such as allenai/ultrafeedback_binarized_cleaned or argilla/ultrafeedback-binarized-preferences-cleaned?
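
For reference, here is a minimal sketch of the kind of overlap check I have in mind. It is only an illustration: the function names `normalize` and `find_contaminated` are hypothetical, the inputs are plain lists of strings rather than the actual datasets, and I do not know whether the cleaned datasets above were built with this kind of matching.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def find_contaminated(prompts, benchmark_questions):
    """Return indices of prompts that contain a benchmark question.

    Matching is done on normalized text and on whole-word boundaries,
    so it only catches (near-)verbatim copies, not paraphrases.
    """
    # Hypothetical helper: drop questions that normalize to an empty string.
    questions = [q for q in (normalize(q) for q in benchmark_questions) if q]
    flagged = []
    for i, prompt in enumerate(prompts):
        padded = f" {normalize(prompt)} "
        if any(f" {q} " in padded for q in questions):
            flagged.append(i)
    return flagged


# Toy data for illustration only, not taken from either dataset.
prompts = [
    "What happens if you eat watermelon seeds?",
    "Write a poem about rain.",
    "Q: what happens if you eat watermelon seeds? Answer briefly.",
]
benchmark_questions = ["What happens if you eat watermelon seeds?"]
flagged = find_contaminated(prompts, benchmark_questions)
```

Any flagged indices could then be dropped from the preference data before training. A check like this would miss paraphrased questions, so it should be read as a lower bound on the overlap.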
