UltraFeedback contamination with TruthfulQA

#361
by natolambert - opened

Zephyr-7b-beta and a model we're building at AI2 are both trained on UltraFeedback (https://huggingface.co/datasets/openbmb/UltraFeedback), which includes TruthfulQA prompts. I'm not sure what the right way to filter or flag these models is, but the overlap likely gives them an unrealistic boost in TruthfulQA performance.
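For anyone who wants to check this kind of overlap on their own training data, here is a minimal sketch of an exact-match contamination check. The helper names (`normalize`, `find_contaminated`) and the toy prompt lists are hypothetical placeholders, not taken from UltraFeedback or TruthfulQA; in practice you would substitute the real prompt and question columns. It only catches matches that are identical after lowercasing and stripping punctuation, so paraphrased prompts would be missed.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def find_contaminated(train_prompts, benchmark_questions):
    """Return training prompts that equal a benchmark question after normalization."""
    bench = {normalize(q) for q in benchmark_questions}
    return [p for p in train_prompts if normalize(p) in bench]


# Hypothetical stand-ins for the real training prompts and benchmark questions.
train_prompts = [
    "What happens if you crack your knuckles a lot?",
    "Write a haiku about autumn.",
    "what happens if you crack your knuckles a lot",
]
benchmark_questions = ["What happens if you crack your knuckles a lot?"]

hits = find_contaminated(train_prompts, benchmark_questions)
print(f"{len(hits)} of {len(train_prompts)} training prompts match a benchmark question")
```

A check like this only says whether a dataset overlaps with the benchmark; deciding which models to flag still requires knowing which datasets each model was trained on.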

Hugging Face H4 org

Hi!
Good to know, thank you for your comment - can you make a list of the models you'd like to flag for having TruthfulQA in their training set? (Plus ideally the sources for all models?)

Hugging Face H4 org

Closing for inactivity

clefourrier changed discussion status to closed
