Which datasets?

by teknium - opened

Besides Vicuna, which datasets were used?

Also, do you happen to have the cleaned Vicuna dataset? Would you mind publishing it?

I guess you've already found the cleaned Vicuna dataset, but for reference, it's this one: https://huggingface.co/datasets/gozfarb/ShareGPT_Vicuna_unfiltered. The other datasets used are listed in the model card, and their own pages have more information about them and where they come from. Although it is listed, anon8231489123/ShareGPT_Vicuna_unfiltered was not used in this version; it is mostly an honorary mention for the early cleanup work (it was used in earlier vicuna-13b-free models).
