Sharing training data & reproducing training

#4, opened by xhluca

Congratulations on the paper and the score! Since this model was trained on public data, would it be possible to release the training dataset on Hugging Face? It would also be great to have a training script for reproducing the results, similar to the training script LLM2Vec recently released.

xhluca changed the discussion title from "Training data & running the training" to "Sharing training data & reproducing training".

It would also be great to have access to the unidirectional models listed in the paper for research purposes. Since the unidirectional models are not far behind the bidirectional ones, it would be valuable to explore them side by side.
