Model
Pythia 160M (Biderman et al., 2023): an autoregressive language model based on the GPT-NeoX architecture, with 12 layers and 12 attention heads.
Training
The model was trained for 2 epochs with a maximum sequence length of 2048 tokens, a learning rate of 1e-4, and an effective batch size of 4.
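The effective batch size is normally the product of the per-device batch size, the number of gradient accumulation steps, and the number of devices. The card does not say how the value of 4 was split across these three factors, so the sketch below only enumerates a few hypothetical combinations that would give 4, and derives the upper bound on tokens per optimizer step from the stated sequence length. The class name `BatchSetup` and the candidate splits are illustrative assumptions, not the actual training configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchSetup:
    """Hypothetical decomposition of an effective batch size."""
    per_device_batch_size: int
    gradient_accumulation_steps: int
    num_devices: int

    @property
    def effective_batch_size(self) -> int:
        # Sequences contributing to a single optimizer step.
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )


# Values stated in the model card.
MAX_SEQ_LEN = 2048
EFFECTIVE_BATCH_SIZE = 4

# Assumed splits; the card does not state which one (if any) was used.
candidates = [
    BatchSetup(per_device_batch_size=4, gradient_accumulation_steps=1, num_devices=1),
    BatchSetup(per_device_batch_size=2, gradient_accumulation_steps=2, num_devices=1),
    BatchSetup(per_device_batch_size=1, gradient_accumulation_steps=4, num_devices=1),
    BatchSetup(per_device_batch_size=1, gradient_accumulation_steps=1, num_devices=4),
]
matching = [c for c in candidates if c.effective_batch_size == EFFECTIVE_BATCH_SIZE]

# Upper bound: reached only if every sequence fills the full context window.
max_tokens_per_step = EFFECTIVE_BATCH_SIZE * MAX_SEQ_LEN

print(f"{len(matching)} candidate splits give an effective batch size of {EFFECTIVE_BATCH_SIZE}")
print(f"at most {max_tokens_per_step} tokens per optimizer step")
```

Any of these splits yields the same gradient statistics per step; they differ only in memory use and wall-clock time.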
Training loss: 2.6503
Validation loss: 2.5354
Test loss: 2.5394
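For readers who prefer perplexity, the reported losses can be converted with a single exponential. This sketch assumes the losses are mean per-token cross-entropy values in nats (natural logarithm), which is the usual convention for causal language model training but is not stated explicitly in the card. The function name `perplexity` is illustrative.

```python
import math


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll_nats)


# Losses as reported in the model card.
reported_losses = {
    "train": 2.6503,
    "validation": 2.5354,
    "test": 2.5394,
}

perplexities = {split: perplexity(loss) for split, loss in reported_losses.items()}

for split, ppl in perplexities.items():
    print(f"{split}: loss={reported_losses[split]:.4f}  perplexity={ppl:.2f}")
```

Under that assumption, the test loss corresponds to a perplexity of roughly 12.7. Note that perplexity depends on the tokenizer, so these values are comparable only with models that share the same vocabulary.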
Training data
Text-only English dataset of staged dialogues, comprising the following corpora:
Topical Chat (6.4M tokens) by Gopalakrishnan et al. (2023)
Persona Chat (4.3M tokens) by Zhang et al. (2018)
DailyDialog (1.8M tokens) by Li et al. (2017)
Total 12.5M tokens (approximately 10M words)
All corpora are publicly available for research purposes. The raw training data is not redistributed with this model; users wishing to reproduce the training setup should obtain each corpus through the official papers and sources.
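The token counts listed above can be cross-checked with a few lines of arithmetic. The sketch below sums the per-corpus figures, computes each corpus's share of the mixture, and derives the implied tokens-per-word ratio from the approximate word count. The dictionary keys are illustrative identifiers, and the counts are the rounded values from this card rather than exact tokenizer output.

```python
# Rounded token counts as listed in the model card.
corpus_tokens = {
    "topical_chat": 6_400_000,
    "persona_chat": 4_300_000,
    "daily_dialog": 1_800_000,
}

# Approximate word count stated in the card.
APPROX_WORDS = 10_000_000

total_tokens = sum(corpus_tokens.values())

# Fraction of the training mixture contributed by each corpus.
shares = {name: count / total_tokens for name, count in corpus_tokens.items()}

# Average number of tokens produced per whitespace-delimited word.
tokens_per_word = total_tokens / APPROX_WORDS

print(f"total: {total_tokens / 1e6:.1f}M tokens")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"tokens per word: {tokens_per_word:.2f}")
```

Topical Chat makes up just over half of the mixture, so the model's style is likely to lean toward that corpus.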
References
Biderman, S., Schoelkopf, H., Anthony, Q. G., Bradley, H., O’Brien, K., Hallahan, E., ... & Van Der Wal, O. (2023, July). Pythia: A suite for analyzing large language models across training and scaling. In International conference on machine learning (pp. 2397-2430). PMLR.
Gopalakrishnan, K., Hedayatnia, B., Chen, Q., Gottardi, A., Kwatra, S., Venkatesh, A., ... & Hakkani-Tur, D. (2023). Topical-chat: Towards knowledge-grounded open-domain conversations. arXiv preprint arXiv:2308.11995.
Li, Y., Su, H., Shen, X., Li, W., Cao, Z., & Niu, S. (2017, November). Dailydialog: A manually labelled multi-turn dialogue dataset. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 986-995).
Zhang, S., Dinan, E., Urbanek, J., Szlam, A., Kiela, D., & Weston, J. (2018, July). Personalizing dialogue agents: I have a dog, do you have pets too?. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2204-2213).