Model

The model is Pythia 160M (Biderman et al., 2023), an autoregressive, GPT-NeoX-based language model with 12 layers and 12 attention heads.
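
To make the model size concrete, here is a rough parameter count for a GPT-NeoX-style decoder with 12 layers and 12 heads. The hidden size (768) and padded vocabulary size (50304) are assumptions taken from the published Pythia 160M configuration; this card does not state them. The helper `gpt_neox_param_count` is a hypothetical name used only for this sketch.

```python
def gpt_neox_param_count(n_layers, d_model, vocab_size, tied_embeddings=False):
    """Count learned weights and biases of a GPT-NeoX-style decoder."""
    d_ff = 4 * d_model  # assumed 4x feed-forward expansion
    per_layer = (
        2 * 2 * d_model                          # two LayerNorms (scale + bias each)
        + d_model * 3 * d_model + 3 * d_model    # fused query/key/value projection
        + d_model * d_model + d_model            # attention output projection
        + d_model * d_ff + d_ff                  # feed-forward up-projection
        + d_ff * d_model + d_model               # feed-forward down-projection
    )
    embeddings = vocab_size * d_model            # input token embeddings
    unembeddings = 0 if tied_embeddings else vocab_size * d_model
    final_norm = 2 * d_model                     # final LayerNorm (scale + bias)
    return n_layers * per_layer + embeddings + unembeddings + final_norm


# Assumed values (not stated in this card): d_model=768, vocab_size=50304.
total = gpt_neox_param_count(n_layers=12, d_model=768, vocab_size=50304)
head_dim = 768 // 12  # per-head dimension for 12 attention heads
print(f"{total:,} parameters, head dimension {head_dim}")
```

Under these assumptions the count comes to roughly 162M parameters, of which the untied input and output embeddings account for nearly half.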

Training

The model was trained for 2 epochs with a maximum sequence length of 2048 tokens, a learning rate of 1e-4, and an effective batch size of 4.

Training loss: 2.6503

Validation loss: 2.5354

Test loss: 2.5394
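
The losses above can be converted to perplexities for easier comparison with other language models. This sketch assumes each loss is a mean per-token cross-entropy measured in nats (natural logarithm), which is the usual convention but is not stated in this card.

```python
import math

# Losses as reported in this card.
losses = {"training": 2.6503, "validation": 2.5354, "test": 2.5394}


def perplexity(mean_nll_nats):
    """Perplexity is the exponential of the mean negative log-likelihood."""
    return math.exp(mean_nll_nats)


perplexities = {split: perplexity(loss) for split, loss in losses.items()}
for split, ppl in perplexities.items():
    print(f"{split}: loss {losses[split]:.4f} -> perplexity {ppl:.2f}")
```

If the losses were instead computed in bits or averaged per sequence, the conversion would differ.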

Training data

The training data is a text-only English dataset of staged dialogues, drawn from the following corpora:

Topical Chat (6.4M tokens) by Gopalakrishnan et al. (2023)

Persona Chat (4.3M tokens) by Zhang et al. (2018)

DailyDialog (1.8M tokens) by Li et al. (2017)

Total: 12.5M tokens (approximately 10M words)
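
As a quick check of the figures above, the following sketch sums the per-corpus token counts and derives each corpus's share of the mix. The dictionary keys are illustrative identifiers, not official dataset names, and the tokens-per-word ratio relies on the approximate 10M-word figure.

```python
# Token counts in millions, as listed in this card.
corpus_tokens_m = {
    "topical_chat": 6.4,
    "persona_chat": 4.3,
    "daily_dialog": 1.8,
}

total_m = sum(corpus_tokens_m.values())
shares = {name: count / total_m for name, count in corpus_tokens_m.items()}

# Approximate word count from this card (10M words), so the ratio is rough.
tokens_per_word = total_m / 10.0

print(f"total: {total_m:.1f}M tokens, about {tokens_per_word:.2f} tokens per word")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of all tokens")
```

Topical Chat contributes about half of the tokens, so it has the largest influence on the mix.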

All corpora are publicly available for research purposes. The raw training data is not redistributed with this model; users who wish to reproduce the training setup should obtain each corpus from its official source, as described in the papers listed under References.

References

Biderman, S., Schoelkopf, H., Anthony, Q. G., Bradley, H., O’Brien, K., Hallahan, E., ... & Van Der Wal, O. (2023, July). Pythia: A suite for analyzing large language models across training and scaling. In International Conference on Machine Learning (pp. 2397-2430). PMLR.

Gopalakrishnan, K., Hedayatnia, B., Chen, Q., Gottardi, A., Kwatra, S., Venkatesh, A., ... & Hakkani-Tur, D. (2023). Topical-Chat: Towards knowledge-grounded open-domain conversations. arXiv preprint arXiv:2308.11995.

Li, Y., Su, H., Shen, X., Li, W., Cao, Z., & Niu, S. (2017, November). DailyDialog: A manually labelled multi-turn dialogue dataset. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 986-995).

Zhang, S., Dinan, E., Urbanek, J., Szlam, A., Kiela, D., & Weston, J. (2018, July). Personalizing dialogue agents: I have a dog, do you have pets too? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2204-2213).
