Model

Pythia 160M (Biderman et al., 2023): an autoregressive, GPT-NeoX-based language model with 12 layers and 12 attention heads per layer.
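As a rough sanity check on the model size, the parameter count of a GPT-NeoX-style decoder can be estimated from its shape. The sketch below is illustrative only: the layer count comes from this card, while `hidden_size=768` and `vocab_size=50304` are assumptions taken from the published Pythia 160M configuration, and the helper `count_gpt_neox_params` is a hypothetical name. The number of attention heads does not change the count, since the heads only partition the hidden dimension.

```python
# Rough parameter count for a GPT-NeoX-style decoder (illustrative sketch).
# Only n_layers=12 is stated in this card; hidden_size and vocab_size are
# assumptions based on the published Pythia 160M configuration.

def count_gpt_neox_params(n_layers: int, hidden_size: int, vocab_size: int) -> int:
    h = hidden_size
    embed_in = vocab_size * h        # input token embeddings
    embed_out = vocab_size * h       # untied output projection, no bias
    qkv = h * 3 * h + 3 * h          # fused query/key/value projection + bias
    attn_out = h * h + h             # attention output projection + bias
    mlp_up = h * 4 * h + 4 * h       # feed-forward expansion + bias
    mlp_down = 4 * h * h + h         # feed-forward contraction + bias
    layer_norms = 2 * (2 * h)        # two LayerNorms per block (weight + bias)
    per_layer = qkv + attn_out + mlp_up + mlp_down + layer_norms
    final_norm = 2 * h               # final LayerNorm (weight + bias)
    return embed_in + embed_out + n_layers * per_layer + final_norm


total = count_gpt_neox_params(n_layers=12, hidden_size=768, vocab_size=50304)
print(f"{total:,} parameters")
```

Under these assumptions the estimate comes to roughly 162M parameters, of which about 85M sit in the transformer blocks and the rest in the two embedding matrices.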

Training

The model was trained for 2 epochs with a maximum sequence length of 2048 tokens, a learning rate of 1e-4, and an effective batch size of 4.
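For readers reproducing the setup, the hyperparameters above can be collected in a small configuration object. This is a sketch, not the original training script: the class name `TrainingSetup` is hypothetical, and the split of the effective batch size into per-device batch size, gradient accumulation steps, and device count is an assumption, since the card reports only their product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    # Values stated in the model card.
    num_epochs: int = 2
    max_seq_length: int = 2048
    learning_rate: float = 1e-4
    # ASSUMPTION: one possible decomposition of the effective batch size of 4.
    # The card does not say how the batch was split.
    per_device_batch_size: int = 1
    gradient_accumulation_steps: int = 4
    num_devices: int = 1

    @property
    def effective_batch_size(self) -> int:
        """Sequences contributing to each optimizer step."""
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )

    @property
    def max_tokens_per_step(self) -> int:
        """Upper bound on tokens per optimizer step (all sequences full length)."""
        return self.effective_batch_size * self.max_seq_length


setup = TrainingSetup()
print(setup.effective_batch_size, setup.max_tokens_per_step)
```

Any other decomposition whose product is 4 would match the reported effective batch size equally well.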

Training loss: 2.6503

Validation loss: 2.5354

Test loss: 2.5394
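The losses can be read as perplexities, which are often easier to compare across models. The sketch below assumes the reported values are mean per-token cross-entropy in nats (natural logarithm), which the card does not state explicitly; under that assumption, perplexity is simply the exponential of the loss.

```python
import math

# Losses as reported in the model card.
losses = {"train": 2.6503, "validation": 2.5354, "test": 2.5394}

# ASSUMPTION: each loss is mean per-token cross-entropy in nats,
# so perplexity = exp(loss).
perplexities = {split: math.exp(loss) for split, loss in losses.items()}

for split, ppl in perplexities.items():
    print(f"{split}: loss={losses[split]:.4f}, perplexity={ppl:.2f}")
```

If the losses were instead computed in bits or averaged per sequence, the conversion would differ.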

Training data

Text-only English dataset of staged dialogues, comprising the following corpora:

Topical-Chat (6.4M tokens) by Gopalakrishnan et al. (2023)

Persona-Chat (4.3M tokens) by Zhang et al. (2018)

DailyDialog (1.8M tokens) by Li et al. (2017)

Total 12.5M tokens (approximately 10M words)
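The corpus sizes above can be checked with a few lines of arithmetic, which also gives each corpus's share of the training mix and the implied words-per-token ratio. The token counts are the rounded figures from this card; the dictionary keys are illustrative identifiers.

```python
# Token counts as reported in the model card (rounded to 0.1M).
corpus_tokens = {
    "topical_chat": 6_400_000,
    "persona_chat": 4_300_000,
    "daily_dialog": 1_800_000,
}

total_tokens = sum(corpus_tokens.values())
shares = {name: count / total_tokens for name, count in corpus_tokens.items()}

# The card gives approximately 10M words for the whole dataset.
approx_words = 10_000_000
words_per_token = approx_words / total_tokens

for name, share in shares.items():
    print(f"{name}: {share:.1%} of tokens")
print(f"total: {total_tokens:,} tokens, ~{words_per_token:.2f} words per token")
```

With these figures, slightly more than half of the training tokens come from the largest corpus, and the ratio works out to about 0.8 words per token.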

All corpora are publicly available for research purposes. The raw training data is not redistributed with this model; users wishing to reproduce the training setup should obtain each corpus through the official papers and sources.

References

Biderman, S., Schoelkopf, H., Anthony, Q. G., Bradley, H., O’Brien, K., Hallahan, E., ... & Van Der Wal, O. (2023, July). Pythia: A suite for analyzing large language models across training and scaling. In International Conference on Machine Learning (pp. 2397-2430). PMLR.

Gopalakrishnan, K., Hedayatnia, B., Chen, Q., Gottardi, A., Kwatra, S., Venkatesh, A., ... & Hakkani-Tur, D. (2023). Topical-Chat: Towards knowledge-grounded open-domain conversations. arXiv preprint arXiv:2308.11995.

Li, Y., Su, H., Shen, X., Li, W., Cao, Z., & Niu, S. (2017, November). DailyDialog: A manually labelled multi-turn dialogue dataset. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 986-995).

Zhang, S., Dinan, E., Urbanek, J., Szlam, A., Kiela, D., & Weston, J. (2018, July). Personalizing dialogue agents: I have a dog, do you have pets too? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2204-2213).
