spankevich/llm-course-hw1

The model was developed as part of an NLP course at HSE. The task was to build a model that generates anecdotes (short jokes) in Russian. This involved writing a tokenizer based on Byte Pair Encoding (BPE) and then building a custom Transformer model. The model uses SwiGLU activation functions, Grouped Query Attention (GQA), which shares key/value heads across query heads to reduce the cost of attention, and ALiBi positional biases. It was then trained on a dataset of Russian anecdotes.
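To illustrate the tokenizer step, here is a minimal sketch of BPE training, encoding, and decoding. It is not the code used for this model: it assumes a byte-level variant (starting from the 256 UTF-8 byte values), and the function names `train_bpe`, `merge_pair`, `encode`, and `decode` are hypothetical. Under that assumption, a 1024-token vocabulary would correspond to 768 learned merges.

```python
from collections import Counter


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text` (assumed byte-level BPE)."""
    ids = list(text.encode("utf-8"))  # base vocabulary: byte values 0..255
    merges = {}                       # (left_id, right_id) -> new_id, in merge order
    for step in range(num_merges):
        pair_counts = Counter(zip(ids, ids[1:]))
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        if pair_counts[best] < 2:     # nothing left that repeats
            break
        new_id = 256 + step
        merges[best] = new_id
        ids = merge_pair(ids, best, new_id)
    return merges


def encode(text, merges):
    """Tokenize `text` by replaying the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge_pair(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Map token ids back to text by expanding each merged token into its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "Заходит мужик в бар. Заходит мужик в магазин."
    merges = train_bpe(sample, num_merges=20)
    tokens = encode(sample, merges)
    print(len(sample.encode("utf-8")), "bytes ->", len(tokens), "tokens")
    print(decode(tokens, merges))
```

Starting from bytes means any Russian text can be encoded without an unknown-token fallback, at the cost of two bytes per Cyrillic letter before merges are applied.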

Training reached a cross-entropy loss of 1.17 on the training set and 1.300 on the validation set.

Here are some examples of anecdotes generated from the prefix "Заходит" ("Walks in"):

"Заходит как-то мужик в магазин. Видит - бармен, а вокруг него, снимает голову. - Ну, как ты думаешь, что ли? - Да нет, сын мой!"
"Заходит в бар и говорит: — Девушка, а что это вы так много плохая? — А как же вы хотите, что вы не видите, что вы не знаете? — А какая разница? — Потому, что вы можете? — Подумайте, что этот фильм? — Да нет, но ведь этот факт, какой-то я не могу."

Although the training cross-entropy loss is relatively low (1.17, with a vocabulary size of 1024), the generated anecdotes are of poor quality: the text often lacks coherence and logical structure.
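To put the reported numbers in context, the sketch below converts the losses to perplexity and compares them with a uniform-guess baseline over the 1024-token vocabulary. It assumes the losses are measured in nats per token (natural logarithm), which the card does not state; the variable and function names are illustrative.

```python
import math

VOCAB_SIZE = 1024   # from the model card
TRAIN_LOSS = 1.17   # from the model card
VAL_LOSS = 1.300    # from the model card


def perplexity(loss_nats):
    """Perplexity for a cross-entropy loss given in nats per token (assumed unit)."""
    return math.exp(loss_nats)


# Loss of a model that assigns equal probability to every token.
uniform_loss = math.log(VOCAB_SIZE)

if __name__ == "__main__":
    print(f"uniform baseline loss: {uniform_loss:.3f}")
    print(f"train perplexity:      {perplexity(TRAIN_LOSS):.2f}")
    print(f"validation perplexity: {perplexity(VAL_LOSS):.2f}")
```

Under that assumption, the uniform baseline is about 6.93 nats, and the model's perplexity is roughly 3.2 on the training set and 3.7 on the validation set. A low per-token perplexity on short subword tokens does not imply coherence across a whole anecdote, which is consistent with the samples above.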

Attached are the charts for quality, learning rate, and training epochs:
