xiryss/llm-course-hw1

Text2Text Generation

model_hub_mixin

pytorch_model_hub_mixin

Description:

A Transformer-based language model trained to generate Russian jokes using the russian_jokes dataset.

Validation loss: 2.488
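
If the reported loss is the mean per-token cross-entropy in nats (an assumption; the card does not say how it was computed), it converts to perplexity by exponentiation. A minimal sketch:

```python
import math

# Validation loss reported on this card; assumed to be the mean per-token
# cross-entropy measured in nats (natural logarithm).
val_loss = 2.488

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(val_loss)

# The same loss expressed in bits per token.
bits_per_token = val_loss / math.log(2)

print(f"perplexity: {perplexity:.2f}")
print(f"bits per token: {bits_per_token:.2f}")
```

Under that assumption the perplexity comes out at about 12. The figure is per token, so it is only comparable with models that share the same tokenizer.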

Architecture details:

Multi-Head Latent Attention (MHLA) layer, as in DeepSeek-V3, with a latent dimension of 96
SwiGLU activation in the feed-forward layer of each Transformer block
Decoupled Rotary Positional Embeddings (RoPE), as in DeepSeek-V3
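
The practical effect of latent attention shows up in the key/value cache during generation. In the DeepSeek formulation, each layer caches one compressed latent vector plus one shared decoupled-RoPE key per token, instead of full per-head keys and values. The sketch below compares the two cache sizes. Only the latent dimension of 96 comes from this card; `n_heads`, `head_dim`, and `rope_dim` are hypothetical values chosen for illustration, and this model's implementation may differ.

```python
def mha_cache_per_token(n_heads: int, head_dim: int) -> int:
    """Values cached per token per layer by standard multi-head attention:
    one key vector and one value vector for every head."""
    return 2 * n_heads * head_dim


def mla_cache_per_token(latent_dim: int, rope_dim: int) -> int:
    """Values cached per token per layer by DeepSeek-style latent attention:
    one compressed KV latent plus one RoPE key shared by all heads."""
    return latent_dim + rope_dim


LATENT_DIM = 96   # from the model card
N_HEADS = 8       # hypothetical, not stated on the card
HEAD_DIM = 64     # hypothetical, not stated on the card
ROPE_DIM = 32     # hypothetical, not stated on the card

mha = mha_cache_per_token(N_HEADS, HEAD_DIM)
mla = mla_cache_per_token(LATENT_DIM, ROPE_DIM)

print(f"MHA: {mha} values per token per layer")
print(f"MLA: {mla} values per token per layer")
print(f"reduction: {mha / mla:.1f}x")
```

With these illustrative sizes the cache shrinks from 1024 to 128 values per token per layer. The separate RoPE key is what allows the latent itself to be cached without positional information mixed into it.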

Generation examples:

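Заходит в бар -> Заходит в бар мужик и видит, что у него барин за рюмкой. Официант ему и говорит:- Мужики, вы, мужик, батюшки, исполнилось несколько слов, не волнуйтесь. Моя теща говорит врачу:- Доктор, да вы не болит.
(Approximate English: Walks into a bar -> A man walks into a bar and sees that he has a lord over a shot glass. The waiter says to him: "Guys, you, man, good heavens, a few words have turned, don't worry." My mother-in-law says to the doctor: "Doctor, but you doesn't hurt.")
Заходит в бар -> Заходит в бар русский, подходит к бармену и видит: у кого-то среди клоунов сидят 3 еврея в небе и читают: "Давай 50 грамм и играть с косой, и зачем теперь это все 98%"
(Approximate English: Walks into a bar -> A Russian walks into a bar, goes up to the bartender and sees: among the clowns at someone's place, 3 Jews are sitting in the sky and reading: "Give me 50 grams and play with the scythe, and why is all this 98% now")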

A version without MHLA is available at commit 076a6c7. It has a slightly lower loss but 10% more parameters.

Downloads last month: 127

Safetensors

Model size: 99.8M params

Tensor type: F32 · BOOL
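
The parameter count and tensor type give a rough size for the weights. The sketch below assumes all 99.8M parameters are stored as F32 (4 bytes each) and ignores the BOOL tensors and any file metadata, so the result is an estimate only.

```python
N_PARAMS = 99.8e6    # parameter count shown on the card
BYTES_PER_F32 = 4    # a 32-bit float occupies 4 bytes

# Estimated size of the weights, assuming every parameter is F32.
weight_bytes = N_PARAMS * BYTES_PER_F32

print(f"{weight_bytes / 1e6:.1f} MB")     # decimal megabytes
print(f"{weight_bytes / 2**20:.1f} MiB")  # binary mebibytes
```

That is roughly 400 MB of weights before any activations or optimizer state are counted.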

