Craftic/llm-course-hw1-mini

Educational causal language model trained as part of a Deep Learning homework assignment on a Russian jokes corpus.

This repository contains:

a custom Byte-Level BPE tokenizer (vocabulary.json, merges.json)
model weights in model.safetensors
a minimal training/inference config in config.json
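
To make the tokenizer artifact more concrete, here is a minimal sketch of how byte-level BPE encoding works in general: text is split into UTF-8 bytes, and learned merges are applied in priority order. The function name `bpe_encode` and the in-memory merge format (an ordered list of byte-string pairs) are assumptions for illustration. The actual layout of vocabulary.json and merges.json, and any pre-tokenization the homework tokenizer performs, are not documented here.

```python
from typing import List, Tuple


def bpe_encode(text: str, merges: List[Tuple[bytes, bytes]]) -> List[bytes]:
    """Split text into UTF-8 bytes, then greedily apply merges by rank.

    `merges` is a hypothetical in-memory format: pairs listed in the order
    they were learned, so an earlier pair has higher priority.
    """
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    tokens = [bytes([b]) for b in text.encode("utf-8")]

    while len(tokens) > 1:
        pairs = [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]
        # Pick the adjacent pair with the best (lowest) merge rank.
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no learned merge applies any more

        merged = []
        i = 0
        while i < len(tokens):
            if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged

    return tokens


# Toy merges, not taken from this repository.
toy_merges = [(b"a", b"b"), (b"ab", b"c")]
print(bpe_encode("abcab", toy_merges))
```

Because the base alphabet is the 256 possible byte values, any string can be encoded without an unknown token. Each Cyrillic letter starts as two bytes, which is one reason learned merges matter for Russian text.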

Model summary

Model type: decoder-only Transformer for causal language modeling
Tokenizer: custom Byte-Level BPE
Vocabulary size: 1024
Context length: 128 tokens
Attention: Grouped-Query Attention (n_head=6, n_kv_head=3)
Feed-forward: SwiGLU
Normalization: RMSNorm
Positional bias: ALiBi
Dropout: 0.1
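
Two of the components above can be illustrated without any framework. The sketch below shows how grouped-query attention with n_head=6 and n_kv_head=3 assigns query heads to shared KV heads, and how an ALiBi bias penalizes attention to distant past tokens. The slope recipe follows the commonly used ALiBi reference scheme for head counts that are not a power of two. Whether the homework implementation uses the same slopes or applies the causal mask this way is an assumption, and all function names are hypothetical.

```python
import math
from typing import List

N_HEAD, N_KV_HEAD = 6, 3  # values from the model summary


def kv_head_for(query_head: int) -> int:
    """Index of the KV head shared by this query head (groups of N_HEAD // N_KV_HEAD)."""
    group_size = N_HEAD // N_KV_HEAD
    return query_head // group_size


def alibi_slopes(n_heads: int) -> List[float]:
    """Per-head slopes, following the usual ALiBi recipe (assumed, not confirmed)."""

    def power_of_two_slopes(n: int) -> List[float]:
        start = 2.0 ** (-8.0 / n)
        return [start ** (i + 1) for i in range(n)]

    if n_heads & (n_heads - 1) == 0:
        return power_of_two_slopes(n_heads)
    closest = 2 ** math.floor(math.log2(n_heads))
    # Fill the remaining heads with every other slope of the next power of two.
    extra = power_of_two_slopes(2 * closest)[0::2][: n_heads - closest]
    return power_of_two_slopes(closest) + extra


def alibi_bias(slope: float, seq_len: int) -> List[List[float]]:
    """Additive attention bias: slope * (j - i) for past keys, -inf for future keys."""
    return [
        [slope * (j - i) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


print([kv_head_for(h) for h in range(N_HEAD)])
print(alibi_slopes(N_HEAD))
print(alibi_bias(0.5, 3))
```

The bias is added to the attention scores before the softmax, so no learned or sinusoidal position embeddings are needed.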

Configuration

Variant: mini
Layers: 6
Hidden size: 384
Attention heads: 6
KV heads: 3
Intermediate size: 1024
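
As a rough sanity check on these numbers, the sketch below estimates the parameter count implied by the configuration. It assumes no bias terms, three SwiGLU projection matrices per layer, two RMSNorm weight vectors per layer plus a final norm, and an output head that is not tied to the embedding. None of these details are confirmed by this card, so treat the result as an estimate rather than the exact checkpoint size.

```python
def estimate_params(
    vocab_size: int = 1024,
    hidden: int = 384,
    layers: int = 6,
    n_head: int = 6,
    n_kv_head: int = 3,
    intermediate: int = 1024,
    tied_head: bool = False,  # assumption: separate output projection
) -> int:
    head_dim = hidden // n_head
    kv_dim = head_dim * n_kv_head  # K and V are narrower under grouped-query attention

    attention = hidden * hidden + 2 * hidden * kv_dim + hidden * hidden  # Q, K, V, O
    feed_forward = 3 * hidden * intermediate  # gate, up, and down projections
    norms = 2 * hidden  # two RMSNorm weight vectors per layer
    per_layer = attention + feed_forward + norms

    embedding = vocab_size * hidden
    output_head = 0 if tied_head else vocab_size * hidden
    final_norm = hidden
    return layers * per_layer + embedding + output_head + final_norm


print(f"{estimate_params():,}")
```

Under these assumptions the estimate comes to about 10.5M parameters, most of them in the feed-forward layers.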

Training data

Dataset: IgorVolochay/russian_jokes
Domain: Russian jokes / short humorous texts

Training setup

Optimizer: AdamW
Learning rate: 3e-4
Training steps: 10,000
Max sequence length: 128

Usage

These weights were uploaded from a custom homework implementation, so loading requires the same Python classes that were used during training.

import torch

# Define ByteLevelBPETokenizer and TransformerForCausalLM exactly as in the homework notebook.
# Then load artifacts from the Hub:

tokenizer = ByteLevelBPETokenizer.from_pretrained("Craftic/llm-course-hw1-mini")
model = TransformerForCausalLM.from_pretrained("Craftic/llm-course-hw1-mini")
model.eval()

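prompt = "Муж приходит домой и говорит:"  # "A husband comes home and says:"
# Encode once without EOS and reuse the ids, so the model continues the prompt
# rather than predicting what follows an end-of-sequence token.
prompt_ids = tokenizer.encode(prompt, add_eos_token=False)
input_ids = torch.tensor([prompt_ids], dtype=torch.long)
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    logits = model(input_ids, attention_mask)

# Greedy choice of a single next token from the last position.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(prompt_ids + [next_token_id]))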
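
The snippet above predicts only one token. A minimal sketch of extending it to multi-token greedy decoding follows. To keep it runnable without the homework classes, the model is abstracted as a callable `next_logits` that maps a list of token ids to a list of logits for the next token. That callable, `greedy_generate`, and the `eos_id` parameter are hypothetical names, not part of this repository.

```python
from typing import Callable, List, Optional, Sequence

CONTEXT_LENGTH = 128  # context length from the model summary


def greedy_generate(
    next_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int = 32,
    eos_id: Optional[int] = None,  # hypothetical: id of the end-of-sequence token
) -> List[int]:
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        window = ids[-CONTEXT_LENGTH:]  # never feed more than the context window
        logits = next_logits(window)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids


# Toy stand-in for the model: always prefers (last id + 1) mod 5.
def toy_next_logits(window: List[int]) -> List[float]:
    target = (window[-1] + 1) % 5
    return [1.0 if i == target else 0.0 for i in range(5)]


print(greedy_generate(toy_next_logits, [0], max_new_tokens=10, eos_id=4))
```

With the real model, `next_logits` could wrap the forward call from the usage snippet and return `logits[0, -1].tolist()`. Greedy decoding tends to repeat itself on small models, so sampling with a temperature may give more varied jokes.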

Notes

This is an educational model, not a production-ready checkpoint.
Because the model is small and trained only on jokes, generations may be low-quality, repetitive, or stylistically narrow.
The repository does not include a packaged inference library; the custom tokenizer/model classes should be copied from the homework notebook or moved into a Python module before loading.

Limitations

Small context window
Small vocabulary
Trained on a narrow-domain dataset
No safety alignment or moderation tuning

Intended use

Coursework demonstration
Experiments with custom tokenization and compact LMs
Lightweight local generation experiments

Out-of-scope use

Factual QA
Safety-critical tasks
Production deployment without additional packaging, evaluation, and safeguards

Safetensors

Model size: 10.6M params
Tensor types: F32, BOOL