Craftic/llm-course-hw1-mini
An educational causal language model trained on a Russian jokes corpus as part of a Deep Learning homework assignment.
This repository contains:
- a custom Byte-Level BPE tokenizer (vocabulary.json, merges.json)
- model weights in model.safetensors
- a minimal training/inference config in config.json
Model summary
- Model type: decoder-only Transformer for causal language modeling
- Tokenizer: custom Byte-Level BPE
- Vocabulary size: 1024
- Context length: 128 tokens
- Attention: Grouped-Query Attention (n_head=6, n_kv_head=3)
- Feed-forward: SwiGLU
- Normalization: RMSNorm
- Positional bias: ALiBi
- Dropout: 0.1
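To illustrate two of the components listed above, here is a minimal pure-Python sketch of how Grouped-Query Attention might share KV heads across query heads, and of the linear distance penalty that ALiBi adds to attention scores. The function names are hypothetical, the consecutive-grouping rule is an assumed (though common) convention, and the slope is left as a parameter because the per-head slopes used in the homework implementation are not stated here.

```python
def kv_head_for_query_head(q_head: int, n_head: int = 6, n_kv_head: int = 3) -> int:
    """Map a query head to the KV head it shares (assumed consecutive grouping)."""
    group_size = n_head // n_kv_head  # 6 // 3 = 2 query heads per KV head
    return q_head // group_size


def alibi_bias(seq_len: int, slope: float) -> list:
    """Causal ALiBi bias for one head.

    bias[i][j] = -slope * (i - j) for keys at or before the query position,
    and -inf for future positions, which the causal mask hides anyway.
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Which KV head each of the 6 query heads reads from
mapping = [kv_head_for_query_head(h) for h in range(6)]

# Bias added to the attention scores of a 3-token sequence (slope chosen arbitrarily)
bias = alibi_bias(3, slope=0.5)
```

With n_head=6 and n_kv_head=3, each pair of query heads shares one KV head, which halves the size of the key and value projections compared with standard multi-head attention.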
Configuration
- Variant: mini
- Layers: 6
- Hidden size: 384
- Attention heads: 6
- KV heads: 3
- Intermediate size: 1024
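As a rough sanity check on model size, the configuration above can be turned into a parameter estimate. This is a sketch under stated assumptions, not a count read from model.safetensors: it assumes no bias terms in the linear layers, two RMSNorm layers per block plus a final one, three SwiGLU projection matrices, and input/output embeddings that may or may not be tied. The function name is hypothetical.

```python
def estimate_parameters(
    vocab_size: int = 1024,
    hidden: int = 384,
    layers: int = 6,
    n_head: int = 6,
    n_kv_head: int = 3,
    intermediate: int = 1024,
    tied_embeddings: bool = True,  # assumption: not stated in the model card
) -> int:
    head_dim = hidden // n_head          # 384 // 6 = 64
    kv_dim = n_kv_head * head_dim        # 3 * 64 = 192

    # Attention: Q and output projections are hidden x hidden,
    # K and V projections are hidden x kv_dim because of GQA.
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim
    # SwiGLU: gate, up, and down projections.
    feed_forward = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per block.
    norms = 2 * hidden

    per_layer = attention + feed_forward + norms
    embeddings = vocab_size * hidden
    total = layers * per_layer + embeddings + hidden  # + final RMSNorm
    if not tied_embeddings:
        total += vocab_size * hidden  # separate output projection
    return total


tied = estimate_parameters()
untied = estimate_parameters(tied_embeddings=False)
```

Under these assumptions the estimate comes to roughly 10.1M parameters with tied embeddings and roughly 10.5M without.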
Training data
- Russian jokes corpus
Training setup
- Optimizer: AdamW
- Learning rate: 3e-4
- Training steps: 10,000
- Max sequence length: 128
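For readers unfamiliar with the optimizer, the following is a minimal scalar sketch of one AdamW update at the learning rate listed above. Only the learning rate comes from this card; the betas, epsilon, and weight decay are assumed to be the common PyTorch defaults, and the function name is hypothetical.

```python
import math


def adamw_step(param, grad, m, v, step,
               lr=3e-4,             # from the training setup above
               beta1=0.9,           # assumed default
               beta2=0.999,         # assumed default
               eps=1e-8,            # assumed default
               weight_decay=0.01):  # assumed default
    """Apply one AdamW update to a single scalar parameter."""
    # Decoupled weight decay: shrink the parameter directly, not via the gradient.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for the zero-initialized averages.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First update of a parameter at 1.0 with gradient 0.5
p, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, step=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by about one learning rate regardless of gradient magnitude.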
Usage
These weights were uploaded from a custom homework implementation, so loading requires the same Python classes that were used during training.
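import torch
# ByteLevelBPETokenizer and TransformerForCausalLM are the custom homework classes;
# define or import them before running this snippet.
tokenizer = ByteLevelBPETokenizer.from_pretrained("Craftic/llm-course-hw1-mini")
model = TransformerForCausalLM.from_pretrained("Craftic/llm-course-hw1-mini")
model.eval()
# Russian for "A husband comes home and says:"
prompt = "Муж приходит домой и говорит:"
# Encode once without an EOS token so the model continues the prompt
# rather than predicting what follows an end-of-sequence marker.
prompt_ids = tokenizer.encode(prompt, add_eos_token=False)
input_ids = torch.tensor([prompt_ids], dtype=torch.long)
attention_mask = torch.ones_like(input_ids)
with torch.no_grad():
    logits = model(input_ids, attention_mask)
# Greedy choice: take the most likely token at the last position.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(prompt_ids + [next_token_id]))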
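The snippet above predicts a single token. A greedy loop that generates several tokens could be sketched as follows. To stay independent of the custom classes, it takes a hypothetical callable next_token_logits that maps a list of token ids to a list of logits for the next token; in practice that callable would wrap the model call shown above. Truncating the input to the last 128 tokens and the optional eos_id are assumptions, not documented behavior of the homework implementation.

```python
from typing import Callable, List, Optional, Sequence


def greedy_generate(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int = 32,
    context_length: int = 128,      # matches the context length in this card
    eos_id: Optional[int] = None,   # hypothetical: id of the end-of-sequence token
) -> List[int]:
    """Repeatedly append the highest-scoring next token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Feed only the most recent tokens so the input fits the context window.
        window = ids[-context_length:]
        logits = next_token_logits(window)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if eos_id is not None and next_id == eos_id:
            break
        ids.append(next_id)
    return ids


# Toy stand-in for the model: always prefers (last token + 1) mod 5.
def toy_logits(window: List[int]) -> List[float]:
    scores = [0.0] * 5
    scores[(window[-1] + 1) % 5] = 1.0
    return scores


generated = greedy_generate(toy_logits, [0], max_new_tokens=3)
stopped = greedy_generate(toy_logits, [0], max_new_tokens=10, eos_id=3)
```

Greedy decoding tends to repeat itself with small models; sampling with a temperature is a common alternative, though it is not shown here.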
Notes
- This is an educational model, not a production-ready checkpoint.
- Because the model is small and trained only on jokes, generations may be low-quality, repetitive, or stylistically narrow.
- The repository does not include a packaged inference library; the custom tokenizer/model classes should be copied from the homework notebook or moved into a Python module before loading.
Limitations
- Small context window
- Small vocabulary
- Trained on a narrow-domain dataset
- No safety alignment or moderation tuning
Intended use
- Coursework demonstration
- Experiments with custom tokenization and compact LMs
- Lightweight local generation experiments
Not intended use
- Factual QA
- Safety-critical tasks
- Production deployment without additional packaging, evaluation, and safeguards