# Russian Jokes Transformer Model
A model for generating Russian jokes based on a modified Transformer architecture.
## Model Features

- Specialization: trained on a dataset of Russian jokes (135k examples)
- Tokenization: Byte-Level BPE with a vocabulary size of 1024
- Architecture features:
  - ALiBi (Attention with Linear Biases) for positional encoding (see the sketch after this list)
  - GQA (Grouped-Query Attention)
  - SwiGLU in the FFN layers
  - RMSNorm instead of LayerNorm
- Configurations:
  - Nano (3 layers, 4 heads, hidden size 96)
  - Mini (6 layers, 6 heads, hidden size 384)
  - Small (12 layers, 12 heads, hidden size 768)
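ALiBi replaces learned positional embeddings with a per-head linear penalty added to the attention logits. Below is a minimal PyTorch sketch of how such biases could be computed for this model's context window; the function names are illustrative rather than part of this repository's API, and the slope formula shown is the simple geometric one that is exact for power-of-two head counts.

```python
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # One slope per head: a geometric sequence 2^(-8/n), 2^(-16/n), ...
    # (exact for power-of-two head counts, as in the ALiBi paper).
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Signed relative distance (j - i) between key position j and query position i.
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]  # (seq_len, seq_len)
    # Scale by the per-head slope: past positions (j < i) get a negative bias
    # proportional to their distance; future positions are removed by the causal mask.
    return alibi_slopes(num_heads)[:, None, None] * distance[None, :, :]

# Biases for the Mini configuration (6 heads) over the 128-token context window.
bias = alibi_bias(num_heads=6, seq_len=128)
print(bias.shape)  # torch.Size([6, 128, 128]); added to attention logits before softmax
```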
## Technical Specifications

- Context window: 128 tokens
- Special tokens: [EOS] marks the end of a sequence
- Average length: ~70 tokens per example
- Regularization: dropout 0.1
- Optimizer: AdamW with weight decay 0.01
- Training: 10k steps with linear warmup (see the sketch after this list)
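A minimal sketch of how this training setup could be wired up in PyTorch. The peak learning rate, warmup length, and post-warmup schedule are not stated on this card, so those values below are placeholders; only the weight decay, total step count, and linear warmup come from the list above.

```python
import torch

peak_lr = 3e-4        # assumption: not stated on the card
warmup_steps = 1_000  # assumption: only "linear warmup" is stated, not its length
total_steps = 10_000  # from the card

model = torch.nn.Linear(8, 8)  # stand-in for the actual Transformer model

# AdamW with weight decay 0.01, as listed above.
optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr, weight_decay=0.01)

def lr_lambda(step: int) -> float:
    # Linear warmup to the peak learning rate, then held constant
    # (the card does not specify a decay schedule).
    return min(1.0, (step + 1) / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    # ... forward pass, loss.backward(), optional gradient clipping ...
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```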
## Usage

```python
import torch

# ByteLevelBPETokenizer and TransformerForCausalLM are the custom classes
# shipped with this repository's accompanying course code, not Hugging Face classes.
REPO_NAME = "bikmish/llm-course-hw1"
device = torch.device("cuda")

tokenizer = ByteLevelBPETokenizer.from_pretrained(REPO_NAME)
check_model = TransformerForCausalLM.from_pretrained(REPO_NAME)
check_model = check_model.to(device)
check_model = check_model.eval()

text = "Штирлиц пришел домой"
input_ids = torch.tensor(tokenizer.encode(text), device=device)

# Sample up to 200 new tokens with top-k sampling; stop early on [EOS].
model_output = check_model.generate(
    input_ids[None, :], max_new_tokens=200, eos_token_id=tokenizer.eos_token_id, do_sample=True, top_k=10
)
tokenizer.decode(model_output[0].tolist())
```
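Generation here uses top-k sampling (`do_sample=True`, `top_k=10`), so completions vary from run to run; passing `eos_token_id` lets generation stop as soon as the model emits `[EOS]` instead of always producing the full 200 new tokens.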
### Example output (absolutely hilarious)
```text
Штирлиц пришел домой с работы, приехал.
Преподаватель к себе и вижу: - Давай зайдем сегодня на работу!
- А как ты думаешь, что мы тебя не пьем?
- Дык нет.
- А ты что, тогда находишься?
- А ты не знаешь - кто?
- Дверь откроется!
```