Craftic/llm-course-hw1-mini

An educational causal language model trained on a Russian jokes corpus as part of a Deep Learning homework assignment.

This repository contains:

  • a custom Byte-Level BPE tokenizer (vocabulary.json, merges.json)
  • model weights in model.safetensors
  • a minimal training/inference config in config.json

Model summary

  • Model type: decoder-only Transformer for causal language modeling
  • Tokenizer: custom Byte-Level BPE
  • Vocabulary size: 1024
  • Context length: 128 tokens
  • Attention: Grouped-Query Attention (n_head=6, n_kv_head=3)
  • Feed-forward: SwiGLU
  • Normalization: RMSNorm
  • Positional bias: ALiBi
  • Dropout: 0.1
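
The components listed above can be sketched in PyTorch as follows. This is a minimal illustration, not the homework code: the class and function names, the `eps` default, the bias-free linear layers, and the ALiBi slope scheme for a non-power-of-two head count (taken from the ALiBi reference code) are all assumptions. The actual notebook may differ in any of them.

```python
import math

import torch
import torch.nn.functional as F
from torch import nn


class RMSNorm(nn.Module):
    """Scale features by their root mean square; no mean subtraction, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):  # eps is an assumed default
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)


class SwiGLU(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int = 384, hidden: int = 1024):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


def alibi_slopes(n_head: int) -> torch.Tensor:
    """Per-head ALiBi slopes, including the fallback for non-power-of-two head counts."""

    def power_of_2_slopes(n: int) -> list[float]:
        start = 2.0 ** (-8.0 / n)
        return [start ** (i + 1) for i in range(n)]

    if math.log2(n_head).is_integer():
        return torch.tensor(power_of_2_slopes(n_head))
    closest = 2 ** math.floor(math.log2(n_head))
    extra = power_of_2_slopes(2 * closest)[0::2][: n_head - closest]
    return torch.tensor(power_of_2_slopes(closest) + extra)


def alibi_bias(n_head: int, seq_len: int) -> torch.Tensor:
    """Additive attention bias of shape (H, T, T): linear distance penalty plus causal mask."""
    pos = torch.arange(seq_len)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)  # i - j for keys in the past
    bias = -alibi_slopes(n_head)[:, None, None] * distance
    future = pos[None, :] > pos[:, None]  # key position after query position
    return bias.masked_fill(future, float("-inf"))


def grouped_query_attention(q, k, v, bias):
    """q: (B, H, T, D); k, v: (B, H_kv, T, D); bias: (H, T, T)."""
    n_rep = q.shape[1] // k.shape[1]
    k = k.repeat_interleave(n_rep, dim=1)  # each KV head serves n_rep query heads
    v = v.repeat_interleave(n_rep, dim=1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1]) + bias
    return F.softmax(scores, dim=-1) @ v


# Shapes matching the card: 6 query heads, 3 KV heads, head dim 384 / 6 = 64.
B, T, H, H_KV, D = 2, 16, 6, 3, 64
q = torch.randn(B, H, T, D)
k = torch.randn(B, H_KV, T, D)
v = torch.randn(B, H_KV, T, D)
out = grouped_query_attention(q, k, v, alibi_bias(H, T))
print(out.shape)  # torch.Size([2, 6, 16, 64])
```

With n_head=6 and n_kv_head=3, every pair of query heads shares one key/value head, which halves the key/value projection size relative to standard multi-head attention.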

Configuration

  • Variant: mini
  • Layers: 6
  • Hidden size: 384
  • Attention heads: 6
  • KV heads: 3
  • Intermediate size: 1024
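
As a sanity check on these numbers, the learned parameter count can be estimated from the configuration alone. The sketch below assumes bias-free linear layers, two RMSNorm layers per block plus a final one, and an output head that is not tied to the input embedding. None of these details are confirmed by the card, and `estimate_params` is a hypothetical helper. ALiBi contributes no learned parameters.

```python
def estimate_params(
    vocab_size: int = 1024,
    d_model: int = 384,
    n_layer: int = 6,
    n_head: int = 6,
    n_kv_head: int = 3,
    d_ff: int = 1024,
    tied_embeddings: bool = False,  # assumption: separate input embedding and output head
) -> int:
    head_dim = d_model // n_head          # 64
    kv_dim = n_kv_head * head_dim         # 192: K and V are narrower under GQA
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim  # Q and O, then K and V
    ffn = 3 * d_model * d_ff              # SwiGLU has gate, up, and down projections
    norms = 2 * d_model                   # assumed: two RMSNorm weights per block
    per_layer = attn + ffn + norms
    embed = vocab_size * d_model
    head = 0 if tied_embeddings else vocab_size * d_model
    final_norm = d_model
    return n_layer * per_layer + embed + head + final_norm


print(estimate_params())                      # 10523520
print(estimate_params(tied_embeddings=True))  # 10130304
```

Under these assumptions the estimate is roughly 10.5M parameters, most of it in the six SwiGLU blocks.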

Training data

  • A corpus of Russian jokes

Training setup

  • Optimizer: AdamW
  • Learning rate: 3e-4
  • Training steps: 10,000
  • Max sequence length: 128
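
A single optimization step with these settings might look like the sketch below. It assumes the model is called as `model(input_ids, attention_mask)` and returns logits of shape (batch, sequence, vocabulary), as in the usage snippet on this card. `TinyLM` is a hypothetical stand-in used only to make the example runnable, and AdamW's remaining hyperparameters (weight decay, betas) are left at PyTorch defaults because the card does not state them.

```python
import torch
import torch.nn.functional as F
from torch import nn


def train_step(model, optimizer, input_ids):
    """One next-token-prediction update on a batch of token ids shaped (B, T)."""
    attention_mask = torch.ones_like(input_ids)
    logits = model(input_ids, attention_mask)  # assumed shape: (B, T, vocab)
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions at positions 0..T-2
        input_ids[:, 1:].reshape(-1),                 # targets are the inputs shifted by one
    )
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    return loss.item()


class TinyLM(nn.Module):
    """Hypothetical stand-in with the same call signature as the homework model."""

    def __init__(self, vocab_size: int = 1024, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, input_ids, attention_mask=None):
        return self.head(self.embed(input_ids))


model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
batch = torch.randint(0, 1024, (4, 128))  # random ids; 128 is the max sequence length
losses = [train_step(model, optimizer, batch) for _ in range(3)]
print(losses)
```

The real run would repeat this step 10,000 times over batches drawn from the tokenized corpus.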

Usage

These weights were uploaded from a custom homework implementation, so loading requires the same Python classes that were used during training.

import torch

# Define ByteLevelBPETokenizer and TransformerForCausalLM exactly as in the homework notebook,
# then load the artifacts from the Hub:

tokenizer = ByteLevelBPETokenizer.from_pretrained("Craftic/llm-course-hw1-mini")
model = TransformerForCausalLM.from_pretrained("Craftic/llm-course-hw1-mini")
model.eval()  # disable dropout for inference

prompt = "Муж приходит домой и говорит:"  # "A husband comes home and says:"
# Encode once without an EOS token, so the model continues the prompt
# and the same ids are reused when decoding.
prompt_ids = tokenizer.encode(prompt, add_eos_token=False)
input_ids = torch.tensor([prompt_ids], dtype=torch.long)
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    logits = model(input_ids, attention_mask)

# Greedy choice: take the most likely token at the last position.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(prompt_ids + [next_token_id]))
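
The snippet above predicts only one token. To produce a longer continuation, the same call can be repeated in a loop, as in the following sketch. `greedy_generate` is a hypothetical helper, not part of the homework code. It assumes the `model(input_ids, attention_mask)` call shown above, and `eos_token_id` is an optional parameter because the card does not state the tokenizer's EOS id.

```python
import torch


@torch.no_grad()
def greedy_generate(model, prompt_ids, max_new_tokens=32, context_length=128, eos_token_id=None):
    """Append the most likely next token repeatedly, feeding at most context_length ids."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        window = ids[-context_length:]  # stay within the 128-token context
        input_ids = torch.tensor([window], dtype=torch.long)
        attention_mask = torch.ones_like(input_ids)
        logits = model(input_ids, attention_mask)
        next_id = logits[0, -1].argmax().item()
        if eos_token_id is not None and next_id == eos_token_id:
            break  # stop at end-of-sequence without appending it
        ids.append(next_id)
    return ids


# Usage with the objects loaded above (call model.eval() first):
# ids = greedy_generate(model, tokenizer.encode(prompt, add_eos_token=False))
# print(tokenizer.decode(ids))
```

Greedy decoding is deterministic and tends to repeat itself on a model this small. Sampling from the softmax of the last-position logits is a common alternative.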

Notes

  • This is an educational model, not a production-ready checkpoint.
  • Because the model is small and was trained only on jokes, generations may be low-quality, repetitive, or stylistically narrow.
  • The repository does not include a packaged inference library; the custom tokenizer/model classes should be copied from the homework notebook or moved into a Python module before loading.

Limitations

  • Small context window
  • Small vocabulary
  • Trained on a narrow-domain dataset
  • No safety alignment or moderation tuning

Intended use

  • Coursework demonstration
  • Experiments with custom tokenization and compact LMs
  • Lightweight local generation experiments

Out-of-scope use

  • Factual QA
  • Safety-critical tasks
  • Production deployment without additional packaging, evaluation, and safeguards

Model size

  • 10.6M params (Safetensors; tensor types: F32, BOOL)