exnivo/tinybrain-100m-base

TinyBrain-100M Base

TinyBrain-100M Base is a small causal language model trained from scratch.

Model details

Parameters: ~103M
Architecture: LLaMA-style causal transformer
Tokenizer: custom TinyBrain byte-level BPE
Vocab size: 24,000
Context length during training: 1024 tokens
Training tokens: ~2.1B
Best validation loss: 2.6779
Dataset: TinyBrain Base mixed pretraining dataset
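The figures above support a few quick sanity checks. The sketch below derives tokens per parameter, the number of full-length training sequences, and a perplexity of about 14.55 from the reported validation loss. It assumes the loss is a mean per-token cross-entropy in nats, which the card does not state, and it treats the rounded parameter and token counts as exact.

```python
import math

# Figures copied from the model details (the first two are approximate).
PARAMS = 103e6
TRAIN_TOKENS = 2.1e9
VOCAB_SIZE = 24_000
CONTEXT_LEN = 1024
VAL_LOSS = 2.6779

# How many training tokens were seen per model parameter.
tokens_per_param = TRAIN_TOKENS / PARAMS

# Number of 1024-token sequences that the token budget corresponds to.
full_sequences = TRAIN_TOKENS / CONTEXT_LEN

# Assumption: VAL_LOSS is mean cross-entropy per token, measured in nats.
perplexity = math.exp(VAL_LOSS)
bits_per_token = VAL_LOSS / math.log(2)

# Loss of a model that guesses uniformly over the vocabulary, for comparison.
uniform_loss = math.log(VOCAB_SIZE)

print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"full-length sequences: {full_sequences:,.0f}")
print(f"perplexity:           {perplexity:.2f}")
print(f"bits per token:       {bits_per_token:.2f}")
print(f"uniform-guess loss:   {uniform_loss:.2f}")
```

Perplexity is only comparable between models that share a tokenizer, so this number should not be compared directly against models with a different vocabulary.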

Intended use

This is a base model; it has not been instruction-tuned or chat-tuned. It is intended as a starting point for supervised fine-tuning.
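One common way to prepare data for supervised fine-tuning of a causal language model is to concatenate prompt and response tokens and mask the prompt positions out of the loss. The sketch below illustrates that idea in plain Python; it is not the author's pipeline. The helper `build_sft_example` and the `eos_id` value are hypothetical, and the real end-of-sequence id would have to come from the TinyBrain tokenizer. The label value `-100` is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
MAX_LEN = 1024       # training context length from the model details


def build_sft_example(prompt_ids, response_ids, eos_id, max_len=MAX_LEN):
    """Build one training example where only response tokens carry loss.

    prompt_ids / response_ids are lists of token ids; eos_id is a
    hypothetical end-of-sequence id appended after the response.
    """
    input_ids = list(prompt_ids) + list(response_ids) + [eos_id]
    # Prompt positions are masked so the model is not trained to
    # reproduce the prompt, only to continue it with the response.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids) + [eos_id]
    # Truncate both lists identically so they stay aligned.
    return {"input_ids": input_ids[:max_len], "labels": labels[:max_len]}


example = build_sft_example([5, 6, 7], [8, 9], eos_id=2)
print(example)
```

Labels are kept aligned with `input_ids` rather than shifted by one, which matches the convention of Hugging Face causal LM classes that shift labels internally; a custom training loop would need to apply the shift itself.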

Limitations

This model may hallucinate, make factual mistakes, fail at math, and produce unreliable answers. It has not yet been instruction-tuned.
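Because the model is not instruction-tuned, it is best prompted with text to continue rather than with questions. The following is a minimal sketch of that, assuming the repository `exnivo/tinybrain-100m-base` loads through the Hugging Face `transformers` Auto classes, which this page does not confirm. The helper `clamp_new_tokens` and the sampling settings are illustrative choices, not recommendations from the author.

```python
REPO_ID = "exnivo/tinybrain-100m-base"
MAX_CONTEXT = 1024  # context length used during training


def clamp_new_tokens(prompt_len, requested, max_context=MAX_CONTEXT):
    """Limit generation so prompt plus output fits the training context."""
    return max(0, min(requested, max_context - prompt_len))


def complete(prompt, max_new_tokens=64):
    """Sample a continuation of `prompt` from the base model."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repo ships files that the Auto classes can load.
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    budget = clamp_new_tokens(inputs["input_ids"].shape[1], max_new_tokens)
    if budget <= 0:
        raise ValueError("prompt already fills the 1024-token context")

    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=budget,
        do_sample=True,   # illustrative sampling settings
        top_p=0.9,
        temperature=0.8,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("The history of the bicycle began")` would download the weights and return the prompt followed by a sampled continuation. The output should be treated as unverified text, in line with the limitations above.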

Training

Trained from scratch on the TinyBrain Base mixed pretraining dataset in bf16 precision on an NVIDIA RTX PRO 6000 Blackwell Server Edition GPU.

Checkpoint format: Safetensors, 0.1B params, F32 tensors
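The weights are listed as F32 even though training used bf16, so the stored file is roughly twice the size a bf16 copy would be. The sketch below estimates both from the ~103M parameter count. It counts weight tensors only and ignores file headers, so the figures are approximations; the helper name `weight_megabytes` is illustrative.

```python
# Bytes needed to store one element of each tensor type.
BYTES_PER_ELEMENT = {"F32": 4, "BF16": 2}

PARAMS = 103e6  # approximate parameter count from the model details


def weight_megabytes(num_params, dtype):
    """Approximate size of the raw weights in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e6


for dtype in ("F32", "BF16"):
    print(f"{dtype}: ~{weight_megabytes(PARAMS, dtype):.0f} MB")
```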
