TinyBrain-100M Base

TinyBrain-100M Base is a small causal language model trained from scratch.

Model details

  • Parameters: ~103M
  • Architecture: LLaMA-style causal transformer
  • Tokenizer: custom TinyBrain byte-level BPE
  • Vocab size: 24,000
  • Context length during training: 1024 tokens
  • Training tokens: ~2.1B
  • Best validation loss: 2.6779
  • Dataset: TinyBrain Base mixed pretraining dataset
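
The figures above can be put in more familiar terms with a little arithmetic. The sketch below assumes the reported validation loss is the mean per-token cross-entropy in nats, which the card does not state; the variable names are illustrative only.

```python
import math

# Values copied from the model details; the "~" figures are approximate.
params = 103e6         # ~103M parameters
train_tokens = 2.1e9   # ~2.1B training tokens
vocab_size = 24_000
val_loss = 2.6779      # ASSUMPTION: mean cross-entropy per token, in nats

# Perplexity is the exponential of the per-token cross-entropy.
perplexity = math.exp(val_loss)

# Loss of a model that guesses uniformly over the vocabulary, for comparison.
uniform_loss = math.log(vocab_size)

# Training tokens seen per parameter.
tokens_per_param = train_tokens / params

print(f"validation perplexity: {perplexity:.2f}")
print(f"uniform-guess loss:    {uniform_loss:.2f} nats")
print(f"tokens per parameter:  {tokens_per_param:.1f}")
```

Under that assumption the validation perplexity comes out at roughly 14.5, and the model saw about 20 training tokens per parameter.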

Intended use

This is a base model; it has not been instruction-tuned or chat-tuned. It is intended as a starting point for further supervised fine-tuning.
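
Since the model is meant as a starting point for supervised fine-tuning, here is a minimal sketch of how one training example might be prepared. It is not the author's pipeline: the helper `build_sft_example` and the toy token ids are hypothetical, and the prompt tokens are masked with -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so that only the response contributes to the loss. The 1024-token limit is the training context length listed in the model details.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
MAX_LEN = 1024       # training context length from the model details


def build_sft_example(prompt_ids, response_ids, eos_id, max_len=MAX_LEN):
    """Concatenate prompt and response, masking the prompt out of the loss.

    Hypothetical helper: a real pipeline would obtain the ids from the
    TinyBrain tokenizer and might truncate differently.
    """
    input_ids = list(prompt_ids) + list(response_ids) + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids) + [eos_id]
    # Right-truncate both sequences to the context window.
    return {"input_ids": input_ids[:max_len], "labels": labels[:max_len]}


# Toy token ids, for illustration only.
example = build_sft_example(prompt_ids=[11, 12, 13], response_ids=[21, 22], eos_id=2)
print(example["input_ids"])  # [11, 12, 13, 21, 22, 2]
print(example["labels"])     # [-100, -100, -100, 21, 22, 2]
```

Right-truncation is the simplest choice here; it can cut off the end of a long response, so a real setup may prefer to drop or shorten over-length examples instead.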

Limitations

This model may hallucinate, make factual mistakes, fail at math, and produce unreliable answers. It has not yet been instruction-tuned.

Training

Trained from scratch on the TinyBrain Base mixed pretraining dataset, in bf16 precision, on an NVIDIA RTX PRO 6000 Blackwell Server Edition GPU.
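
Training ran in bf16, while the hub listing reports the stored tensors as F32. As a rough guide to what that means for the weights alone, the sketch below multiplies the approximate parameter count by the bytes per element of each type. It ignores activations, optimizer state, and file overhead, and the ~103M figure is itself approximate.

```python
params = 103e6  # "~103M" from the model details; the exact count may differ

# Bytes per element for each tensor type.
BYTES_PER_PARAM = {"F32": 4, "bf16": 2}

# Approximate size of the weights alone, in megabytes (1 MB = 1e6 bytes).
sizes_mb = {dtype: params * nbytes / 1e6 for dtype, nbytes in BYTES_PER_PARAM.items()}

for dtype, size in sizes_mb.items():
    print(f"{dtype}: ~{size:.0f} MB of weights")
```

This gives roughly 412 MB in F32 and 206 MB in bf16.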

The checkpoint is published as exnivo/tinybrain-100m-base in Safetensors format, with F32 tensors (0.1B params).