SoloLLM v3 150M Base

SoloLLM v3 150M Base is a from-scratch GPT-style decoder-only language model trained on one RTX 3090 as part of the SoloLLM project. It is a base text completion model, not an instruction-tuned chatbot.

The project goal was to build a full small-LM engineering loop: dataset construction, PyTorch model implementation, single-GPU pretraining, checkpoint recovery, evaluation, ablation, and an honest comparison against GPT-2 small.

Headline Result

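The final 150M model beats GPT-2 small on every metric in the summary table below. A smaller 123M ablation also beats GPT-2 small on the external checks (WikiText-2, LAMBADA, and the multiple-choice average), but it trails GPT-2 small on held-out perplexity (25.64 vs. 25.32), so it does not win on every metric.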

| Model | Params | Train tokens | Held-out PPL | WikiText-2 PPL | LAMBADA PPL | MC avg acc norm |
|---|---|---|---|---|---|---|
| GPT-2 small | 124.44M | public | 25.32 | 45.32 | 40.62 | 41.05% |
| SoloLLM v3 123M | 123.55M | 9.80B | 25.64 | 41.87 | 36.28 | 42.46% |
| SoloLLM v3 150M | 151.87M | 10.00B | 24.90 | 41.18 | 35.35 | 42.71% |

The honest claim is:

SoloLLM v3 trains GPT-2-class base LMs from scratch on one RTX 3090. The final 150M model beats GPT-2 small overall on a fixed evaluation suite. A slightly smaller 123M model beats GPT-2 small on most external benchmarks but not on every metric, so the stricter claim of a smaller-than-GPT-2 model winning across the board is not met.
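
The comparison behind this claim can be checked directly from the summary table. The snippet below is only an illustrative sketch: the dictionary keys and the `wins_against` helper are names made up for this example, and the numbers are copied from the table above.

```python
# Metric values copied from the results table in this card.
# Perplexity (PPL): lower is better. Accuracy: higher is better.
RESULTS = {
    "GPT-2 small": {
        "heldout_ppl": 25.32, "wikitext2_ppl": 45.32,
        "lambada_ppl": 40.62, "mc_acc_norm": 41.05,
    },
    "SoloLLM v3 123M": {
        "heldout_ppl": 25.64, "wikitext2_ppl": 41.87,
        "lambada_ppl": 36.28, "mc_acc_norm": 42.46,
    },
    "SoloLLM v3 150M": {
        "heldout_ppl": 24.90, "wikitext2_ppl": 41.18,
        "lambada_ppl": 35.35, "mc_acc_norm": 42.71,
    },
}
LOWER_IS_BETTER = {"heldout_ppl", "wikitext2_ppl", "lambada_ppl"}


def wins_against(model, baseline="GPT-2 small"):
    """Return the metrics on which `model` is strictly better than `baseline`."""
    wins = []
    for metric, value in RESULTS[model].items():
        ref = RESULTS[baseline][metric]
        better = value < ref if metric in LOWER_IS_BETTER else value > ref
        if better:
            wins.append(metric)
    return wins


for name in ("SoloLLM v3 123M", "SoloLLM v3 150M"):
    won = wins_against(name)
    print(f"{name}: better than GPT-2 small on {len(won)}/4 metrics: {won}")
```

With the table values, the 123M model wins on three of the four metrics (all but held-out perplexity) and the 150M model wins on all four.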

Model Details

| Item | Value |
|---|---|
| Architecture | Decoder-only GPT-style transformer |
| Parameters | 151,868,928 |
| Context length | 1024 |
| Tokenizer | GPT-2 tokenizer |
| Embedding width | 768 |
| Layers | 16 |
| Attention heads | 12 |
| Positional method | RoPE |
| Normalization | RMSNorm |
| MLP | SwiGLU |
| Weight tying | Input/output embeddings tied |
| Training hardware | Single RTX 3090 |
| Training tokens | 10,000,007,168 |
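
As a sanity check, the listed parameter count can be reproduced from these hyperparameters under a few assumptions that the card does not state: an unpadded GPT-2 vocabulary of 50,257 entries, a SwiGLU hidden width of 2048 (8/3 of the embedding width), no bias terms, and a single learned scale vector per RMSNorm. RoPE adds no learned parameters. This is a back-of-the-envelope sketch, not the project's own accounting.

```python
# Values from the Model Details table.
D_MODEL = 768
N_LAYERS = 16

# ASSUMPTIONS (not stated in the card):
VOCAB_SIZE = 50257   # GPT-2 tokenizer vocabulary, assumed unpadded
MLP_HIDDEN = 2048    # assumed SwiGLU hidden width (8/3 * 768)
# Also assumed: no bias terms anywhere, one scale vector per RMSNorm.


def count_parameters():
    embedding = VOCAB_SIZE * D_MODEL    # tied with the output head, counted once
    attention = 4 * D_MODEL * D_MODEL   # query, key, value and output projections
    mlp = 3 * D_MODEL * MLP_HIDDEN      # gate, up and down projections
    norms = 2 * D_MODEL                 # one RMSNorm before attention, one before the MLP
    per_layer = attention + mlp + norms
    final_norm = D_MODEL                # RMSNorm before the output head
    return embedding + N_LAYERS * per_layer + final_norm


total = count_parameters()
print(f"{total:,}")  # 151,868,928
```

Under these assumptions the total matches the listed 151,868,928 exactly, which suggests (but does not prove) that the released model uses this layout.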

Training Data

The model was trained on a curated 10B-token mixture:

| Source | Accepted tokens | Share |
|---|---|---|
| FineWeb-Edu sample-10BT | 4,000,001,532 | 40% |
| DCLM baseline | 2,500,001,319 | 25% |
| FineWeb sample-10BT | 1,499,997,774 | 15% |
| English Wikipedia | 999,998,937 | 10% |
| OpenWebText | 1,000,000,972 | 10% |

The dataset was filtered, deduplicated by normalized document hash, and packed into 1024-token training shards.
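
The two mechanical steps named here, hash-based deduplication and fixed-length packing, can be sketched as follows. This is a minimal illustration and not the project's pipeline: the normalization rule (lowercase, collapsed whitespace), the SHA-256 hash, the EOS separator between documents, and the function names are all assumptions made for the example.

```python
import hashlib
import re

SEQ_LEN = 1024  # sequence length of the training shards, from the card


def normalized_hash(text):
    # Hypothetical normalization: lowercase and collapse runs of whitespace.
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(documents):
    """Keep the first document seen for each normalized hash."""
    seen = set()
    unique = []
    for doc in documents:
        key = normalized_hash(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def pack(token_lists, eos_id, seq_len=SEQ_LEN):
    """Join tokenized documents with EOS separators and cut the stream into
    fixed-length sequences, dropping the incomplete tail."""
    stream = []
    for tokens in token_lists:
        stream.extend(tokens)
        stream.append(eos_id)
    n_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


docs = ["Hello  world", "hello world", "Another document"]
unique_docs = deduplicate(docs)  # the second document is a normalized duplicate

# Toy "tokenization" by character code, only to exercise the packer.
toy_tokens = [[ord(c) for c in d] for d in unique_docs]
sequences = pack(toy_tokens, eos_id=0, seq_len=8)
print(len(unique_docs), len(sequences))
```

In a real run the token lists would come from the GPT-2 tokenizer and `seq_len` would stay at 1024.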

Files

| File | Purpose |
|---|---|
| model.safetensors | Final model state dict |
| config.json | Model/training config used to instantiate SoloGPT_v2 |
| config_resolved.json | Resolved run config from training |
| metrics_summary.json | Training summary for the final checkpoint |
| model.py | Minimal SoloGPT model implementation used by this checkpoint |
| configuration_sologpt.py | Hugging Face AutoConfig remote-code wrapper |
| modeling_sologpt.py | Hugging Face AutoModelForCausalLM remote-code wrapper |
| tokenizer.json | GPT-2 tokenizer used for training and inference |
| tokenizer_config.json | Tokenizer metadata with 1024-token context and EOS-as-pad |
| load_example.py | Example loading and sampling script |
| docs/v3_final_gpt2_comparison.md | Full final result writeup |
| docs/project_page.md | Short portfolio-style project page |

Usage

This repo supports Hugging Face AutoModelForCausalLM loading through custom remote code. Pass trust_remote_code=True when loading the model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "bmax16634/sologpt-v3-150m-base"

# Use the GPU when one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True).to(device)
model.eval()

prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Sample a short continuation; this is a base model, so it completes text
# rather than following instructions.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        temperature=0.8,
        top_k=40,
        use_cache=False,
        remove_invalid_values=True,
        renormalize_logits=True,
        pad_token_id=tokenizer.eos_token_id,  # EOS doubles as the pad token
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For a runnable example, see load_example.py. For low-level state-dict loading, the raw PyTorch implementation is still included as model.py.

Intended Use

This model is intended for:

  • educational inspection of a small from-scratch base LM,
  • text-completion experiments,
  • reproducing the SoloLLM v3 evaluation story,
  • portfolio/research engineering review.

It is not intended for production use, high-stakes decisions, factual QA, or chat/instruction-following use without additional tuning and safety evaluation.

Limitations

  • This is a small base model, not an assistant.
  • It can generate incorrect, biased, repetitive, or unsafe text.
  • It has no retrieval, tool use, or instruction tuning.
  • The strict claim of beating GPT-2 small on every metric with a smaller model is not established: the winning 150M checkpoint (151.87M parameters) is larger than GPT-2 small (124.44M).
  • Training data came from broad public web/text sources and may contain undesirable content despite filtering.

License

The SoloLLM code and released weights are provided under the MIT License by the author. Training data sources retain their own licenses and terms.

Evaluation results

All values below are self-reported.

  • Held-out perplexity (SoloLLM project held-out OpenWebText-style shards): 24.899
  • WikiText-2 perplexity (WikiText-2 test): 41.181
  • LAMBADA perplexity (LAMBADA): 35.347
  • LAMBADA last-word accuracy (LAMBADA): 0.331
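
For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per predicted unit. The sketch below shows that relationship in plain Python; the `perplexity` helper is a name made up for this example, and the card does not say whether the reported figures are normalized per token or per word, so treat the per-token reading as an assumption.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Sanity check: a model that gives every token probability 1/4
# is as uncertain as a uniform choice among 4 options.
uniform = [math.log(0.25)] * 10
print(perplexity(uniform))

# Going the other way: the reported held-out perplexity of 24.899
# corresponds to an average loss of roughly 3.21 nats per unit.
print(math.log(24.899))
```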