SoloLLM v3 150M Base
SoloLLM v3 150M Base is a from-scratch GPT-style decoder-only language model trained on one RTX 3090 as part of the SoloLLM project. It is a base text completion model, not an instruction-tuned chatbot.
The project goal was to build a full small-LM engineering loop: dataset construction, PyTorch model implementation, single-GPU pretraining, checkpoint recovery, evaluation, ablation, and an honest comparison against GPT-2 small.
Headline Result
The final 150M model beats GPT-2 small overall on the fixed SoloLLM v3 evaluation suite. A smaller 123M ablation also beats GPT-2 on most external checks, but it does not beat GPT-2 across every metric.
| Model | Params | Train tokens | Held-out PPL | WikiText-2 PPL | LAMBADA PPL | MC avg acc norm |
|---|---|---|---|---|---|---|
| GPT-2 small | 124.44M | public | 25.32 | 45.32 | 40.62 | 41.05% |
| SoloLLM v3 123M | 123.55M | 9.80B | 25.64 | 41.87 | 36.28 | 42.46% |
| SoloLLM v3 150M | 151.87M | 10.00B | 24.90 | 41.18 | 35.35 | 42.71% |
The honest claim is:
SoloLLM v3 trains GPT-2-class base LMs from scratch on one RTX 3090. The final 150M model beats GPT-2 small overall on a fixed evaluation suite, while a slightly smaller 123M model beats GPT-2 on most external benchmarks but does not fully clear the strict across-board smaller-than-GPT-2 bar.
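The "overall" versus "most" distinction can be checked directly from the table. The sketch below copies the four reported metrics and marks, per metric, whether each SoloLLM model is ahead of GPT-2 small. The dictionary keys and the `compare` helper are illustrative names, not part of the project code.

```python
# Metrics copied from the results table above.
# Lower is better for perplexity; higher is better for accuracy.
RESULTS = {
    "GPT-2 small": {
        "heldout_ppl": 25.32, "wikitext2_ppl": 45.32,
        "lambada_ppl": 40.62, "mc_acc_norm": 41.05,
    },
    "SoloLLM v3 123M": {
        "heldout_ppl": 25.64, "wikitext2_ppl": 41.87,
        "lambada_ppl": 36.28, "mc_acc_norm": 42.46,
    },
    "SoloLLM v3 150M": {
        "heldout_ppl": 24.90, "wikitext2_ppl": 41.18,
        "lambada_ppl": 35.35, "mc_acc_norm": 42.71,
    },
}
HIGHER_IS_BETTER = {"mc_acc_norm"}


def compare(model, baseline="GPT-2 small"):
    """Return {metric: (relative_change_percent, wins)} against the baseline."""
    out = {}
    for metric, base in RESULTS[baseline].items():
        value = RESULTS[model][metric]
        rel = 100.0 * (value - base) / base
        wins = value > base if metric in HIGHER_IS_BETTER else value < base
        out[metric] = (round(rel, 2), wins)
    return out


for name in ("SoloLLM v3 123M", "SoloLLM v3 150M"):
    print(name, compare(name))
```

Under this reading, the 150M model is ahead on all four columns, while the 123M model is ahead on three and behind on held-out perplexity (25.64 versus 25.32).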
Model Details
| Item | Value |
|---|---|
| Architecture | Decoder-only GPT-style transformer |
| Parameters | 151,868,928 |
| Context length | 1024 |
| Tokenizer | GPT-2 tokenizer |
| Embedding width | 768 |
| Layers | 16 |
| Attention heads | 12 |
| Positional method | RoPE |
| Normalization | RMSNorm |
| MLP | SwiGLU |
| Weight tying | Input/output embeddings tied |
| Training hardware | Single RTX 3090 |
| Training tokens | 10,000,007,168 |
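As a rough cross-check of the parameter count, the sketch below adds up the weights implied by the table. Several inputs are assumptions that the table does not state: a vocabulary of 50,257 entries (the standard GPT-2 tokenizer size), a SwiGLU hidden width of 2048, bias-free linear layers, two RMSNorm gains per block plus one final RMSNorm, and no learned parameters for RoPE. The `count_params` helper is a hypothetical name, and `model.py` remains the authoritative definition.

```python
def count_params(vocab_size, d_model, n_layers, mlp_hidden, tied=True):
    """Estimate parameters of a bias-free GPT-style decoder (assumed layout)."""
    embed = vocab_size * d_model        # token embedding table
    attn = 4 * d_model * d_model        # Q, K, V and output projections
    mlp = 3 * d_model * mlp_hidden      # SwiGLU: gate, up and down projections
    norms = 2 * d_model                 # two RMSNorm gain vectors per block
    block = attn + mlp + norms
    final_norm = d_model                # one RMSNorm before the output head
    head = 0 if tied else vocab_size * d_model  # tied head adds nothing
    return embed + n_layers * block + final_norm + head


total = count_params(vocab_size=50257, d_model=768, n_layers=16, mlp_hidden=2048)
print(f"{total:,}")  # 151,868,928
```

Under these assumptions the total matches the reported 151,868,928, which suggests but does not prove that the layout is close to the real one.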
Training Data
The model was trained on a curated 10B-token mixture:
| Source | Accepted tokens | Share |
|---|---|---|
| FineWeb-Edu sample-10BT | 4,000,001,532 | 40% |
| DCLM baseline | 2,500,001,319 | 25% |
| FineWeb sample-10BT | 1,499,997,774 | 15% |
| English Wikipedia | 999,998,937 | 10% |
| OpenWebText | 1,000,000,972 | 10% |
The dataset was filtered, deduplicated by normalized document hash, and packed into 1024-token training shards.
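The deduplicate-then-pack step can be sketched as follows. This is a minimal illustration, not the project's pipeline: the normalization (lowercasing and whitespace collapsing), the SHA-256 hash, the EOS separator between documents, and the dropping of the trailing partial row are all assumptions, and every function name is hypothetical.

```python
import hashlib
import re


def normalized_hash(text):
    """Hash a document after lowercasing and collapsing whitespace (assumed normalization)."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(docs):
    """Yield each document whose normalized hash has not been seen before."""
    seen = set()
    for doc in docs:
        key = normalized_hash(doc)
        if key not in seen:
            seen.add(key)
            yield doc


def pack(token_streams, seq_len=1024, eos_id=50256):
    """Concatenate tokenized documents, EOS-separated, into fixed-length rows.

    A trailing partial row is dropped in this sketch.
    """
    buffer = []
    for tokens in token_streams:
        buffer.extend(tokens)
        buffer.append(eos_id)  # 50256 is the GPT-2 end-of-text id
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]


docs = ["Hello  World", "hello world", "Something else"]
unique = list(dedupe(docs))  # the second document is a normalized duplicate

# Toy tokenizer for the demo: one id per character.
rows = list(pack(([ord(c) for c in d] for d in unique), seq_len=8, eos_id=0))
print(len(unique), len(rows))
```

A real run would use the GPT-2 tokenizer and `seq_len=1024`, and would write rows to shard files rather than holding them in memory.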
Files
| File | Purpose |
|---|---|
| model.safetensors | Final model state dict |
| config.json | Model/training config used to instantiate SoloGPT_v2 |
| config_resolved.json | Resolved run config from training |
| metrics_summary.json | Training summary for the final checkpoint |
| model.py | Minimal SoloGPT model implementation used by this checkpoint |
| configuration_sologpt.py | Hugging Face AutoConfig remote-code wrapper |
| modeling_sologpt.py | Hugging Face AutoModelForCausalLM remote-code wrapper |
| tokenizer.json | GPT-2 tokenizer used for training and inference |
| tokenizer_config.json | Tokenizer metadata with 1024-token context and EOS-as-pad |
| load_example.py | Example loading and sampling script |
| docs/v3_final_gpt2_comparison.md | Full final result writeup |
| docs/project_page.md | Short portfolio-style project page |
Usage
This repo supports Hugging Face AutoModelForCausalLM loading through custom
remote code. Pass trust_remote_code=True when loading the model.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "bmax16634/sologpt-v3-150m-base"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True).to(device)
model.eval()

prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        temperature=0.8,
        top_k=40,
        use_cache=False,
        remove_invalid_values=True,
        renormalize_logits=True,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
For a runnable example, see load_example.py. For low-level state-dict loading,
the raw PyTorch implementation is still included as model.py.
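To make the `temperature` and `top_k` arguments in the usage snippet more concrete, here is a small standard-library sketch of what those two settings do to a single step's logits. It is an illustration of the general technique, not the Transformers implementation, and `sample_top_k` is a hypothetical helper.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Sample one token id using temperature scaling and top-k filtering."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]

    # Keep only the k highest-scoring token ids.
    k = min(top_k, len(scaled))
    keep = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]

    # Softmax over the kept ids, shifted by the max for numerical stability.
    peak = max(scaled[i] for i in keep)
    weights = [math.exp(scaled[i] - peak) for i in keep]
    total = sum(weights)
    probs = [w / total for w in weights]

    return rng.choices(keep, weights=probs, k=1)[0]


logits = [2.0, 0.5, -1.0, 3.0]
print(sample_top_k(logits, top_k=2, rng=random.Random(0)))
```

With `top_k=1` the function reduces to greedy decoding, since only the highest-scoring id survives the filter.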
Intended Use
This model is intended for:
- educational inspection of a small from-scratch base LM,
- text-completion experiments,
- reproducing the SoloLLM v3 evaluation story,
- portfolio/research engineering review.
It is not intended for production use, high-stakes decisions, factual QA, or chat/instruction-following use without additional tuning and safety evaluation.
Limitations
- This is a small base model, not an assistant.
- It can generate incorrect, biased, repetitive, or unsafe text.
- It has no retrieval, tool use, or instruction tuning.
- The strict smaller-than-GPT-2 across-board claim is not proven by this model; the winning 150M checkpoint is larger than GPT-2 small.
- Training data came from broad public web/text sources and may contain undesirable content despite filtering.
License
The SoloLLM code and released weights are provided under the MIT License by the author. Training data sources retain their own licenses and terms.
Project Links
- Author: Benjamin Maxwell
- Final 150M model: https://huggingface.co/bmax16634/sologpt-v3-150m-base
- Smaller 123M ablation: https://huggingface.co/bmax16634/sologpt-v3-123m-base
- Public completion demo: https://huggingface.co/spaces/bmax16634/sologpt-v3-150m-demo
- Original v1 model: https://huggingface.co/bmax16634/sologpt-base-v1
- Main result artifact in this repo: docs/v3_final_gpt2_comparison.md
Evaluation results
- Held-out perplexity (SoloLLM project held-out OpenWebText-style shards, self-reported): 24.899
- WikiText-2 perplexity (WikiText-2 test, self-reported): 41.181
- LAMBADA perplexity (LAMBADA, self-reported): 35.347
- LAMBADA last-word accuracy (LAMBADA, self-reported): 0.331
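For readers unfamiliar with these metrics, the sketch below shows the usual definitions: perplexity as the exponential of the mean per-token negative log-likelihood (in nats), and last-word accuracy as the fraction of examples whose predicted final word matches the reference. How the project normalizes perplexity (per token or per word) is not stated here, so treat the per-token form as an assumption. Both function names are hypothetical.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def last_word_accuracy(predictions, targets):
    """Fraction of examples where the predicted last word equals the target."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# A model that assigns probability 1/4 to every token has perplexity 4.
uniform_nlls = [math.log(4)] * 10
print(perplexity(uniform_nlls))

print(last_word_accuracy(["cat", "dog", "tree"], ["cat", "dog", "house"]))
```

Read this way, a perplexity of about 35 means the model is on average about as uncertain as a uniform choice among roughly 35 tokens.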