Instructions for using bmax16634/sologpt-v3-150m-base with libraries and local apps.

Libraries

How to use bmax16634/sologpt-v3-150m-base with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="bmax16634/sologpt-v3-150m-base", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bmax16634/sologpt-v3-150m-base", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use bmax16634/sologpt-v3-150m-base with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "bmax16634/sologpt-v3-150m-base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bmax16634/sologpt-v3-150m-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
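
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. The helper names `build_completion_request` and `complete` are hypothetical, and the sketch assumes the server started above is listening on `localhost:8000` and returns the usual OpenAI-style `choices[0].text` field.

```python
import json
import urllib.request

MODEL_ID = "bmax16634/sologpt-v3-150m-base"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text.

    Assumes an OpenAI-style response body with a `choices` list.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,", max_tokens=64)` should return the continuation as a string. Passing `base_url="http://localhost:30000"` points the same helper at the SGLang server described in the next section.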

Use Docker

docker model run hf.co/bmax16634/sologpt-v3-150m-base

SGLang

How to use bmax16634/sologpt-v3-150m-base with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "bmax16634/sologpt-v3-150m-base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bmax16634/sologpt-v3-150m-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "bmax16634/sologpt-v3-150m-base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "bmax16634/sologpt-v3-150m-base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use bmax16634/sologpt-v3-150m-base with Docker Model Runner:
```
docker model run hf.co/bmax16634/sologpt-v3-150m-base
```

SoloLLM v3 150M Base

SoloLLM v3 150M Base is a from-scratch GPT-style decoder-only language model trained on one RTX 3090 as part of the SoloLLM project. It is a base text completion model, not an instruction-tuned chatbot.

The project goal was to build a full small-LM engineering loop: dataset construction, PyTorch model implementation, single-GPU pretraining, checkpoint recovery, evaluation, ablation, and an honest comparison against GPT-2 small.

Headline Result

The final 150M model beats GPT-2 small overall on the fixed SoloLLM v3 evaluation suite. A smaller 123M ablation also beats GPT-2 on most external checks, but not on every metric: in the table below it trails GPT-2 on held-out perplexity (25.64 vs. 25.32).

Model	Params	Train tokens	Held-out PPL	WikiText-2 PPL	LAMBADA PPL	MC avg acc norm
GPT-2 small	124.44M	public	25.32	45.32	40.62	41.05%
SoloLLM v3 123M	123.55M	9.80B	25.64	41.87	36.28	42.46%
SoloLLM v3 150M	151.87M	10.00B	24.90	41.18	35.35	42.71%

The honest claim is:

SoloLLM v3 trains GPT-2-class base LMs from scratch on one RTX 3090. The final 150M model beats GPT-2 small overall on a fixed evaluation suite. A slightly smaller 123M model beats GPT-2 on most external benchmarks but does not clear the stricter bar of beating it on every metric with fewer parameters.
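
To make the comparison concrete, the gaps against GPT-2 small can be computed directly from the results table above. This is a small illustrative sketch: the numbers are copied from the table, while the dictionary layout and the function name `compare_to_baseline` are hypothetical. Perplexity gaps are reported as relative reductions in percent, and the accuracy gap as percentage points.

```python
# Values copied from the headline results table.
RESULTS = {
    "GPT-2 small": {
        "held_out_ppl": 25.32, "wikitext2_ppl": 45.32,
        "lambada_ppl": 40.62, "mc_avg_acc_norm": 41.05,
    },
    "SoloLLM v3 123M": {
        "held_out_ppl": 25.64, "wikitext2_ppl": 41.87,
        "lambada_ppl": 36.28, "mc_avg_acc_norm": 42.46,
    },
    "SoloLLM v3 150M": {
        "held_out_ppl": 24.90, "wikitext2_ppl": 41.18,
        "lambada_ppl": 35.35, "mc_avg_acc_norm": 42.71,
    },
}
LOWER_IS_BETTER = {"held_out_ppl", "wikitext2_ppl", "lambada_ppl"}


def compare_to_baseline(model, baseline="GPT-2 small"):
    """Return {metric: (delta, wins)} for `model` against `baseline`.

    For perplexities, delta is the relative reduction in percent.
    For accuracy, delta is the gain in percentage points.
    """
    comparison = {}
    for metric, base_value in RESULTS[baseline].items():
        value = RESULTS[model][metric]
        if metric in LOWER_IS_BETTER:
            delta = (base_value - value) / base_value * 100.0
        else:
            delta = value - base_value
        comparison[metric] = (delta, delta > 0)
    return comparison


for name in ("SoloLLM v3 123M", "SoloLLM v3 150M"):
    for metric, (delta, wins) in compare_to_baseline(name).items():
        print(f"{name:16s} {metric:16s} {delta:+6.2f} {'win' if wins else 'loss'}")
```

On these numbers the 150M model wins on all four columns, and the 123M model loses only on held-out perplexity.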

Model Details

Item	Value
Architecture	Decoder-only GPT-style transformer
Parameters	151,868,928
Context length	1024
Tokenizer	GPT-2 tokenizer
Embedding width	768
Layers	16
Attention heads	12
Positional method	RoPE
Normalization	RMSNorm
MLP	SwiGLU
Weight tying	Input/output embeddings tied
Training hardware	Single RTX 3090
Training tokens	10,000,007,168
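
As a sanity check on the table above, the stated parameter count can be reproduced from the listed dimensions. The sketch below relies on assumptions the card does not state: a vocabulary of 50,257 entries (the standard GPT-2 tokenizer size), bias-free linear layers, a SwiGLU hidden width of 2048, and two RMSNorm layers per block plus one final norm. RoPE contributes no learned parameters. These choices are picked because they match the published total. `config.json` remains the authoritative description of the architecture.

```python
def count_parameters(vocab_size=50257, d_model=768, n_layers=16, mlp_hidden=2048):
    """Estimate the parameter count of a tied-embedding GPT-style decoder.

    mlp_hidden=2048 and bias-free projections are assumptions, not values
    taken from the model card.
    """
    embedding = vocab_size * d_model     # shared with the output head (weight tying)
    attention = 4 * d_model * d_model    # Q, K, V and output projections
    mlp = 3 * d_model * mlp_hidden       # SwiGLU: gate, up and down projections
    norms = 2 * d_model                  # two RMSNorm scale vectors per block
    final_norm = d_model                 # RMSNorm before the output head
    return embedding + n_layers * (attention + mlp + norms) + final_norm


print(f"{count_parameters():,}")  # 151,868,928
```

Under these assumptions the estimate equals the 151,868,928 parameters listed in the table.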

Training Data

The model was trained on a curated 10B-token mixture:

Source	Accepted tokens	Share
FineWeb-Edu `sample-10BT`	4,000,001,532	40%
DCLM baseline	2,500,001,319	25%
FineWeb `sample-10BT`	1,499,997,774	15%
English Wikipedia	999,998,937	10%
OpenWebText	1,000,000,972	10%

The dataset was filtered, deduplicated by normalized document hash, and packed into 1024-token training shards.
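
The two preprocessing steps named above, hash-based deduplication and fixed-length packing, can be sketched as follows. This is an illustration only, not the project's pipeline: the normalization rule (lowercasing and collapsing whitespace), the use of SHA-256, and the EOS separator between documents are all assumptions, and the function names are hypothetical. Token id 50256 is the GPT-2 `<|endoftext|>` token.

```python
import hashlib
import re


def normalized_hash(text):
    """Hash a document after a simple normalization (assumed rule)."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(documents):
    """Yield each document whose normalized hash has not been seen before."""
    seen = set()
    for doc in documents:
        key = normalized_hash(doc)
        if key not in seen:
            seen.add(key)
            yield doc


def pack_sequences(token_streams, seq_len=1024, eos_id=50256):
    """Concatenate tokenized documents and cut them into seq_len-token rows.

    An EOS token is appended after each document (an assumption). A trailing
    partial row is dropped.
    """
    buffer = []
    for tokens in token_streams:
        buffer.extend(tokens)
        buffer.append(eos_id)
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]


docs = ["Hello   world", "hello world", "Another doc"]
print(list(deduplicate(docs)))  # ['Hello   world', 'Another doc']
print(list(pack_sequences([[1, 2, 3], [4, 5]], seq_len=4, eos_id=0)))  # [[1, 2, 3, 0]]
```

The training-token total in the model details, 10,000,007,168, is exactly 9,765,632 × 1024, which is consistent with packing into fixed 1024-token sequences.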

Files

File	Purpose
`model.safetensors`	Final model state dict
`config.json`	Model/training config used to instantiate `SoloGPT_v2`
`config_resolved.json`	Resolved run config from training
`metrics_summary.json`	Training summary for the final checkpoint
`model.py`	Minimal SoloGPT model implementation used by this checkpoint
`configuration_sologpt.py`	Hugging Face `AutoConfig` remote-code wrapper
`modeling_sologpt.py`	Hugging Face `AutoModelForCausalLM` remote-code wrapper
`tokenizer.json`	GPT-2 tokenizer used for training and inference
`tokenizer_config.json`	Tokenizer metadata with 1024-token context and EOS-as-pad
`load_example.py`	Example loading and sampling script
`docs/v3_final_gpt2_comparison.md`	Full final result writeup
`docs/project_page.md`	Short portfolio-style project page

Usage

This repo supports Hugging Face `AutoModelForCausalLM` loading through custom remote code. Pass `trust_remote_code=True` when loading the model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "bmax16634/sologpt-v3-150m-base"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True).to(device)
model.eval()

prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        temperature=0.8,
        top_k=40,
        use_cache=False,
        remove_invalid_values=True,
        renormalize_logits=True,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

For a runnable example, see `load_example.py`. For low-level state-dict loading, the raw PyTorch implementation is still included as `model.py`.
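
For readers unfamiliar with the sampling arguments used in the `generate` call (`temperature=0.8`, `top_k=40`), the following pure-Python sketch shows what temperature scaling and top-k filtering do to a single next-token distribution. It is a conceptual illustration, not the Transformers implementation, and the function name `sample_top_k` is hypothetical.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=40, rng=random):
    """Sample one token id from raw logits with temperature and top-k filtering."""
    # Keep only the top_k highest-scoring token ids.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # A temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [logits[i] / temperature for i in top_ids]
    # Softmax weights over the kept logits (subtract the max for stability).
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top_ids, weights=weights, k=1)[0]


# Toy vocabulary of four tokens: with top_k=2 only ids 1 and 3 can be drawn.
toy_logits = [0.0, 5.0, 1.0, 4.0]
print(sample_top_k(toy_logits, top_k=2, rng=random.Random(0)))
```

Lower `top_k` and lower `temperature` both make completions more conservative, which can help reduce the repetitive or incoherent text that a small base model tends to produce.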

Intended Use

This model is intended for:

educational inspection of a small from-scratch base LM,
text-completion experiments,
reproducing the SoloLLM v3 evaluation story,
portfolio/research engineering review.

It is not intended for production use, high-stakes decisions, factual QA, or chat/instruction-following use without additional tuning and safety evaluation.

Limitations

This is a small base model, not an assistant.
It can generate incorrect, biased, repetitive, or unsafe text.
It has no retrieval, tool use, or instruction tuning.
This model does not establish a strict smaller-than-GPT-2, across-the-board win: the winning 150M checkpoint (151.87M parameters) is larger than GPT-2 small (124.44M).
Training data came from broad public web/text sources and may contain undesirable content despite filtering.

License

The SoloLLM code and released weights are provided under the MIT License by the author. Training data sources retain their own licenses and terms.

Project Links

Author: Benjamin Maxwell
Final 150M model: https://huggingface.co/bmax16634/sologpt-v3-150m-base
Smaller 123M ablation: https://huggingface.co/bmax16634/sologpt-v3-123m-base
Public completion demo: https://huggingface.co/spaces/bmax16634/sologpt-v3-150m-demo
Original v1 model: https://huggingface.co/bmax16634/sologpt-base-v1
Main result artifact in this repo: docs/v3_final_gpt2_comparison.md

Safetensors

Model size: 0.2B params
Tensor type: F32

Evaluation results

Metric	Dataset	Value	Source
Held-out perplexity	SoloLLM project held-out OpenWebText-style shards	24.899	self-reported
WikiText-2 perplexity	WikiText-2 test	41.181	self-reported
LAMBADA perplexity	LAMBADA	35.347	self-reported
LAMBADA last-word accuracy	LAMBADA	0.331	self-reported
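
For reference, perplexity is conventionally the exponential of the mean negative log-likelihood per predicted token. The sketch below shows that standard token-level definition. The card does not spell out the exact evaluation protocol (for example, token-level versus word-level normalization), so this is an assumption about how such numbers are typically obtained, and the function name `perplexity` is hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 1/4 to every observed token has
# a perplexity of (approximately, in floating point) 4.
uniform = [math.log(0.25)] * 10
print(round(perplexity(uniform), 6))
```

Read in the other direction, and assuming token-level normalization, the held-out perplexity of 24.899 corresponds to a mean loss of about 3.21 nats per token (the natural log of 24.899).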