SoloLLM v3 123M Base

SoloLLM v3 123M Base is the smaller-than-GPT-2 ablation from the SoloLLM v3 project. It is a from-scratch GPT-style decoder-only base language model trained on one RTX 3090.

This is not the final best SoloLLM checkpoint. The final best model is bmax16634/sologpt-v3-150m-base. This 123M model is published because it is slightly smaller than GPT-2 small and documents the strict smaller-model test.

Bottom Line

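The 123M model is slightly smaller than GPT-2 small and beats it on WikiText-2 perplexity, LAMBADA perplexity, and four of the five multiple-choice benchmarks, but it does not beat GPT-2 on every metric. It loses the project held-out perplexity comparison (25.64 vs. 25.32), trails slightly on PIQA (63.40% vs. 63.60%), and loses some fixed-prompt generation diversity/repetition diagnostics.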

| Model | Params | Train tokens | Held-out PPL | WikiText-2 PPL | LAMBADA PPL | MC avg acc norm |
|---|---|---|---|---|---|---|
| GPT-2 small | 124.44M | public | 25.32 | 45.32 | 40.62 | 41.05% |
| SoloLLM v3 123M | 123.55M | 9.80B | 25.64 | 41.87 | 36.28 | 42.46% |
| SoloLLM v3 150M | 151.87M | 10.00B | 24.90 | 41.18 | 35.35 | 42.71% |

The honest read:

The 123M model is a strong smaller-than-GPT-2 ablation, but it does not prove that a smaller model beats GPT-2 small across the board.
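
To make the mixed result concrete, the perplexity columns can be turned into signed relative changes against GPT-2 small. The sketch below only reuses the numbers from the comparison table; the helper name `relative_change` is illustrative and not part of the project code.

```python
# Perplexity values copied from the comparison table (lower is better).
GPT2_SMALL = {"held_out": 25.32, "wikitext2": 45.32, "lambada": 40.62}
SOLO_123M = {"held_out": 25.64, "wikitext2": 41.87, "lambada": 36.28}


def relative_change(candidate: float, baseline: float) -> float:
    """Signed relative change of candidate versus baseline, in percent."""
    return 100.0 * (candidate - baseline) / baseline


# Positive values mean the 123M model has higher (worse) perplexity.
deltas = {
    name: relative_change(SOLO_123M[name], GPT2_SMALL[name])
    for name in GPT2_SMALL
}

for name, delta in deltas.items():
    verdict = "worse" if delta > 0 else "better"
    print(f"{name:>10}: {delta:+.2f}% ({verdict} than GPT-2 small)")
```

The held-out gap is small and positive, while the WikiText-2 and LAMBADA gaps are larger and negative, which is the pattern the summary above describes.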

Model Details

| Item | Value |
|---|---|
| Architecture | Decoder-only GPT-style transformer |
| Parameters | 123,551,232 |
| Context length | 1024 |
| Tokenizer | GPT-2 tokenizer |
| Embedding width | 768 |
| Layers | 12 |
| Attention heads | 12 |
| Positional method | RoPE |
| Normalization | RMSNorm |
| MLP | SwiGLU |
| Weight tying | Input/output embeddings tied |
| Training hardware | Single RTX 3090 |
| Training tokens | 9,800,728,576 |
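
The parameter count can be sanity-checked from the table with a little arithmetic. The sketch below is a back-of-the-envelope estimate, not the project's code: it assumes an unpadded GPT-2 vocabulary of 50,257 tokens, bias-free linear layers, two RMSNorm gains per block plus one final norm, and a SwiGLU hidden width of 2048. None of these are stated in this card; they are chosen here because they reproduce the listed total.

```python
def count_parameters(vocab_size: int, d_model: int, n_layers: int, mlp_hidden: int) -> int:
    """Estimate parameters for a tied-embedding, bias-free GPT-style decoder."""
    embedding = vocab_size * d_model   # shared with the output head, so counted once
    attention = 4 * d_model * d_model  # Q, K, V, and output projections
    mlp = 3 * d_model * mlp_hidden     # SwiGLU: gate, up, and down projections
    norms = 2 * d_model                # two RMSNorm gain vectors per block
    final_norm = d_model               # one RMSNorm before the output head
    # RoPE is computed on the fly and adds no learned parameters.
    return embedding + n_layers * (attention + mlp + norms) + final_norm


# vocab_size and mlp_hidden are assumptions; the other values come from the table.
total = count_parameters(vocab_size=50257, d_model=768, n_layers=12, mlp_hidden=2048)
print(f"{total:,}")  # 123,551,232 under these assumptions
```

For the authoritative values, check `config.json` in this repo rather than relying on this estimate.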

Training Data

The model was trained on the same curated 10B-token SoloLLM v3 dataset as the 150M final model:

| Source | Accepted tokens | Share |
|---|---|---|
| FineWeb-Edu `sample-10BT` | 4,000,001,532 | 40% |
| DCLM baseline | 2,500,001,319 | 25% |
| FineWeb `sample-10BT` | 1,499,997,774 | 15% |
| English Wikipedia | 999,998,937 | 10% |
| OpenWebText | 1,000,000,972 | 10% |
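
The shares in the table follow directly from the accepted token counts. As a quick check, the sketch below recomputes them and compares the dataset size with the 9,800,728,576 training tokens listed under Model Details. Reading that ratio as "slightly less than one pass over the data" is an assumption, since the card does not say how tokens were sampled.

```python
# Accepted token counts copied from the training data table.
ACCEPTED_TOKENS = {
    "FineWeb-Edu sample-10BT": 4_000_001_532,
    "DCLM baseline": 2_500_001_319,
    "FineWeb sample-10BT": 1_499_997_774,
    "English Wikipedia": 999_998_937,
    "OpenWebText": 1_000_000_972,
}
TRAIN_TOKENS_123M = 9_800_728_576  # from the Model Details table

dataset_total = sum(ACCEPTED_TOKENS.values())
shares = {name: 100.0 * count / dataset_total for name, count in ACCEPTED_TOKENS.items()}

for name, share in shares.items():
    print(f"{name:<26} {share:6.2f}%")

# Fraction of the dataset consumed, assuming no token was seen twice.
coverage = TRAIN_TOKENS_123M / dataset_total
print(f"dataset total: {dataset_total:,} tokens, coverage: {coverage:.1%}")
```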

Multiple-Choice Detail

Length-normalized accuracy:

| Benchmark | GPT-2 small | SoloLLM v3 123M |
|---|---|---|
| HellaSwag | 29.53% | 29.85% |
| PIQA | 63.60% | 63.40% |
| ARC-Easy | 40.35% | 44.04% |
| ARC-Challenge | 22.07% | 24.08% |
| WinoGrande | 49.72% | 50.91% |
| Average | 41.05% | 42.46% |
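
Length normalization matters because longer answer choices accumulate more negative log-likelihood simply by having more tokens. The card does not say which normalizer was used, so the sketch below assumes division by the character length of each choice, which is a common convention for `acc_norm`. The function name and the log-likelihood values are hypothetical.

```python
def pick_choice(loglikelihoods, choices, normalize=True):
    """Return the index of the highest-scoring answer choice.

    loglikelihoods: summed log-probability of each choice given the context.
    choices: the answer strings, used only for their lengths.
    """
    scores = []
    for ll, text in zip(loglikelihoods, choices):
        length = len(text) if normalize else 1  # assumed: character-length normalization
        scores.append(ll / length)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical scores: the short answer wins on raw log-likelihood,
# while the longer answer wins once scores are divided by length.
choices = ["yes", "certainly not today"]
loglikelihoods = [-4.0, -9.5]

raw_pick = pick_choice(loglikelihoods, choices, normalize=False)
norm_pick = pick_choice(loglikelihoods, choices, normalize=True)
print(raw_pick, norm_pick)
```

Accuracy is then the fraction of questions where the picked index equals the gold index, and the Average row is the unweighted mean over the five benchmarks.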

Files

| File | Purpose |
|---|---|
| `model.safetensors` | Final model state dict |
| `config.json` | Model/training config used to instantiate `SoloGPT_v2` |
| `config_resolved.json` | Resolved run config from training |
| `metrics_summary.json` | Training summary for the final checkpoint |
| `model.py` | Minimal SoloGPT model implementation used by this checkpoint |
| `configuration_sologpt.py` | Hugging Face `AutoConfig` remote-code wrapper |
| `modeling_sologpt.py` | Hugging Face `AutoModelForCausalLM` remote-code wrapper |
| `tokenizer.json` | GPT-2 tokenizer used for training and inference |
| `tokenizer_config.json` | Tokenizer metadata with 1024-token context and EOS-as-pad |
| `load_example.py` | Example loading and sampling script |
| `docs/v3_final_gpt2_comparison.md` | Full final result writeup |
| `docs/project_page.md` | Short portfolio-style project page |

Usage

This repo supports Hugging Face `AutoModelForCausalLM` loading through custom remote code. Pass `trust_remote_code=True` when loading the model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "bmax16634/sologpt-v3-123m-base"

# Use the GPU when one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# trust_remote_code=True lets Transformers run the custom model code from this repo.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True).to(device)
model.eval()

prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Sample a short continuation without tracking gradients.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        temperature=0.8,
        top_k=40,
        use_cache=False,
        remove_invalid_values=True,
        renormalize_logits=True,
        pad_token_id=tokenizer.eos_token_id,  # the tokenizer uses EOS as the pad token
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For a runnable example, see `load_example.py`. For low-level state-dict loading, the raw PyTorch implementation is still included as `model.py`.
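
For readers who want to see what the `temperature=0.8` and `top_k=40` settings in the usage snippet do to the next-token distribution, here is a minimal pure-Python sketch. It illustrates the general technique and is not the Transformers implementation; the function names and the toy logits are made up for this example.

```python
import math
import random


def top_k_temperature_probs(logits, temperature=0.8, top_k=40):
    """Keep the top_k largest logits, scale by temperature, and apply softmax."""
    k = min(top_k, len(logits))
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = {i: logits[i] / temperature for i in keep}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {i: math.exp(value - peak) for i, value in scaled.items()}
    normalizer = sum(exps.values())
    probs = [0.0] * len(logits)  # tokens outside the top k get probability zero
    for i, value in exps.items():
        probs[i] = value / normalizer
    return probs


def sample_token(logits, rng, temperature=0.8, top_k=40):
    """Draw one token index from the filtered, renormalized distribution."""
    probs = top_k_temperature_probs(logits, temperature, top_k)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # a four-token vocabulary for illustration
probs = top_k_temperature_probs(toy_logits, temperature=0.8, top_k=2)
token = sample_token(toy_logits, random.Random(0), temperature=0.8, top_k=2)
print(probs, token)
```

A temperature below 1 sharpens the distribution toward the most likely tokens, and a finite `top_k` removes the long tail entirely, which tends to reduce incoherent output from a small base model at the cost of diversity.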

Intended Use

This model is intended for:

- educational inspection of a smaller GPT-2-class base LM,
- ablation comparison against `sologpt-v3-150m-base`,
- text-completion experiments,
- reproducing the SoloLLM v3 evaluation story.

It is not intended for production use, high-stakes decisions, factual QA, or chat/instruction-following use without additional tuning and safety evaluation.

Limitations

- This is a small base model, not an assistant.
- It can generate incorrect, biased, repetitive, or unsafe text.
- It has no retrieval, tool use, or instruction tuning.
- It does not beat GPT-2 small across every metric.
- Training data came from broad public web/text sources and may contain undesirable content despite filtering.

Related Artifacts

- Final best model: https://huggingface.co/bmax16634/sologpt-v3-150m-base
- Public completion demo: https://huggingface.co/spaces/bmax16634/sologpt-v3-150m-demo
- Legacy v1 baseline: https://huggingface.co/bmax16634/sologpt-base-v1

License

The SoloLLM code and released weights are provided under the MIT License by the author. Training data sources retain their own licenses and terms.

Evaluation results

All values are self-reported.

| Metric | Dataset | Value |
|---|---|---|
| Held-out perplexity | SoloLLM project held-out OpenWebText-style shards | 25.637 |
| WikiText-2 perplexity | WikiText-2 test | 41.874 |
| LAMBADA perplexity | LAMBADA | 36.278 |
| LAMBADA last-word accuracy | LAMBADA | 0.328 |