SoloLLM v3 123M Base
SoloLLM v3 123M Base is the smaller-than-GPT-2 ablation from the SoloLLM v3 project. It is a from-scratch GPT-style decoder-only base language model trained on one RTX 3090.
This is not the final best SoloLLM checkpoint. The final best model is
bmax16634/sologpt-v3-150m-base. This 123M model is published because it is
slightly smaller than GPT-2 small and documents the strict smaller-model test.
Bottom Line
The 123M model is slightly smaller than GPT-2 small (123.55M vs. 124.44M parameters) and beats it on WikiText-2 perplexity, LAMBADA perplexity, and average multiple-choice accuracy, but not on every metric. It loses the project held-out perplexity comparison (25.64 vs. 25.32) and some fixed-prompt generation diversity/repetition diagnostics.
| Model | Params | Train tokens | Held-out PPL | WikiText-2 PPL | LAMBADA PPL | MC avg acc norm |
|---|---|---|---|---|---|---|
| GPT-2 small | 124.44M | public | 25.32 | 45.32 | 40.62 | 41.05% |
| SoloLLM v3 123M | 123.55M | 9.80B | 25.64 | 41.87 | 36.28 | 42.46% |
| SoloLLM v3 150M | 151.87M | 10.00B | 24.90 | 41.18 | 35.35 | 42.71% |
The honest read:
The 123M model is a strong smaller-than-GPT-2 ablation, but it does not prove that a smaller model beats GPT-2 small across the board.
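To make the mixed result concrete, the following sketch computes the signed relative change in perplexity between the two models. The numbers are copied from the comparison table above; the helper name `relative_change` is only illustrative.

```python
# Perplexities copied from the comparison table (lower is better).
gpt2 = {"held_out": 25.32, "wikitext2": 45.32, "lambada": 40.62}
solo_123m = {"held_out": 25.64, "wikitext2": 41.87, "lambada": 36.28}


def relative_change(baseline: float, candidate: float) -> float:
    """Signed percent change of candidate relative to baseline."""
    return 100.0 * (candidate - baseline) / baseline


changes = {name: relative_change(gpt2[name], solo_123m[name]) for name in gpt2}

for name, pct in changes.items():
    verdict = "better" if pct < 0 else "worse"
    print(f"{name}: {pct:+.2f}% ({verdict} than GPT-2 small)")
```

On these figures the 123M model is about 1.3% worse on the project held-out set, and about 7.6% and 10.7% better on WikiText-2 and LAMBADA respectively.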
Model Details
| Item | Value |
|---|---|
| Architecture | Decoder-only GPT-style transformer |
| Parameters | 123,551,232 |
| Context length | 1024 |
| Tokenizer | GPT-2 tokenizer |
| Embedding width | 768 |
| Layers | 12 |
| Attention heads | 12 |
| Positional method | RoPE |
| Normalization | RMSNorm |
| MLP | SwiGLU |
| Weight tying | Input/output embeddings tied |
| Training hardware | Single RTX 3090 |
| Training tokens | 9,800,728,576 |
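As a sanity check on the table, the stated parameter count can be reproduced from the listed dimensions under a few assumptions that the card does not state: a 50,257-entry GPT-2 vocabulary, a SwiGLU hidden width of 2,048, no bias terms, and one weight vector per RMSNorm. This is one configuration consistent with the total, not a confirmed description of the checkpoint; `config.json` remains the authority.

```python
# Stated in the Model Details table.
D_MODEL = 768
N_LAYERS = 12
STATED_TOTAL = 123_551_232

# Assumptions (not stated in the card).
VOCAB_SIZE = 50_257  # standard GPT-2 tokenizer vocabulary
MLP_HIDDEN = 2_048   # hypothetical SwiGLU hidden width


def count_parameters(d_model: int, n_layers: int, vocab_size: int, mlp_hidden: int) -> int:
    """Count weights for a bias-free decoder with tied embeddings (assumed layout)."""
    embedding = vocab_size * d_model   # shared with the output head (tied)
    attention = 4 * d_model * d_model  # Q, K, V and output projections
    mlp = 3 * d_model * mlp_hidden     # SwiGLU gate, up and down projections
    norms = 2 * d_model                # two RMSNorm weight vectors per block
    per_layer = attention + mlp + norms
    final_norm = d_model               # RMSNorm before the output head
    # RoPE contributes no learned parameters.
    return embedding + n_layers * per_layer + final_norm


total = count_parameters(D_MODEL, N_LAYERS, VOCAB_SIZE, MLP_HIDDEN)
print(f"{total:,}")  # 123,551,232
```

Under these assumptions the embedding matrix accounts for roughly 38.6M of the parameters, which is why weight tying matters at this scale.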
Training Data
The model was trained on the same curated 10B-token SoloLLM v3 dataset as the 150M final model:
| Source | Accepted tokens | Share |
|---|---|---|
| FineWeb-Edu sample-10BT | 4,000,001,532 | 40% |
| DCLM baseline | 2,500,001,319 | 25% |
| FineWeb sample-10BT | 1,499,997,774 | 15% |
| English Wikipedia | 999,998,937 | 10% |
| OpenWebText | 1,000,000,972 | 10% |
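The shares in the table can be recomputed from the accepted-token counts, and the same totals show how much of the dataset the 123M run consumed. The sketch below uses only figures from this card; reading the ratio as a fraction of a single pass is an assumption, since the card does not describe the sampling order.

```python
# Accepted tokens per source, copied from the table above.
accepted_tokens = {
    "FineWeb-Edu sample-10BT": 4_000_001_532,
    "DCLM baseline": 2_500_001_319,
    "FineWeb sample-10BT": 1_499_997_774,
    "English Wikipedia": 999_998_937,
    "OpenWebText": 1_000_000_972,
}
TRAIN_TOKENS = 9_800_728_576  # from the Model Details table

dataset_total = sum(accepted_tokens.values())
shares = {name: round(100 * n / dataset_total) for name, n in accepted_tokens.items()}

# Fraction of the dataset seen by the 123M run (assuming no repeated data).
dataset_fraction = TRAIN_TOKENS / dataset_total

print(f"dataset total: {dataset_total:,}")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"fraction of dataset seen: {dataset_fraction:.4f}")
```

The training-token count is also an exact multiple of the 1024-token context length, which is consistent with training on fixed-length sequences.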
Multiple-Choice Detail
Length-normalized accuracy:
| Benchmark | GPT-2 small | SoloLLM v3 123M |
|---|---|---|
| HellaSwag | 29.53% | 29.85% |
| PIQA | 63.60% | 63.40% |
| ARC-Easy | 40.35% | 44.04% |
| ARC-Challenge | 22.07% | 24.08% |
| WinoGrande | 49.72% | 50.91% |
| Average | 41.05% | 42.46% |
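For readers unfamiliar with the metric, here is a minimal sketch of length-normalized multiple-choice scoring, followed by a check of the Average row. The card does not say which length is used for normalization; dividing each candidate's summed log-probability by its UTF-8 byte length is an assumption here, and `pick_choice` and the example log-probabilities are hypothetical.

```python
def pick_choice(choice_logprobs, choice_texts, normalize=True):
    """Return the index of the best-scoring answer candidate.

    choice_logprobs: summed token log-probabilities, one per candidate.
    choice_texts: the candidate continuations as strings.
    """
    scores = []
    for logprob, text in zip(choice_logprobs, choice_texts):
        # Assumed normalizer: UTF-8 byte length of the candidate.
        length = len(text.encode("utf-8")) if normalize else 1
        scores.append(logprob / length)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical log-probabilities: the longer answer has a lower total
# log-probability but a higher per-byte score.
texts = ["a cat", "a much longer answer"]
logprobs = [-12.0, -15.0]
raw_pick = pick_choice(logprobs, texts, normalize=False)  # index 0
norm_pick = pick_choice(logprobs, texts, normalize=True)  # index 1

# The Average row is the unweighted mean of the five benchmarks.
gpt2_acc = [29.53, 63.60, 40.35, 22.07, 49.72]
solo_acc = [29.85, 63.40, 44.04, 24.08, 50.91]
gpt2_avg = round(sum(gpt2_acc) / len(gpt2_acc), 2)
solo_avg = round(sum(solo_acc) / len(solo_acc), 2)
print(raw_pick, norm_pick, gpt2_avg, solo_avg)
```

Normalization removes the built-in advantage of short candidates, whose summed log-probabilities are higher simply because they contain fewer tokens.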
Files
| File | Purpose |
|---|---|
| model.safetensors | Final model state dict |
| config.json | Model/training config used to instantiate SoloGPT_v2 |
| config_resolved.json | Resolved run config from training |
| metrics_summary.json | Training summary for the final checkpoint |
| model.py | Minimal SoloGPT model implementation used by this checkpoint |
| configuration_sologpt.py | Hugging Face AutoConfig remote-code wrapper |
| modeling_sologpt.py | Hugging Face AutoModelForCausalLM remote-code wrapper |
| tokenizer.json | GPT-2 tokenizer used for training and inference |
| tokenizer_config.json | Tokenizer metadata with 1024-token context and EOS-as-pad |
| load_example.py | Example loading and sampling script |
| docs/v3_final_gpt2_comparison.md | Full final result writeup |
| docs/project_page.md | Short portfolio-style project page |
Usage
This repo supports Hugging Face AutoModelForCausalLM loading through custom
remote code. Pass trust_remote_code=True when loading the model.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "bmax16634/sologpt-v3-123m-base"
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True).to(device)
    model.eval()

    prompt = "The future of artificial intelligence is"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=40,
            do_sample=True,
            temperature=0.8,
            top_k=40,
            use_cache=False,
            remove_invalid_values=True,
            renormalize_logits=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

For a runnable example, see load_example.py. For low-level state-dict loading,
the raw PyTorch implementation is still included as model.py.
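The `temperature` and `top_k` arguments used above control how the next token is drawn. The following standard-library sketch illustrates the idea on a toy list of logits; it is a conceptual illustration, not the Transformers implementation, and `sample_top_k` is a hypothetical helper.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=40, rng=None):
    """Sample one index from the top_k highest logits after temperature scaling."""
    rng = rng or random.Random()
    # Keep only the indices of the top_k largest logits.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = ranked[:top_k]
    # Temperatures below 1 sharpen the distribution; above 1 flatten it.
    scaled = [logits[i] / temperature for i in kept]
    # Softmax weights, shifted by the max for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [0.1, 5.0, 2.0, -1.0]
rng = random.Random(0)
greedy = sample_top_k(toy_logits, top_k=1, rng=rng)  # top_k=1 is greedy decoding
draws = [sample_top_k(toy_logits, top_k=2, rng=rng) for _ in range(50)]
print(greedy, sorted(set(draws)))
```

With `top_k=1` the function always returns the argmax, and with `top_k=2` only the two highest-scoring indices can ever be drawn, regardless of temperature.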
Intended Use
This model is intended for:
- educational inspection of a smaller GPT-2-class base LM,
- ablation comparison against sologpt-v3-150m-base,
- text-completion experiments,
- reproducing the SoloLLM v3 evaluation story.
It is not intended for production use, high-stakes decisions, factual QA, or chat/instruction-following use without additional tuning and safety evaluation.
Limitations
- This is a small base model, not an assistant.
- It can generate incorrect, biased, repetitive, or unsafe text.
- It has no retrieval, tool use, or instruction tuning.
- It does not beat GPT-2 small across every metric.
- Training data came from broad public web/text sources and may contain undesirable content despite filtering.
Related Artifacts
- Final best model: https://huggingface.co/bmax16634/sologpt-v3-150m-base
- Public completion demo: https://huggingface.co/spaces/bmax16634/sologpt-v3-150m-demo
- Legacy v1 baseline: https://huggingface.co/bmax16634/sologpt-base-v1
License
The SoloLLM code and released weights are provided under the MIT License by the author. Training data sources retain their own licenses and terms.
Evaluation results
- Held-out perplexity on SoloLLM project held-out OpenWebText-style shards (self-reported): 25.637
- WikiText-2 perplexity on WikiText-2 test (self-reported): 41.874
- LAMBADA perplexity on LAMBADA (self-reported): 36.278
- LAMBADA last-word accuracy on LAMBADA (self-reported): 0.328
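
Perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows that relationship in both directions using a toy example and the held-out figure above; how tokens are windowed and averaged in the project's own evaluation is not described in this card, so this only illustrates the definition.

```python
import math


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def mean_nll_from_perplexity(ppl):
    """Inverse mapping: average loss in nats per token for a given perplexity."""
    return math.log(ppl)


# Toy check: a model that assigns probability 1/4 to every token
# is as uncertain as a uniform choice among four options.
uniform_ppl = perplexity([math.log(0.25)] * 10)

# Held-out perplexity reported above, converted back to a per-token loss.
held_out_loss = mean_nll_from_perplexity(25.637)
print(f"{uniform_ppl:.3f} {held_out_loss:.3f}")
```

A held-out perplexity of 25.637 corresponds to a cross-entropy loss of roughly 3.24 nats per token.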