Looped‑Transformer‑24M

A compact 24M‑parameter language model built on a Looped Transformer architecture, trained with the Muon optimizer, and enhanced with Chain‑of‑Thought (CoT) reasoning. The model specializes in story generation and basic math reasoning, and is intended for lightweight experimentation, educational projects, and rapid prototyping.

🧩 Model Overview

Parameters: 24M
Architecture: Looped Transformer
Optimizer: Muon
Reasoning: Chain‑of‑Thought (CoT)
Training Data: Story‑focused corpus with basic math word problems
Math Skills: Simple arithmetic
Training Hardware: Dual NVIDIA T4 GPUs
Training Time: ~30 minutes
Language: English
License: AGPL‑3.0
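
The "Looped Transformer" entry above refers to a design in which one set of block weights is reused across several passes instead of stacking distinct layers. The following is a minimal PyTorch sketch of that idea, not this model's actual implementation: the class name `LoopedTransformer`, the sizes, the loop count, and the use of `nn.TransformerEncoderLayer` are all illustrative assumptions, and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn


class LoopedTransformer(nn.Module):
    """Toy looped transformer: one shared block applied n_loops times.

    All hyperparameters are hypothetical and do not describe the released model.
    """

    def __init__(self, vocab_size=1000, d_model=64, n_heads=4, n_loops=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # A single block whose weights are reused on every loop iteration.
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_heads,
            dim_feedforward=4 * d_model,
            batch_first=True,
            norm_first=True,
        )
        self.norm = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.n_loops = n_loops

    def forward(self, token_ids):
        seq_len = token_ids.size(1)
        # Causal mask: position i may only attend to positions <= i.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=token_ids.device),
            diagonal=1,
        )
        h = self.embed(token_ids)
        for _ in range(self.n_loops):
            h = self.block(h, src_mask=mask)  # same weights on every pass
        return self.lm_head(self.norm(h))


model = LoopedTransformer()
logits = model(torch.randint(0, 1000, (2, 8)))
print(logits.shape)  # (batch, sequence, vocab)
```

Because the block is shared, changing `n_loops` changes the effective depth and compute but leaves the parameter count unchanged, which is one way a small parameter budget can be stretched.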

✨ What This Model Does Well

Story generation — coherent, imaginative, character‑driven narratives
Dialogue writing — natural conversational flow
Basic math — simple arithmetic and step‑by‑step reasoning
CoT reasoning — improved logical flow when prompted
Lightweight inference — runs smoothly on consumer GPUs and many CPUs
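
Since the list above notes that step‑by‑step reasoning improves when prompted, a small helper can make that cue explicit. This is only a sketch: the function name `build_cot_prompt` and the "Question / Answer" layout are assumptions, and the prompt format the model actually saw during training is not documented here.

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question in a step-by-step cue.

    The layout is a hypothetical convention, not a confirmed format for this model.
    """
    return f"Question: {question.strip()}\nAnswer: Let's think step by step."


print(build_cot_prompt("What is 7 + 5?"))
```

The returned string can be passed to the tokenizer in place of a plain prompt; whether this particular cue helps should be checked empirically.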

📚 Training Details

The model was trained for about 30 minutes on two NVIDIA T4 GPUs, using a curated dataset of short stories, narrative prompts, character interactions, and basic math word problems.
In this run the Muon optimizer converged quickly and stably, which suggests it is a good fit for models of this size.
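
For readers unfamiliar with Muon, its core idea is to apply momentum to the gradient of each 2‑D weight matrix and then approximately orthogonalize the result with a Newton‑Schulz iteration before taking the step. The sketch below is a simplified illustration and not the training script used for this model: the function names, the learning rate, and the momentum value are assumptions, Nesterov momentum and weight decay are left out, and the quintic coefficients are the ones used in the public Muon reference code.

```python
import torch


def newton_schulz_orthogonalize(grad: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2-D matrix by pushing its singular values toward 1."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from the public Muon reference code
    x = grad / (grad.norm() + eps)  # scale so every singular value is at most 1
    transposed = x.size(0) > x.size(1)
    if transposed:  # iterate on the wide orientation so the Gram matrix is smaller
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


@torch.no_grad()
def muon_step(param, grad, momentum_buf, lr=0.02, momentum=0.95):
    """One simplified Muon update for a 2-D weight matrix (illustrative hyperparameters)."""
    momentum_buf.mul_(momentum).add_(grad)  # accumulate momentum in place
    update = newton_schulz_orthogonalize(momentum_buf)
    param.add_(update, alpha=-lr)  # step along the orthogonalized direction


torch.manual_seed(0)
weight = torch.zeros(32, 64)
buf = torch.zeros_like(weight)
muon_step(weight, torch.randn(32, 64), buf)
print(torch.linalg.svdvals(weight / -0.02))  # singular values of the applied update
```

After a handful of iterations the singular values of the update cluster loosely around 1 rather than converging exactly, which is the intended behavior of this fast approximation.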

🧠 Intended Use

This model is designed for:

Creative writing
Story generation
Dialogue simulation
Educational demos
Lightweight reasoning tasks

Not recommended for:

Factual retrieval
Complex mathematics
Safety‑critical applications

Website

https://gugu8intel-i9.github.io/atom-1_website/

🚀 Example Usage

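from transformers import AutoTokenizer, AutoModelForCausalLM

# Replace the placeholder repository ID with the actual model repository.
# If the repository ships custom modeling code, from_pretrained may also need trust_remote_code=True.
repo_id = "your-username/looped-transformer-24m"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Write a short story about a robot learning to dream."

inputs = tokenizer(prompt, return_tensors="pt")
# max_new_tokens limits only the generated continuation; max_length would also count the prompt tokens.
outputs = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))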

Model size (Safetensors): 24.2M params, tensor type F32
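
As a back‑of‑the‑envelope check on the lightweight‑inference claim, the parameter count and tensor type listed above give the approximate size of the stored weights. The helper name `weights_size_mb` is made up for this sketch, and the half‑precision line describes a hypothetical conversion rather than a published checkpoint.

```python
def weights_size_mb(n_params: int, bytes_per_param: int) -> float:
    """Approximate size of the stored weights in megabytes (10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


N_PARAMS = 24_200_000  # 24.2M params, as listed on the model page

print(f"F32: {weights_size_mb(N_PARAMS, 4):.1f} MB")
print(f"F16: {weights_size_mb(N_PARAMS, 2):.1f} MB (hypothetical half-precision copy)")
```

At 4 bytes per parameter the weights come to roughly 96.8 MB. This covers weights only; activations and any generation cache add to the memory used at inference time.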