Looped‑Transformer‑24M

A compact 24M‑parameter language model built with a Looped Transformer architecture, trained using the Muon optimizer, and enhanced with Chain‑of‑Thought reasoning. The model specializes in story generation and basic math reasoning, which makes it a good fit for lightweight experimentation, educational projects, and rapid prototyping.


🧩 Model Overview

  • Parameters: 24M
  • Architecture: Looped Transformer
  • Optimizer: Muon
  • Reasoning: Chain‑of‑Thought (CoT)
  • Training Data: Story‑focused corpus with basic math word problems
  • Math Skills: Simple arithmetic
  • Training Hardware: Dual NVIDIA T4 GPUs
  • Training Time: ~30 minutes
  • Language: English
  • License: AGPL‑3.0
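
As an illustration of why a looped design stays compact, the sketch below compares weight counts for a conventional stack of distinct blocks and a looped model that reuses the same block on every pass. The dimensions (`d_model=256`, `d_ff=1024`, 12 passes) are hypothetical values chosen for the example, not this model's actual configuration, and the count ignores embeddings, biases, and normalization weights.

```python
from typing import Callable


def block_params(d_model: int, d_ff: int) -> int:
    """Approximate weight count of one transformer block (biases and norms ignored)."""
    attention = 4 * d_model * d_model   # query, key, value, and output projections
    feed_forward = 2 * d_model * d_ff   # up and down projections
    return attention + feed_forward


def stacked_params(d_model: int, d_ff: int, depth: int) -> int:
    """A conventional stack owns one set of weights per layer."""
    return depth * block_params(d_model, d_ff)


def looped_params(d_model: int, d_ff: int, unique_blocks: int, loops: int) -> int:
    """A looped model reuses its blocks, so `loops` adds compute but no weights."""
    return unique_blocks * block_params(d_model, d_ff)


def looped_forward(hidden, block: Callable, loops: int):
    """Apply the same block repeatedly to the hidden state."""
    for _ in range(loops):
        hidden = block(hidden)
    return hidden


# Hypothetical dimensions, for illustration only.
print(stacked_params(256, 1024, depth=12))                  # 12 distinct blocks
print(looped_params(256, 1024, unique_blocks=1, loops=12))  # 1 block, 12 passes
```

Under these assumed dimensions, the looped variant holds one twelfth of the block weights while still performing twelve passes of computation.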

✨ What This Model Does Well

  • Story generation — coherent, imaginative, character‑driven narratives
  • Dialogue writing — natural conversational flow
  • Basic math — simple arithmetic and step‑by‑step reasoning
  • CoT reasoning — improved logical flow when prompted
  • Lightweight inference — runs smoothly on consumer GPUs and many CPUs
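
To put the lightweight-inference claim in numbers, here is a rough estimate of the memory needed for the weights alone. It assumes the 24.2M parameter count and F32 tensor type listed for the repository's Safetensors weights. The float16 figure is a hypothetical comparison, not a published variant. Activations, the KV cache, and framework overhead are not included.

```python
def weights_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in megabytes (1 MB = 1,000,000 bytes)."""
    return num_params * bytes_per_param / 1_000_000


NUM_PARAMS = 24_200_000  # parameter count listed for the Safetensors weights

fp32_mb = weights_size_mb(NUM_PARAMS, 4)  # 4 bytes per float32 weight
fp16_mb = weights_size_mb(NUM_PARAMS, 2)  # hypothetical half-precision copy

print(f"float32: {fp32_mb:.1f} MB, float16: {fp16_mb:.1f} MB")
# prints: float32: 96.8 MB, float16: 48.4 MB
```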

📚 Training Details

The model was trained for about 30 minutes on two NVIDIA T4 GPUs, using a curated dataset of short stories, narrative prompts, character interactions, and basic math word problems.
The Muon optimizer converged quickly and stably in this run, which suggests it is well suited to small models.
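
For readers unfamiliar with Muon, the core idea is to accumulate momentum for a 2D weight matrix and then orthogonalize that momentum before applying it as the update. The sketch below is a simplified, dependency-free illustration and not the training code used for this model. It uses the classic cubic Newton-Schulz iteration rather than the tuned polynomial iteration found in Muon implementations, and `muon_like_step`, `lr`, and `beta` are hypothetical names and values.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def orthogonalize(m: Matrix, steps: int = 15) -> Matrix:
    """Approximate the orthogonal polar factor of m with cubic Newton-Schulz."""
    # Frobenius normalization keeps every singular value at or below 1,
    # inside the region where the iteration converges.
    norm = sum(v * v for row in m for v in row) ** 0.5 or 1.0
    x = [[v / norm for v in row] for row in m]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


def muon_like_step(weight: Matrix, grad: Matrix, momentum: Matrix,
                   lr: float = 0.1, beta: float = 0.95) -> Tuple[Matrix, Matrix]:
    """One simplified update: accumulate momentum, orthogonalize it, apply it."""
    momentum = [[beta * m + g for m, g in zip(m_row, g_row)]
                for m_row, g_row in zip(momentum, grad)]
    update = orthogonalize(momentum)
    weight = [[w - lr * u for w, u in zip(w_row, u_row)]
              for w_row, u_row in zip(weight, update)]
    return weight, momentum


zeros = [[0.0, 0.0], [0.0, 0.0]]
new_weight, new_momentum = muon_like_step(zeros, [[3.0, 1.0], [1.0, 2.0]], zeros)
```

Because the orthogonalized update has singular values close to 1, the step size is governed by `lr` rather than by the raw gradient magnitude.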


🧠 Intended Use

This model is designed for:

  • Creative writing
  • Story generation
  • Dialogue simulation
  • Educational demos
  • Lightweight reasoning tasks
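
For the lightweight reasoning use case, a step-by-step prompt can be assembled with a small helper like the one below. This card does not document a prompt format, so the `Question:` / `Answer:` template and the `build_cot_prompt` name are assumptions for illustration. Adjust them to whatever format the training data actually used.

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question in a hypothetical step-by-step prompt template."""
    return f"Question: {question.strip()}\nAnswer: Let's think step by step."


prompt = build_cot_prompt("Tom has 7 apples and buys 5 more. How many apples does he have?")
print(prompt)
```

The resulting string can be passed to the tokenizer as the prompt.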

Not recommended for:

  • Factual retrieval
  • Complex mathematics
  • Safety‑critical applications

Website

https://gugu8intel-i9.github.io/atom-1_website/


🚀 Example Usage

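from transformers import AutoTokenizer, AutoModelForCausalLM

# "your-username/looped-transformer-24m" is a placeholder; replace it with the actual repository id.
# If the repository ships custom modeling code, from_pretrained may also need trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("your-username/looped-transformer-24m")
model = AutoModelForCausalLM.from_pretrained("your-username/looped-transformer-24m")

prompt = "Write a short story about a robot learning to dream."

# Tokenize the prompt into PyTorch tensors.
inputs = tokenizer(prompt, return_tensors="pt")
# max_length counts the prompt tokens plus the generated tokens.
outputs = model.generate(**inputs, max_length=200)

# Decode the first (and only) generated sequence back to text.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))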

Model size: 24.2M params (Safetensors, tensor type F32)