---
base_model: t5-small
tags:
  - trm
  - act
  - recursive
  - text-generation
  - wikitext
metrics:
  - loss
  - lm_loss
  - ponder_loss
  - perplexity_lm
---

# TRM-Text1 (ACT)

TRM-Text1 (ACT) is a causal language model built on a Tiny Recursive Reasoning Model (TRM) that uses Adaptive Computation Time (ACT) to vary recursion depth per token.
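The actual TRM-Text1 implementation is not reproduced here, but the sketch below shows how ACT-style per-token halting over a shared, recursively applied block typically works (following Graves, 2016). All names and hyperparameters (`RecursiveACTBlock`, `max_steps`, `eps`, the choice of `nn.TransformerEncoderLayer`) are illustrative assumptions, not this model's API.

```python
import torch
import torch.nn as nn

class RecursiveACTBlock(nn.Module):
    """Sketch: one shared layer applied recursively, with ACT halting
    deciding per token how many steps to run. Illustrative only; not
    the actual TRM-Text1 code."""

    def __init__(self, d_model: int = 256, max_steps: int = 8, eps: float = 0.01):
        super().__init__()
        # Causal attention mask omitted for brevity; a real causal LM needs one.
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.halt = nn.Linear(d_model, 1)  # per-token halting probability
        self.max_steps = max_steps
        self.eps = eps

    def forward(self, h: torch.Tensor):
        # h: (batch, seq, d_model)
        b, s, _ = h.shape
        cum_halt = h.new_zeros(b, s)   # accumulated halting probability per token
        remainder = h.new_ones(b, s)   # probability mass left for the final step
        n_updates = h.new_zeros(b, s)  # steps taken per token (ponder counter)
        out = torch.zeros_like(h)

        for _ in range(self.max_steps):
            running = (cum_halt < 1.0 - self.eps).float()
            h = self.block(h)
            p = torch.sigmoid(self.halt(h)).squeeze(-1)  # (batch, seq)
            # Tokens crossing the threshold this step spend their remainder instead.
            crossing = (cum_halt + p >= 1.0 - self.eps).float() * running
            p = (p * (1.0 - crossing) + remainder * crossing) * running
            out = out + p.unsqueeze(-1) * h  # halting-prob-weighted mix of states
            cum_halt = cum_halt + p
            remainder = remainder - p
            n_updates = n_updates + running

        # Ponder loss penalizes extra steps, encouraging tokens to halt early.
        ponder_loss = (n_updates + remainder).mean()
        return out, ponder_loss
```

The key design point ACT buys is that "easy" tokens can halt after one or two recursions while "hard" tokens keep refining, with the ponder loss trading accuracy against average depth.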

- Architecture: TRM (causal) + ACT halting
- Training Data: wikitext-103-raw-v1
- Tokenizer: t5-small (SentencePiece)
- Vocab Size: 32100
- Objective: Causal Language Modeling (next-token prediction)
- Sequence Length: 1024

Note: This model uses the T5 SentencePiece tokenizer, so the WikiText-103 perplexity numbers reported here are not directly comparable to perplexities from GPT-2 BPE-tokenized models.
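Since the tokenizer is the stock t5-small SentencePiece model, it can be loaded with plain transformers; this covers tokenization only, as the custom TRM weights presumably need their own loading code:

```python
from transformers import AutoTokenizer

# Stock t5-small SentencePiece tokenizer, per the card above.
tok = AutoTokenizer.from_pretrained("t5-small")
print(len(tok))  # 32100 (32000 SentencePiece pieces + 100 T5 sentinel tokens)

enc = tok("Recursive reasoning with adaptive depth.", return_tensors="pt")
print(enc.input_ids.shape)
```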

## Latest Performance (Epoch 1)

- Validation Loss: 4.8248
- Validation LM Loss: 4.8149
- Validation Ponder Loss: 1.0064
- Validation Perplexity (LM-only): 123.34
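The LM-only perplexity is the exponential of the validation LM cross-entropy, so the numbers above are self-consistent; the gap between total and LM loss is also consistent with a ponder-loss weight of roughly 0.01, though that weight is an inference from the numbers, not something stated in this card:

```python
import math

lm_loss, ponder_loss = 4.8149, 1.0064

print(math.exp(lm_loss))             # ~123.34, matching the reported perplexity
print(lm_loss + 0.01 * ponder_loss)  # ~4.8250, close to the reported total 4.8248
```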