Noun-Collapse: Wikipedia word embeddings from pure vector collapse

Word embeddings learned with nothing but a collapse dynamical system: no MLP, no attention, no output layer, no pretrained embeddings. The entire model is one 256-d well per word, a start state, and two scalars (pull strength, readout temperature). ~25.6M numbers total; ~99% of them are the well table.

Meaning is read out of the geometry: the same wells that pull a state during encoding are the vectors you look up as embeddings.

Part of Livnium. Honest by design: the number below is reported against real baselines, not a chance floor.

How it was trained

CBOW-style fill-in-the-blank, executed by the collapse engine instead of a net: for every noun occurrence, a state is collapsed through the noun's ±5-word ordered context and must end up pointing at the missing noun (sampled-softmax cross-entropy over nouns). The update law, applied once per context word:

h ← h − strength · (1 − cos(h, W)) · norm(h − W)
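
A minimal sketch of this update in plain Python, for illustration only. It assumes norm(·) means scaling to unit length, so h takes a step of size strength · (1 − cos(h, W)) straight toward the well W. The released model runs on torch tensors; the helper names here (cosine, unit, collapse_step, collapse) are made up for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b, eps=1e-12):
    """Cosine similarity, guarded against zero-length vectors."""
    return dot(a, b) / max(math.sqrt(dot(a, a) * dot(b, b)), eps)

def unit(v, eps=1e-12):
    """Scale v to unit length; a (near-)zero vector is returned unchanged."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > eps else list(v)

def collapse_step(h, w, strength):
    """One update: h <- h - strength * (1 - cos(h, w)) * unit(h - w).

    ASSUMPTION: norm(.) in the update law is read as unit-normalisation.
    """
    gap = 1.0 - cosine(h, w)
    direction = unit([hi - wi for hi, wi in zip(h, w)])
    return [hi - strength * gap * di for hi, di in zip(h, direction)]

def collapse(start, wells, strength):
    """Apply the update once per context well, in order."""
    h = list(start)
    for w in wells:
        h = collapse_step(h, w, strength)
    return h

h0 = [1.0, 0.0]
well = [0.0, 1.0]
h1 = collapse_step(h0, well, strength=0.5)
print(cosine(h0, well), cosine(h1, well))  # alignment with the well increases
```

Under this reading the step shrinks to zero as h aligns with the well, so a state already pointing along W is left where it is.
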
  • Data: English Wikipedia, 5M lines (7.5% of the corpus).
  • Signal: 94.75M noun occurrences, one streaming pass.
  • Compute: ~3.2 h on an Apple-silicon MacBook (MPS).
  • Nouns: WordNet noun lexicon; 100k-word context vocab, 23,758 noun targets.
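
The sampled-softmax objective mentioned above might look roughly like the sketch below. It assumes the readout scores each candidate noun by cos(h, well) / temp and that negatives are drawn uniformly; neither detail is stated here, and the function names and toy wells are hypothetical.

```python
import math
import random

def cosine(a, b, eps=1e-12):
    d = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return d / max(na * nb, eps)

def sample_negatives(nouns, target, k, rng):
    """Draw k distinct negative nouns uniformly (ASSUMPTION), never the target."""
    pool = [n for n in nouns if n != target]
    return rng.sample(pool, k)

def sampled_softmax_loss(h, wells, target, negatives, temp):
    """Cross-entropy of the target noun against the sampled negatives.

    ASSUMPTION: logits are cosine(h, well) / temp.
    """
    candidates = [target] + list(negatives)
    logits = [cosine(h, wells[n]) / temp for n in candidates]
    peak = max(logits)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[0]

# Toy wells, invented for illustration.
wells = {"cat": [1.0, 0.0], "dog": [0.8, 0.6], "war": [-1.0, 0.0]}
h = [1.0, 0.0]  # a collapsed state that points at "cat"

rng = random.Random(0)
negs = sample_negatives(list(wells), "cat", 2, rng)
loss_cat = sampled_softmax_loss(h, wells, "cat", negs, temp=0.1)
loss_war = sampled_softmax_loss(h, wells, "war", ["cat", "dog"], temp=0.1)
print(loss_cat, loss_war)  # the loss is far lower when the state points at the target
```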

Because the context is read as an ordered collapse trajectory (not a bag), word order is physically encoded, unlike CBOW/PPMI.
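
To see why order matters, a toy sketch can compare the sequential collapse with a bag-style average. It assumes norm(·) in the update law means scaling to unit length; the vectors and function names are invented for illustration.

```python
import math

def collapse_step(h, w, strength, eps=1e-12):
    """h <- h - strength * (1 - cos(h, w)) * unit(h - w)  (ASSUMED reading)."""
    d = sum(a * b for a, b in zip(h, w))
    norms = math.sqrt(sum(a * a for a in h) * sum(b * b for b in w))
    gap = 1.0 - d / max(norms, eps)
    diff = [a - b for a, b in zip(h, w)]
    length = math.sqrt(sum(x * x for x in diff))
    if length < eps:  # already sitting in the well
        return list(h)
    return [a - strength * gap * x / length for a, x in zip(h, diff)]

def collapse(start, wells, strength):
    """Run the update through the wells in the order given."""
    h = list(start)
    for w in wells:
        h = collapse_step(h, w, strength)
    return h

def bag_mean(vectors):
    """Order-free baseline: the plain average of the context vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

start = [1.0, 0.0, 0.0]
a = [0.0, 1.0, 0.0]
b = [0.0, 0.0, 1.0]

ab = collapse(start, [a, b], strength=0.5)
ba = collapse(start, [b, a], strength=0.5)
print(ab, ba)                                  # two different end states
print(bag_mean([a, b]) == bag_mean([b, a]))    # True
```

In this toy run the most recently read well leaves the larger imprint on the state, while the average cannot tell the two orders apart.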

Quality: SimLex-999 (similarity, not association)

model                         data                            SimLex-999 ρ (nouns)
this model                    7.5% of Wikipedia, noun-only    0.362 (662/666 pairs)
word2vec / GloVe (published)  full Wikipedia+Gigaword         ~0.37–0.44
PPMI+SVD (reference)          full corpus                     ~0.38

Just below the published word2vec/GloVe band (0.362 vs ~0.37–0.44), on a fraction of the data and with no neural network.
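
For reference, the ρ reported on SimLex-999 is conventionally the Spearman rank correlation between human similarity scores and model cosine similarities. A small self-contained sketch of that computation follows. The toy vectors and gold scores are invented (they are not SimLex-999 data), and skipping out-of-vocabulary pairs is only an assumption about how the 662/666 coverage arises.

```python
import math

def cosine(a, b, eps=1e-12):
    d = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return d / max(na * nb, eps)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return float("nan")  # correlation is undefined for constant input
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def evaluate(pairs, vectors):
    """Return (rho, pairs used, pairs total); OOV pairs are skipped (ASSUMPTION)."""
    gold, pred = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            gold.append(score)
            pred.append(cosine(vectors[w1], vectors[w2]))
    return spearman(gold, pred), len(gold), len(pairs)

# Invented toy data, for illustration only.
vectors = {"cat": [1.0, 0.0], "dog": [0.9, 0.4], "car": [0.0, 1.0], "truck": [0.1, 1.0]}
pairs = [("cat", "dog", 8.0), ("car", "truck", 9.0), ("cat", "car", 1.0), ("dog", "zebra", 5.0)]

rho, used, total = evaluate(pairs, vectors)
print(rho, used, total)
```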

Speed (M-series MacBook)

  • Embed one 10-word context: 0.23 ms on CPU.
  • Bulk: 2.3M words/s on MPS at batch 1024.
  • Nearest-noun query vs 23,758 wells: 0.48 ms.
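
A nearest-noun query of this kind is a brute-force scan: score every well against the query and keep the top k. The sketch below shows the idea in plain Python with invented toy wells. It assumes wells are compared by cosine similarity; the released model presumably does the same with a batched tensor operation rather than a Python loop.

```python
import heapq
import math

def unit(v, eps=1e-12):
    """Scale v to unit length; a (near-)zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > eps else list(v)

def neighbors(query, wells, k, exclude=()):
    """Return the k words whose wells have the highest cosine with the query."""
    q = unit(query)
    scored = (
        (sum(a * b for a, b in zip(q, unit(w))), word)
        for word, w in wells.items()
        if word not in exclude
    )
    return [word for _, word in heapq.nlargest(k, scored)]

# Toy wells, invented for illustration.
toy_wells = {
    "cat": [1.0, 0.0, 0.0],
    "tabby": [0.95, 0.1, 0.0],
    "dog": [0.8, 0.5, 0.0],
    "war": [0.0, 0.0, 1.0],
}

print(neighbors(toy_wells["cat"], toy_wells, k=2, exclude={"cat"}))  # ['tabby', 'dog']
```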

Usage

Install and download (shell):

pip install torch huggingface_hub
hf download chetanxpatil/noun-collapse --local-dir noun-collapse
cd noun-collapse

Load and query (Python):

from modeling_noun_collapse import NounCollapse

m = NounCollapse.from_pretrained("noun_collapse_pure.pt")

m.vector("physics")            # 256-d unit embedding of a word
m.similarity("cat", "dog")     # cosine similarity
m.neighbors("india", k=8)      # nearest nouns
m.encode(["a cat sat on the mat"])   # collapse a sentence -> one state vector

Example neighbors:

cat     -> tabby dog pet felis mouse stray feline
physics -> chemistry mathematics astronomy quantum mechanics astrophysics
war     -> vietnam outbreak world cold ii boer veteran
india   -> gujarat pakistan nepal sikkim delhi bombay punjab bengal

Files

  • noun_collapse_pure.pt: the checkpoint (wells, stoi, noun_ids, start, strength, temp, config).
  • modeling_noun_collapse.py: standalone loader/encoder (torch only).
  • config.json: architecture metadata.

Limitations (read before citing)

  • Similarity, not logic. It learns that cat and animal are close, not that a cat is an animal. No facts, no hierarchy, no negation.
  • Frequency-bound. Common nouns have sharp neighborhoods; rare nouns stay near their random init.
  • 7.5% of Wikipedia, single pass, no LR schedule: headroom remains. This is the honest first result, not a tuned ceiling.
  • Whole-word vocab (no subwords): out-of-vocab words have no vector.

License

PolyForm Noncommercial 1.0.0: free for individuals, students, researchers, nonprofits. Commercial use requires a paid license. See the Livnium repo.
