YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
- LATENTROUTE
- Repository Layout
- 1) Environment Setup
- 2) Wikipedia Training Pipeline
- 2) Run Smoke Test (End-to-End)
- 3) Run Unit Tests
- 4) Quick Local Hyperparameter Tuning
- 5) Start Ray Dashboard
- 6) Build Full Innovation Model + Recheck Connections
- 7) Current Status
- 8) Innovation Registry
- Innovation #1 β Adaptive Morphological Tokenizer (AMT)
- Innovation #2 β Factorized Embedding Decomposition with Dynamic k (FED-Dk)
- Innovation #3 β Adaptive RoPE with Learned Frequency Scaling (ARFS)
- Innovation #4 β Hierarchical Latent Compression with Residual Gating (HLCR)
- Innovation #5 β Hierarchical MoE Routing with Entropy Regularization
- 9) Cost Reduction Summary (Target)
- 10) Recommended Implementation Roadmap
- 11) Useful Entry Points
- Repository Layout
LATENTROUTE
A from-scratch LLM research prototype with the following implemented pipeline:
- Entropy-weighted BPE tokenizer (AMT-style)
- Factorized embeddings (FED / FED-Dk)
- RoPE + ARFS positional encoding
- MLA attention with HLCR-style latent compression/gating
- MoE FFN with hierarchical routing + entropy/load-balance loss
- RMSNorm pre-norm transformer blocks
- Training utilities (total loss, AdamW, cosine warmup, Ray Tune)
- Smoke tests and pipeline readiness checks
Repository Layout
src/tokenizerβ Standard BPE + entropy-weighted BPEsrc/embeddingβ FED, FED-Dk, RoPE, ARFSsrc/modelβ Transformer stack (MLA + MoE + RMSNorm)src/trainingβ Objectives, optimizer/scheduler, Ray Tune, pipeline registry/rechecktestsβ Pytest smoke testsscriptsβ Standalone smoke-test runner
1) Environment Setup
From project root:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
2) Wikipedia Training Pipeline
New scripts have been added to support full-scale training:
scripts/prepare_wiki.py: Extracts Wikipedia articles into JSONL format (streaming).scripts/train_on_wiki.py: Trains the LatentRoute model on Wikipedia with all innovations enabled.src/model/hf_utils.py: Uploads your trained model to the Hugging Face Hub.
Example training:
source .venv/bin/activate
python scripts/train_on_wiki.py --total_steps 5000
2) Run Smoke Test (End-to-End)
source .venv/bin/activate
python scripts/run_smoke_test.py
Expected output includes:
Smoke test passedoverall_ready: True
3) Run Unit Tests
source .venv/bin/activate
PYTHONPATH=. pytest -q
4) Quick Local Hyperparameter Tuning
source .venv/bin/activate
PYTHONPATH=. python - <<'PY'
from src.training.ray_tune_train import run_quick_tune_local
best = run_quick_tune_local(num_samples=2)
print('Best config:', best.config)
print('Best loss:', best.metrics.get('loss'))
print('Best perplexity:', best.metrics.get('perplexity'))
PY
5) Start Ray Dashboard
source .venv/bin/activate
ray stop --force
ray start --head --dashboard-host=127.0.0.1 --dashboard-port=8265 --disable-usage-stats
Open:
Check health:
python - <<'PY'
import urllib.request
for u in ["http://127.0.0.1:8265", "http://127.0.0.1:8265/api/version"]:
with urllib.request.urlopen(u, timeout=5) as r:
print(u, r.status)
PY
6) Build Full Innovation Model + Recheck Connections
source .venv/bin/activate
PYTHONPATH=. python - <<'PY'
from src.training.pipeline import build_full_innovation_model, recheck_pipeline_connections
model = build_full_innovation_model(
vocab_size=512,
d_model=256,
n_layers=2,
n_heads=8,
max_seq_len=256,
n_experts=8,
d_c=64,
d_rope=16,
)
report = recheck_pipeline_connections(model)
print(report)
PY
7) Current Status
Implemented and validated:
- Tokenization, embedding, positional, attention, FFN/MoE, model integration
- Total training loss components:
L_CElambda_1 * L_routelambda_2 * L_aux
- Optimizer and scheduler utilities
- Ray Tune local quick tuning
- Ray dashboard operational
This repository is a research/training core, not yet a full production chat platform (no serving API, chat UI, auth, moderation, persistence, or deployment stack yet).
8) Innovation Registry
Innovation #1 β Adaptive Morphological Tokenizer (AMT)
- Phase: Tokenization (Phase 2)
- Formula:
- $P_{merge}(a,b)=freq(a,b)\cdot \exp\left(-H\left(\frac{p(c|ab)}{p(c|a)p(c|b)}\right)\right)$
- Impact: ~18% vocabulary reduction, better rare-word handling
- Patent basis: entropy-weighted BPE merge criterion
- Code location: src/tokenizer/bpe_entropy_weighted.py
Innovation #2 β Factorized Embedding Decomposition with Dynamic k (FED-Dk)
- Phase: Embedding (Phase 3)
- Formula:
- $k_i = k_{min} + (k_{max}-k_{min})\cdot \sigma(\alpha\log(freq_i))$
- Impact: up to ~93% embedding memory reduction
- Patent basis: per-token adaptive bottleneck dimension
- Code location: src/embedding/fed_dk.py
Innovation #3 β Adaptive RoPE with Learned Frequency Scaling (ARFS)
- Phase: Positional Encoding (Phase 4)
- Formula:
- $\theta_j^{(domain)} = \theta_j\cdot \exp(\gamma_j z_{domain})$
- Impact: better cross-domain context extension without full retraining
- Patent basis: domain-conditioned rotary scaling
- Code location: src/embedding/rope.py
Innovation #4 β Hierarchical Latent Compression with Residual Gating (HLCR)
- Phase: Attention / MLA (Phase 6)
- Formula:
- $g=\sigma(W_g[c_1;c_2;h])$
- $c_{final}=g\cdot c_1 + (1-g)\cdot proj(c_2)$
- Impact: major KV-cache reduction with adaptive per-token compression
- Patent basis: two-level latent KV compression with learned gating
- Code location: src/model/init.py
Innovation #5 β Hierarchical MoE Routing with Entropy Regularization
- Phase: MoE (Phase 7)
- Formula:
- $L_{entropy}=-\lambda\sum_e \bar{p}_e\log(\bar{p}_e)$
- Impact: reduced routing compute and improved expert utilization
- Patent basis: hierarchical $O(\sqrt{E})$ routing + entropy regularization
- Code location: src/model/init.py
9) Cost Reduction Summary (Target)
| Component | Standard | With innovations |
|---|---|---|
| Embedding memory (50K vocab) | ~800 MB | ~55 MB (up to -93%) |
| KV-cache (large-model scenario) | very high | strongly reduced via MLA/HLCR |
| MoE routing compute | $O(E)$ | hierarchical approx. $O(\sqrt{E})$ |
| Effective vocabulary | baseline | can reduce via entropy-weighted merges |
| Total inference cost | baseline | substantial reduction depending on scale |
10) Recommended Implementation Roadmap
- Month 1-2: 1B-scale prototype validation (quality + stability)
- Month 3-4: 7B tuning with Ray Tune (quality/cost Pareto)
- Month 5-6: long-context + preference alignment
- Month 7-8: scale-up, optimization hardening, and IP/legal packaging
11) Useful Entry Points
src/model/__init__.py:LLM,TransformerLM,MLAAttention,MoELayer
src/training/objectives.py:compute_language_model_loss
src/training/optim.py:create_adamw,CosineWithWarmup
src/training/ray_tune_train.py:run_quick_tune_local,build_tuner,train_llm_ray
src/training/pipeline.py:- innovation registry, cost summary, connection recheck
Inference Providers NEW
This model isn't deployed by any Inference Provider. π Ask for provider support