HoLo 6.5.1 — byte-native multimodal research model (59M)

Proof-of-operation weights for HoLo 6.5.1: a tokenizer-free, byte-native, decoder-only prefix-LM (dim 512 / 8 layers / 8 heads, 59.4M params) running on the public non-learned 27-D HSL encoder (pip install hsl-embedding, MIT). Trained on a single RTX 4070. Not a benchmark-superiority claim — the release exists so every "it works" claim is reproducible.

Live demo: https://holo-demo-p5txmh4dda-as.a.run.app
Code: https://github.com/Woojiggun/holo-hsl
Project card: https://huggingface.co/spaces/ggunio/holo-demo-space

Files

file	stage	golden numbers (bits per byte, bpb; lower is better except for grounding gap)
`holo651_s1_text_30k.pt`	S1 text backbone (EN+KO, 30k steps)	text 1.632 / knowledge-domain 1.689
`holo651_s2_chat_know_12k.pt`	S2 chat + knowledge SFT (2 KB context, 12k steps)	text 1.538 / chat 1.107 / grounding gap 0.120
`holo651_s3_multimodal_10k.pt`	S3 multimodal (video windows, 10k steps)	text 1.528 / video 4.575 / grounding gap 1.835

Grounding gap is the extra bits per byte the model pays when its disk-retrieved facts are swapped for wrong ones (know_abl_bpb − know_bpb). It grew from 0.001 to 1.835 across training (0.120 after S2, 1.835 after S3). A large gap means the model's predictions depend on the retrieved facts, so it is measurably reading its disk memory rather than memorizing: facts live in a disk store, patterns in the weights.
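The arithmetic behind these numbers can be sketched as follows. This is a minimal illustration, not the repo's evaluation code: the function names `bits_per_byte` and `grounding_gap` are hypothetical, it assumes the loss is a summed negative log-likelihood in nats, and the two bpb inputs in the second example are made-up values chosen only to show the subtraction.

```python
import math


def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) into bits per byte."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    return total_nll_nats / (n_bytes * math.log(2))


def grounding_gap(know_bpb: float, know_abl_bpb: float) -> float:
    """Extra bits/byte paid when the retrieved facts are swapped for wrong ones."""
    return know_abl_bpb - know_bpb


# Sanity check: guessing uniformly over 256 byte values costs ln(256) nats per byte.
uniform = bits_per_byte(1000 * math.log(256), 1000)
print(round(uniform, 6))  # 8.0

# Hypothetical inputs (NOT measured values from this release).
gap = grounding_gap(know_bpb=1.50, know_abl_bpb=1.62)
print(round(gap, 3))  # 0.12
```

A gap near zero would mean the swapped facts made no difference to the model's predictions, while a larger gap means the predictions depended on what was retrieved.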

Usage

pip install "hsl-embedding>=0.5.0" torch
git clone https://github.com/Woojiggun/holo-hsl

from holo_generate import load, gen_text   # from the repo (Train/)
m, cfg = load("holo651_s3_multimodal_10k.pt", device="cuda")
out = gen_text(m, "The universe is ".encode(), n_new=120, temperature=0.7,
               origin_anchor=cfg["origin_anchor"])
print(out.decode("utf-8", "replace"))
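The `"replace"` argument in the final `decode` call matters for a byte-native model: generation can stop in the middle of a multi-byte UTF-8 character, which is likely with Korean text where each Hangul syllable takes three bytes. The sketch below uses only the Python standard library to show the difference. `safe_decode` is a hypothetical helper name, and the truncated byte string stands in for model output.

```python
def safe_decode(raw: bytes) -> str:
    """Decode model output bytes, substituting U+FFFD for invalid sequences."""
    return raw.decode("utf-8", errors="replace")


full = "우주".encode("utf-8")  # two Hangul syllables, three bytes each
cut = full[:4]                 # simulate output that stops mid-character
print(len(full))  # 6

# Strict decoding fails on the dangling lead byte.
try:
    cut.decode("utf-8")
except UnicodeDecodeError as err:
    print("strict decode failed:", err.reason)

# Lenient decoding keeps the complete syllable and marks the broken one.
text = safe_decode(cut)
print(text[0] == "우", text[1] == "\ufffd")  # True True
```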

Training data & why the license is NC

data	role	license
FineWeb-Edu (EN) + Korean Wikipedia	text backbone	ODC-By / CC-BY-SA 4.0
Project Gutenberg classics (philosophy etc.)	knowledge store + canon mix	Public domain (US)
Korean chat corpora (incl. GPT-derived sets)	S2 chat SFT	mixed, parts NC / model-derived
Open-movie video streams (Blender films)	S3 multimodal	CC-BY 3.0

Because the S2/S3 stages include chat data that is partly model-derived or non-commercial, these weights are released for research use under CC-BY-NC-SA 4.0. Attribution: Korean Wikipedia (CC-BY-SA 4.0), Blender Foundation open movies (CC-BY 3.0).

Honest limitations

A newborn research model: text generation is rough, chat is shallow, generated frames/audio are toy quality (16×16 gray, 8 kHz mu-law). Free-running quality varies by checkpoint (see the repo notes on cosine-tail selection). No safety tuning of any kind — research use only.

Citation

Jinhyun Woo, HoLo: A Feasibility Study of Change-Rate-Based Multimodal Unification — DOI: 10.5281/zenodo.20581805
