LFM2.5-8B-A1B-Multichannel
A multi-stream fine-tune of LiquidAI/LFM2.5-8B-A1B. Instead of generating one sequential stream of tokens, the model generates four parallel channels at once, one token per channel per step:
| idx | channel | role |
|---|---|---|
| 0 | User | the incoming user message (input) |
| 1 | Output | the user-visible reply |
| 2 | Think | an internal analytical reasoning stream |
| 3 | Skeptical | an internal adversarial / error-checking stream |
It's a smaller, 4-channel reproduction of the idea in Multi-Stream LLMs (arXiv:2605.12460), ported onto LFM2.5-8B-A1B (a hybrid conv + GQA MoE, ~1.5B active params).
What makes it different
A row is [User, Output, Think, Skeptical], flattened row-major. Attention is
block-causal: a token at row r sees all of rows < r (every channel)
plus itself, never its same-row peers. So Output can read what Think and
Skeptical produced on earlier rows.
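The visibility rule above can be sketched as a boolean mask over the flattened sequence. This is a minimal illustration in plain Python, not the bundled modeling code; the helper names `flat_index` and `block_causal_mask` are made up for this example.

```python
NUM_CHANNELS = 4  # User, Output, Think, Skeptical


def flat_index(row: int, channel: int, num_channels: int = NUM_CHANNELS) -> int:
    """Position of (row, channel) in the row-major flattened sequence."""
    return row * num_channels + channel


def block_causal_mask(num_rows: int, num_channels: int = NUM_CHANNELS) -> list[list[bool]]:
    """mask[q][k] is True when the token at flat position q may attend to position k."""
    n = num_rows * num_channels
    mask = []
    for q in range(n):
        q_row = q // num_channels
        # Visible: every channel of strictly earlier rows, plus the token itself.
        mask.append([(k // num_channels) < q_row or k == q for k in range(n)])
    return mask


mask = block_causal_mask(num_rows=3)
out_r2 = flat_index(2, 1)    # Output token on row 2
think_r1 = flat_index(1, 2)  # Think token on row 1 (earlier row)
think_r2 = flat_index(2, 2)  # Think token on row 2 (same-row peer)

print(mask[out_r2][think_r1])  # earlier row: visible
print(mask[out_r2][think_r2])  # same-row peer: hidden
```

The only difference from an ordinary causal mask is that the cutoff falls on row boundaries rather than on individual positions, so the four tokens of a row never see each other.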
Loading & inference
This is a custom architecture — it needs the bundled modeling code
(stream_lfm2.py, block_causal_short_conv.py) and the multi-stream decode
loop (stream_cache.py). model.generate() does not apply; use the
StreamDecoder / row-by-row decode.
import torch
from transformers import AutoTokenizer
from transformers.models.lfm2_moe.modeling_lfm2_moe import Lfm2MoeConfig
from stream_lfm2 import StreamLfm2MoeForCausalLM  # bundled in this repo

CHANNELS = ["User", "Output", "Think", "Skeptical"]

tok = AutoTokenizer.from_pretrained("Isolyth/LFM2.5-8B-A1B-Multichannel")
cfg = Lfm2MoeConfig.from_pretrained("Isolyth/LFM2.5-8B-A1B-Multichannel")
cfg.num_channels = 4
cfg.channel_names = CHANNELS

model = StreamLfm2MoeForCausalLM.from_pretrained(
    "Isolyth/LFM2.5-8B-A1B-Multichannel",
    config=cfg, torch_dtype=torch.bfloat16, device_map="auto").eval()
infer.py (bundled) is a runnable single-prompt example; server_lfm2.py +
chat.py give a live TUI with one pane per channel.
Decoding feeds the user message into the User channel one token per row; the
other channels generate at the same time.
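As a rough sketch of that row-by-row layout: each row takes its User token from the incoming message while the other three tokens come from the model. Everything here is illustrative. `step_fn` is a hypothetical stand-in for one forward pass plus sampling, `SILENCE_ID` is a made-up token id, and padding the User channel with silence once the message ends is an assumption; the bundled stream_cache.py may handle these differently.

```python
SILENCE_ID = 0  # hypothetical id for the silence token


def user_channel_feed(user_ids, num_rows, silence_id=SILENCE_ID):
    """One User-channel token per row: message tokens first, then silence (assumed)."""
    if num_rows < len(user_ids):
        raise ValueError("num_rows must cover the whole user message")
    return list(user_ids) + [silence_id] * (num_rows - len(user_ids))


def decode_rows(user_ids, num_rows, step_fn, silence_id=SILENCE_ID):
    """Build rows of [User, Output, Think, Skeptical], one row per step."""
    feed = user_channel_feed(user_ids, num_rows, silence_id)
    rows = []
    for r in range(num_rows):
        # step_fn only sees completed earlier rows, matching block-causal attention.
        output_tok, think_tok, skeptical_tok = step_fn(rows)
        rows.append([feed[r], output_tok, think_tok, skeptical_tok])
    return rows


def toy_step(rows):
    """Toy stand-in for the model: emits ids derived from the current row count."""
    n = len(rows)
    return 100 + n, 200 + n, 300 + n


rows = decode_rows(user_ids=[11, 12, 13], num_rows=5, step_fn=toy_step)
for row in rows:
    print(row)
```

Because the generated channels start on the first row, they can already be producing tokens while the user message is still arriving.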
To keep the model from going silent, a penalty is applied to the silence token, controlled by the silence penalty parameter. I currently have this set to 1, which seems to be a sane default, though there may be a better value. There is a similar penalty for thinking, which is also set to 1 by default.
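One common way to implement such a penalty is to subtract a constant from the silence token's logit before sampling. The sketch below assumes that subtractive form; the bundled decode loop may apply the penalty differently, and `SILENCE_ID` and `penalize` are names invented for this example.

```python
import math

SILENCE_ID = 0  # hypothetical id for the silence token


def penalize(logits, token_id, penalty):
    """Return a copy of logits with `penalty` subtracted from one token's logit."""
    out = list(logits)
    out[token_id] -= penalty
    return out


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # toy logits; index 0 is the silence token
p_before = softmax(logits)[SILENCE_ID]
p_after = softmax(penalize(logits, SILENCE_ID, penalty=1.0))[SILENCE_ID]

print(f"P(silence) before: {p_before:.3f}, after: {p_after:.3f}")
```

Under this form, a penalty of 0 leaves the distribution unchanged, and larger values make silence progressively less likely.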
Training
- Base: LFM2.5-8B-A1B, fine-tuned starting from the base checkpoint (not continued from an earlier run), with LongCE active throughout.
- Data: a weighted mix — the stream-data paper corpus projected to 4 channels (×2), my own DeepSeek-synthesized 4-channel data (×1), and rollouts distilled from the 27B Stream-Qwen teacher (×0.5); then augmented (silence time-shifts, leading-silence for cold-start, concatenation for multi-turn).
- Objective: plain cross-entropy mixed (40%) with the LongCE-channel counterfactual reweighting (60%, γ=2), 4000 steps.
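The arithmetic behind the data mix and the 40/60 objective split can be sketched as follows. This assumes the ×2 / ×1 / ×0.5 factors are relative sampling weights and that the reweighted term is a weight-normalized average of per-token losses. The per-token weights are taken as given, and neither their computation nor the role of γ is modeled here; all names are illustrative.

```python
def normalize(weights):
    """Turn relative mix weights into sampling shares that sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Relative weights from the Data bullet (assumed to be sampling weights).
MIX = {"paper_corpus": 2.0, "deepseek_synth": 1.0, "teacher_rollouts": 0.5}
shares = normalize(MIX)


def mixed_loss(token_nll, token_weights, ce_share=0.4):
    """ce_share * plain CE + (1 - ce_share) * weight-normalized CE (assumed form)."""
    plain = sum(token_nll) / len(token_nll)
    reweighted = sum(w * l for w, l in zip(token_weights, token_nll)) / sum(token_weights)
    return ce_share * plain + (1.0 - ce_share) * reweighted


for name, share in shares.items():
    print(f"{name}: {share:.3f}")

# Toy per-token losses; the second token is up-weighted by the reweighting term.
loss = mixed_loss(token_nll=[1.0, 3.0], token_weights=[1.0, 3.0])
print(f"mixed loss: {loss:.3f}")
```

With uniform token weights the two terms coincide and the mixed loss reduces to plain cross-entropy, which is a quick sanity check for any implementation of this kind.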
Evaluation
- GSM8K (sampled, multi-stream decode): ~13%.
Limitations
- Can become repetitive or over-talk at high silence penalties.
- No tool calling, etc.
License & attribution
Derivative of LiquidAI/LFM2.5-8B-A1B, released under the LFM Open License
v1.0 (see LICENSE), which includes a non-commercial / commercial-use
threshold. "LFM2.5" appears here only to describe the model's origin; this is an
independent fine-tune, not an official LiquidAI release. Method and the
stream-data corpus are from Multi-Stream LLMs (arXiv:2605.12460).
Thanks to Field Level Tech (https://fieldleveltech.org/) for access to compute for training this model. Without access to FLT's compute, this project would not have been possible.