LFM2.5-8B-A1B-Multichannel

A multi-stream fine-tune of LiquidAI/LFM2.5-8B-A1B. Instead of producing a single sequential stream of tokens, the model advances four parallel channels in lockstep, one token per channel per step. The User channel carries the input; the other three are generated:

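| Index | Channel   | Role                                           |
|-------|-----------|------------------------------------------------|
| 0     | User      | the incoming user message (input)              |
| 1     | Output    | the user-visible reply                         |
| 2     | Think     | an internal analytical reasoning stream        |
| 3     | Skeptical | an internal adversarial / error-checking stream |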

It is a smaller-scale, 4-channel reproduction of the approach in Multi-Stream LLMs (arXiv:2605.12460), ported onto LFM2.5-8B-A1B (a hybrid short-convolution + GQA mixture-of-experts model with ~1.5B active parameters).

What makes it different

A row is [User, Output, Think, Skeptical], flattened row-major. Attention is block-causal: a token at row r sees all of rows < r (every channel) plus itself, never its same-row peers. So Output can read what Think and Skeptical produced on earlier rows.
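A minimal sketch of the mask rule above, assuming the row-major layout in which channel c of row r sits at flat position r × 4 + c. The helper names are illustrative and not taken from stream_lfm2.py, and the sketch covers attention only; the convolution layers need their own block-causal handling (block_causal_short_conv.py), which is not shown.

```python
# Illustrative sketch of the block-causal attention mask (not the bundled code).
# Assumption: tokens are flattened row-major, so (row, channel) -> row * C + channel.
from typing import List

CHANNELS = ["User", "Output", "Think", "Skeptical"]


def flat_index(row: int, channel: int, num_channels: int = len(CHANNELS)) -> int:
    """Position of (row, channel) in the row-major flattened sequence."""
    return row * num_channels + channel


def block_causal_mask(num_rows: int, num_channels: int = len(CHANNELS)) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    A query at row r sees every channel of rows < r, plus itself,
    but never the other channels of its own row.
    """
    n = num_rows * num_channels
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        q_row = q // num_channels
        for k in range(n):
            k_row = k // num_channels
            mask[q][k] = k_row < q_row or k == q
    return mask
```

For two rows, the Output token of row 1 (flat position 5) may attend to positions 0 to 3 and to itself, but not to positions 4, 6, or 7.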

Loading & inference

This is a custom architecture: it needs the bundled modeling code (stream_lfm2.py, block_causal_short_conv.py) and the multi-stream decode loop (stream_cache.py). model.generate() does not apply; decode row by row with the StreamDecoder instead.

import torch
from transformers import AutoTokenizer
from transformers.models.lfm2_moe.configuration_lfm2_moe import Lfm2MoeConfig
from stream_lfm2 import StreamLfm2MoeForCausalLM   # bundled in this repo

# Channel order used by this model: index 0..3 = User, Output, Think, Skeptical
CHANNELS = ["User", "Output", "Think", "Skeptical"]
tok = AutoTokenizer.from_pretrained("Isolyth/LFM2.5-8B-A1B-Multichannel")
cfg = Lfm2MoeConfig.from_pretrained("Isolyth/LFM2.5-8B-A1B-Multichannel")
# Multi-stream settings for the custom model class
cfg.num_channels = len(CHANNELS)
cfg.channel_names = CHANNELS
model = StreamLfm2MoeForCausalLM.from_pretrained(
    "Isolyth/LFM2.5-8B-A1B-Multichannel",
    config=cfg, torch_dtype=torch.bfloat16, device_map="auto").eval()

infer.py (bundled) is a runnable single-prompt example; server_lfm2.py and chat.py together provide a live TUI with one pane per channel.

Decoding feeds the user message into the User channel one token per row; the other channels generate at the same time.
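A hypothetical sketch of that decode loop, with the model replaced by a stand-in `step_fn` that returns one token id per channel for the next row. The `SILENCE` id, padding the User channel with silence once the prompt is consumed, and all function names are assumptions for illustration; the real implementation is the bundled StreamDecoder / stream_cache.py.

```python
# Hypothetical sketch of row-by-row multi-stream decoding (not the bundled StreamDecoder).
from typing import Callable, List

USER, OUTPUT, THINK, SKEPTICAL = range(4)
SILENCE = 0  # hypothetical silence token id


def decode_rows(
    user_tokens: List[int],
    step_fn: Callable[[List[List[int]]], List[int]],
    max_rows: int,
) -> List[List[int]]:
    """Build up to max_rows rows of [User, Output, Think, Skeptical] token ids."""
    rows: List[List[int]] = []
    for r in range(max_rows):
        # The model predicts one token per channel from the rows decoded so far.
        row = list(step_fn(rows))
        # The User channel is forced: one prompt token per row, then silence.
        row[USER] = user_tokens[r] if r < len(user_tokens) else SILENCE
        rows.append(row)
    return rows
```

With a toy `step_fn` that copies the previous row's User token into Output, the Output channel lags the User channel by one row, since a row can only read earlier rows.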

To keep the model from falling silent, a penalty is applied to the silence token, controlled by the silence-penalty parameter. It currently defaults to 1, which seems a sane value, though a better setting may exist. A similar penalty applies to thinking, also set to 1 by default.
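One way such a penalty could work, sketched under the assumption that it is subtracted from the silence token's logit before sampling; the exact formula is not specified here, and `SILENCE` is a hypothetical token id. Under that assumption, a penalty of 1 divides the silence token's unnormalized probability by e ≈ 2.72, so larger values push the model toward speaking.

```python
# Hypothetical sketch: a subtractive silence penalty on one channel's logits.
import math
from typing import List

SILENCE = 0  # hypothetical silence token id


def apply_silence_penalty(logits: List[float], penalty: float = 1.0) -> List[float]:
    """Return a copy of logits with the silence token's logit lowered by `penalty`."""
    adjusted = list(logits)
    adjusted[SILENCE] -= penalty
    return adjusted


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```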

Training

Base: LFM2.5-8B-A1B, fine-tuned in a single fresh run from the base weights (not continued from an earlier checkpoint), with LongCE active throughout.
Data: a weighted mix of the paper's stream-data corpus projected to 4 channels (×2), my own DeepSeek-synthesized 4-channel data (×1), and rollouts distilled from the 27B Stream-Qwen teacher (×0.5); then augmented (silence time-shifts, leading silence for cold-start, concatenation for multi-turn).
Objective: a mix of plain cross-entropy (40%) and LongCE channel-counterfactual reweighting (60%, γ=2), trained for 4000 steps.
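A hypothetical sketch of two augmentations named in the Data line, with each sample stored as rows of [User, Output, Think, Skeptical] token ids. Reading "silence time-shifts" as delaying one channel by k rows is an assumption, as are `SILENCE` and the helper names.

```python
# Hypothetical sketch of leading-silence and time-shift augmentation (not the training code).
from typing import List

SILENCE = 0  # hypothetical silence token id
Row = List[int]


def add_leading_silence(rows: List[Row], n: int) -> List[Row]:
    """Prepend n all-silence rows so the sample starts cold."""
    width = len(rows[0]) if rows else 4
    return [[SILENCE] * width for _ in range(n)] + [list(r) for r in rows]


def time_shift_channel(rows: List[Row], channel: int, k: int) -> List[Row]:
    """Delay one channel by k rows, filling the gap with silence.

    The sample grows by k rows so no tokens of the shifted channel are dropped.
    """
    width = len(rows[0])
    out = [list(r) for r in rows] + [[SILENCE] * width for _ in range(k)]
    original = [r[channel] for r in rows]
    for i in range(len(out)):
        src = i - k
        out[i][channel] = original[src] if 0 <= src < len(original) else SILENCE
    return out
```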
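A hypothetical sketch of the 40/60 objective, assuming a LongCE-style weight per token of min(exp(log p_full − log p_cf), γ), where log p_cf is the token's log-probability under a counterfactual in which the other channels are hidden. The exact counterfactual and normalization used in training are not specified here; the function names are illustrative.

```python
# Hypothetical sketch of mixing plain CE with LongCE-style counterfactual reweighting.
import math
from typing import List


def longce_weights(logp_full: List[float], logp_cf: List[float], gamma: float = 2.0) -> List[float]:
    """Weight tokens by how much seeing the other channels helps, capped at gamma."""
    return [min(math.exp(f - c), gamma) for f, c in zip(logp_full, logp_cf)]


def mixed_loss(
    logp_full: List[float],
    logp_cf: List[float],
    ce_weight: float = 0.4,
    longce_weight: float = 0.6,
    gamma: float = 2.0,
) -> float:
    ce = [-lp for lp in logp_full]                 # per-token cross-entropy
    w = longce_weights(logp_full, logp_cf, gamma)  # would be detached in a real training loop
    plain = sum(ce) / len(ce)
    reweighted = sum(wi * ci for wi, ci in zip(w, ce)) / len(ce)
    return ce_weight * plain + longce_weight * reweighted
```

Tokens that become more predictable when the other channels are visible get weights above 1 (up to γ), so the reweighted term emphasizes cross-channel dependencies while the plain term keeps every token in the objective.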

Evaluation

GSM8K accuracy (sampled, multi-stream decode): ~13%.

Limitations

Can be repetitive or over-talk at high silence penalties.
No tool-calling support.

License & attribution

Derivative of LiquidAI/LFM2.5-8B-A1B, released under the LFM Open License v1.0 (see LICENSE), which includes a non-commercial / commercial-use threshold. "LFM2.5" appears here only to describe the model's origin; this is an independent fine-tune, not an official LiquidAI release. Method and the stream-data corpus are from Multi-Stream LLMs (arXiv:2605.12460).

Thanks to Field Level Tech (https://fieldleveltech.org/) for access to compute for training this model. Without access to FLT's compute, this project would not have been possible.


Model details

Format: Safetensors · Model size: 8B params · Tensor types: F32, BF16

Model tree: LiquidAI/LFM2.5-8B-A1B-Base (base) → LiquidAI/LFM2.5-8B-A1B (fine-tune) → Eriskii/LFM2.5-8B-A1B-Multichannel (this model)

Paper: Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs (arXiv:2605.12460, published May 12)