Instructions to use VishalMysore/RecursiveMAS-0.5B-MLC with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLC-LLM
How to use VishalMysore/RecursiveMAS-0.5B-MLC with MLC-LLM:
# No code snippets available yet for this library. # To use this model, check the repository files and the library's documentation. # Want to help? PRs adding snippets are welcome at: # https://github.com/huggingface/huggingface.js
- Notebooks
- Google Colab
- Kaggle
RecursiveMAS-0.5B-MLC
A WebGPU / WebLLM build of Qwen2.5-0.5B-Instruct,
compiled from source with MLC-LLM (quantization q4f16_1) and
patched to expose its last-layer hidden states. It runs entirely in the browser — no server, no API key —
and is built for latent multi-agent experiments inspired by the paper
Recursive Multi-Agent Systems.
Build pipeline & sources: https://github.com/vishalmysore/recursiveMASWebLLM
What's special
Standard WebLLM chat models only expose input_ids → logits. This build adds two functions to the
compiled module so latent state can be read/looped between agents (the RecursiveMAS "RecursiveLink" idea):
get_last_hidden(input_embed, kv_cache) → (hidden_states, kv_cache)decode_last_hidden(input_embed, kv_cache) → (hidden_states, kv_cache)
It also works as an ordinary chat backbone.
Files
| File | What |
|---|---|
params_shard_*.bin, ndarray-cache.json |
q4f16_1 quantized weights |
mlc-chat-config.json |
MLC chat config |
tokenizer.json, vocab.json, merges.txt, tokenizer_config.json |
tokenizer |
libs/RecursiveMAS-0.5B-q4f16_1-webgpu.wasm |
the WebGPU model library |
Usage (WebLLM, in the browser)
import * as webllm from "@mlc-ai/web-llm";
const appConfig = {
model_list: [{
model: "https://huggingface.co/VishalMysore/RecursiveMAS-0.5B-MLC",
model_id: "recursivemas-0.5b",
model_lib: "https://huggingface.co/VishalMysore/RecursiveMAS-0.5B-MLC/resolve/main/libs/RecursiveMAS-0.5B-q4f16_1-webgpu.wasm",
}],
};
const engine = await webllm.CreateMLCEngine("recursivemas-0.5b", { appConfig });
const r = await engine.chat.completions.create({ messages: [{ role: "user", content: "Hello!" }] });
console.log(r.choices[0].message.content);
How it was built
Built from source against mlc-llm v0.19.0 (the last release before the apache-tvm-ffi
migration), TVM compiled with LLVM, model definition patched via
expose_hidden.py,
then mlc_llm compile --device webgpu. The full, reproducible pipeline (and a Colab notebook) is in the
recursiveMASWebLLM repo.
⚠️ Version compatibility
The .wasm model library was compiled with mlc-llm v0.19.0. WebGPU model libraries are tied to the
runtime version, so load it with a compatible @mlc-ai/web-llm build — if you hit a "model lib
version" error, pin @mlc-ai/web-llm to the version matching mlc-llm v0.19.0 (or recompile the .wasm
against your runtime's version).
License
Apache-2.0, inheriting the base model Qwen2.5-0.5B-Instruct. This is a research/educational artifact.
- Downloads last month
- -