Model Card for SmolLM360M-IT-ConvFill_mlx_q8
This model is a finetune of HuggingFaceTB/SmolLM2-360M-Instruct for the conversational infill task described in the ConvFill paper. It is an INT8 MLX-quantized version of vysri/SmolLM360M-IT-ConvFill.
Model Details
This model should be used in accordance with the license of the original base model, HuggingFaceTB/SmolLM2-360M-Instruct. The dataset used to finetune this model can be found here.
Model Description
Deploying responsive, multi-turn conversational voice agents with large language models poses a critical challenge: cloud-based foundation models use reasoning, information retrieval, and tool use for high-value tasks, but they introduce latency that disrupts natural conversation. In contrast, small models can respond quickly but lack the capabilities needed for real-world tasks. We propose conversational infill, a task in which a small, local model generates prompt, contextually appropriate dialogue while seamlessly incorporating delayed external knowledge produced in parallel by a foundation model backend. This finetune trains HuggingFaceTB/SmolLM2-360M-Instruct to perform the conversational infill task.
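The infill pattern described above can be sketched with Python's asyncio. This is a minimal illustration under stated assumptions, not the ConvFill implementation: `slow_backend` and `small_model_turn` are hypothetical stand-ins for the foundation model backend and the finetuned small model, and fixed strings replace real generation.

```python
import asyncio
from typing import Optional


async def slow_backend(query: str, delay_s: float, log: list) -> str:
    """Hypothetical stand-in for a cloud foundation model: answers after a delay."""
    await asyncio.sleep(delay_s)  # simulates reasoning / retrieval / tool-use latency
    log.append("backend: done")
    return f"[backend answer to: {query}]"


def small_model_turn(knowledge: Optional[str]) -> str:
    """Hypothetical stand-in for the local small model (not the real finetune).

    Before the backend's knowledge arrives it produces a short holding turn;
    afterwards it folds that knowledge into its reply.
    """
    if knowledge is None:
        return "agent: Let me look that up for you."
    return f"agent: Here's what I found: {knowledge}"


async def converse(query: str, backend_delay_s: float = 0.05) -> list:
    log = [f"user: {query}"]
    # Start the slow backend in parallel; create_task schedules it without blocking.
    backend = asyncio.create_task(slow_backend(query, backend_delay_s, log))
    # The small model responds immediately instead of waiting on the backend.
    log.append(small_model_turn(None))
    # Wait for the delayed, external knowledge, then incorporate it.
    knowledge = await backend
    log.append(small_model_turn(knowledge))
    return log


if __name__ == "__main__":
    for line in asyncio.run(converse("What's the weather in Seattle?")):
        print(line)
```

Running it prints the user turn, the immediate holding turn, the backend completion, and then the final turn that incorporates the backend's answer, in that order. A real agent would keep producing contextually appropriate turns while the backend works rather than a single fixed holding line.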
- Finetuned from model: HuggingFaceTB/SmolLM2-360M-Instruct
- License: Apache 2.0
Model Sources
- Repository: https://github.com/vysri/conversational-infill
- Paper: Thinking While Speaking: Inference-Time Knowledge Transfer for Responsive and Intelligent Conversational Voice Agents
- Demo: TBD
Direct Use
This model is intended to be used with the infrastructure in the ConvFill repository.
Bias, Risks, and Limitations
This model is not explicitly tuned for guardrailed behavior. Please use with caution.
How to Get Started with the Model
Use the code in the ConvFill repository to get started with this model.
Training Data
The training data for this model can be found here, and the dataset generation procedure here. Training procedures are described in the ConvFill paper, and training code and scripts are available in the ConvFill repository.
Citation
@misc{srinivas2026thinkingspeakinginferencetimeknowledge,
title={Thinking While Speaking: Inference-Time Knowledge Transfer for Responsive and Intelligent Conversational Voice Agents},
author={Vidya Srinivas and Zachary Englhardt and Shwetak Patel and Vikram Iyer},
year={2026},
eprint={2511.07397},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2511.07397},
}
Model tree for vysri/SmolLM360M-IT-ConvFill_mlx_q8
- Base model: HuggingFaceTB/SmolLM2-360M