
Mixture of Attentions for Speculative Decoding

This checkpoint accompanies the paper "Mixture of Attentions For Speculative Decoding" by Matthieu Zimmer*, Milan Gritta*, Gerasimos Lampouras, Haitham Bou Ammar, and Jun Wang. The paper introduces a novel architecture for speculative decoding that accelerates large language model (LLM) inference.

It is supported in vLLM; see our GitHub repository.
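
Below is a minimal sketch of serving the base model with this draft checkpoint through vLLM's speculative decoding. The argument names for speculative decoding differ across vLLM versions, and this draft model may require the MOA-enabled build from our GitHub repository, so treat the snippet as illustrative rather than a definitive recipe.

```python
# Illustrative only: exact kwargs depend on the vLLM version / MOA-enabled build.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",                  # target (base) model
    speculative_model="huawei-noah/MOASpec-Llama-3-8B-Instruct",  # MOA Spec draft model
    num_speculative_tokens=5,                                     # draft tokens proposed per step
)

outputs = llm.generate(
    ["Explain speculative decoding in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```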

Checkpoints

| Base Model | MOA Spec on Hugging Face | Base Model Parameters | MOA Spec Parameters |
|---|---|---|---|
| meta-llama/Meta-Llama-3-8B-Instruct | huawei-noah/MOASpec-Llama-3-8B-Instruct | 8B | 0.25B |
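
To fetch the draft checkpoint locally, a standard snapshot download with huggingface_hub works; the repo id below matches the table above:

```python
from huggingface_hub import snapshot_download

# Downloads all checkpoint files and returns the local directory path.
local_dir = snapshot_download(repo_id="huawei-noah/MOASpec-Llama-3-8B-Instruct")
print(local_dir)
```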

Citation

If you use this code or this checkpoint in your research, please cite our paper:

@misc{zimmer2024mixtureattentionsspeculativedecoding,
      title={Mixture of Attentions For Speculative Decoding}, 
      author={Matthieu Zimmer and Milan Gritta and Gerasimos Lampouras and Haitham Bou Ammar and Jun Wang},
      year={2024},
      eprint={2410.03804},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.03804}, 
}

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Disclaimer: This open source project is not an official Huawei product; Huawei is not expected to provide support for this project.
