facebook/multi-token-prediction

Multi-token Prediction Models for Large Language Models: Code and Discussion

#2

by ashishpatel26 - opened Jun 19, 2024

Jun 19, 2024 • edited Jun 19, 2024

Discussion Points:

Efficacy and Generalizability:

How do the multi-token models compare with the baselines on code-related tasks? Does the performance gap widen as the training data grows (200B vs 1T tokens)?
Does the benefit of multi-token prediction extend beyond code to natural language tasks? Let's explore how well it generalizes across domains.

Efficiency Considerations:

The paper suggests faster inference with multi-token models, but the released code primarily covers running inference. Let's discuss the trade-offs between training cost (time and resources) and inference speedup. Is the potential speed gain worth the additional training cost, if there is any?

Optimal Prediction Horizon (n):

The current implementation uses n=4. Is this the optimal value for all scenarios? How does varying n affect performance and efficiency for different tasks or model sizes? Let's explore the impact of this parameter.
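
To make concrete what n controls, here is a minimal sketch of how training targets could be laid out for an n-token prediction objective: each context is paired with the next n tokens instead of only the next one. The helper name `multi_token_targets` and the toy token IDs are illustrative assumptions and are not taken from the released code.

```python
from typing import List, Tuple


def multi_token_targets(tokens: List[int], n: int) -> List[Tuple[List[int], List[int]]]:
    """Pair each context prefix with the n tokens that follow it.

    Hypothetical helper for illustration only. Positions that do not
    have n future tokens available are skipped.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    pairs = []
    # t is the context length; the targets are tokens[t : t + n].
    for t in range(1, len(tokens) - n + 1):
        pairs.append((tokens[:t], tokens[t:t + n]))
    return pairs


if __name__ == "__main__":
    toy = [10, 11, 12, 13, 14, 15]  # made-up token IDs
    for context, targets in multi_token_targets(toy, n=4):
        print(context, "->", targets)
```

With n=1 the same function yields ordinary next-token pairs, which is one way to frame the baseline when comparing different values of n.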

Evaluation Metrics:

The code doesn't specify evaluation metrics. What metrics are most suitable to compare multi-token and baseline models, particularly for code-related tasks?
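
As a starting point for this question, one metric commonly used on code generation benchmarks is pass@k: the probability that at least one of k sampled solutions passes the unit tests. Below is a sketch of the standard unbiased estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples drawn per problem and c the number that pass. The parameter names are chosen here to avoid confusion with the prediction horizon n; whether pass@k is the right metric for this comparison remains open.

```python
from math import comb


def pass_at_k(num_samples: int, num_correct: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    num_samples: completions generated for the problem
    num_correct: completions that passed the tests
    k:           sampling budget being evaluated
    """
    if not 0 <= num_correct <= num_samples:
        raise ValueError("num_correct must be between 0 and num_samples")
    if not 1 <= k <= num_samples:
        raise ValueError("k must be between 1 and num_samples")
    # comb() returns 0 when fewer than k incorrect samples exist,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(num_samples - num_correct, k) / comb(num_samples, k)


if __name__ == "__main__":
    # Illustrative numbers only, not results from any model.
    print(pass_at_k(num_samples=10, num_correct=3, k=1))
```

A benchmark score would then be the mean of this value over all problems, computed separately for the multi-token and baseline models.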

Jun 25, 2024

Good questions, but who answers?

Jul 12, 2024

I also wonder how well speculative decoding works when this is paired with a larger single-token prediction model. That might be where the real benefits are.
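
One way to read this suggestion: the multi-token model drafts several tokens at once and a larger single-token model checks them. The sketch below shows only the greedy verification step under that assumption. `verify_draft` and `verifier_next_token` are hypothetical names, and the verifier is called once per token here for clarity, whereas a real setup would score all drafted positions in a single forward pass.

```python
from typing import Callable, List, Sequence


def verify_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    verifier_next_token: Callable[[Sequence[int]], int],
) -> List[int]:
    """Return the tokens to append after greedy draft verification.

    `verifier_next_token` is a hypothetical stand-in for the larger
    model's greedy next-token choice given a context.
    """
    accepted: List[int] = []
    context = list(prefix)
    for proposed in draft:
        expected = verifier_next_token(context)
        if proposed != expected:
            # First mismatch: keep the verifier's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(proposed)
        context.append(proposed)
    # Every drafted token matched; the verifier contributes one more.
    accepted.append(verifier_next_token(context))
    return accepted


if __name__ == "__main__":
    # Toy verifier: always predicts the previous token plus one.
    toy_verifier = lambda ctx: ctx[-1] + 1
    print(verify_draft([1, 2], [3, 4, 9, 6], toy_verifier))
```

The output always matches what the verifier alone would have produced under greedy decoding, so any benefit comes from how many drafted tokens are accepted per verification step.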
