SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (Paper 2502.02737)
mamba is now available in transformers. Thanks to @tridao and @albertgu for this brilliant model! 🚀 and the amazing mamba-ssm kernels powering this!

return_timestamps=True helps reduce hallucinations, particularly when doing long-form evaluation with Transformers' "chunked" algorithm.

Hallucinated transcription:
<|0.00|> The cat sat on the on the on the mat.<|5.02|>

With return_timestamps=True:
<|0.00|> The cat sat on the mat.<|5.02|>
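A minimal sketch of how this looks in code, assuming the openai/whisper-tiny checkpoint as a small stand-in model and a few seconds of synthetic silence in place of real speech:

```python
import numpy as np
from transformers import pipeline

# Chunked long-form transcription: the audio is split into 30-second
# windows that are transcribed independently and stitched back together.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-tiny",  # stand-in checkpoint; any Whisper model works
    chunk_length_s=30,
)

# Five seconds of silence at 16 kHz stands in for real speech here.
audio = np.zeros(5 * 16_000, dtype=np.float32)

# return_timestamps=True makes the model also predict <|t|> timestamp
# tokens, which constrains decoding and helps curb the repeated-phrase
# hallucinations shown above.
result = asr(audio, return_timestamps=True)
print(result["text"])    # full transcription
print(result["chunks"])  # [{"text": ..., "timestamp": (start, end)}, ...]
```

With real audio, each entry in `result["chunks"]` carries the text of one segment plus its start/end times, which is what the `<|0.00|> ... <|5.02|>` markers above render.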