About the number of training tokens
#1 opened by Enigrand
Hi,
The hybrid model and the transformer it is compared against were trained on 3.5T tokens, as described in your paper, which is not consistent with the naming here.
Is there something I'm missing?
rwaleffe changed discussion status to closed
rwaleffe changed discussion status to open
You aren't missing anything. These models were trained for 3.5T tokens as described in the paper. The 3.5T token count has simply been shortened to "3t" in the naming here.
rwaleffe changed discussion status to closed