About the number of training tokens

#1 by Enigrand - opened

Hi,

As described in your paper, both the hybrid model and the transformer it is compared against were trained on 3.5T tokens, which is inconsistent with the naming here.

Is there something I'm missing?

rwaleffe (NVIDIA org)

You aren't missing anything. These models were trained on 3.5T tokens, as described in the paper. The 3.5T has simply been shortened to "3t" in the naming here.

rwaleffe changed discussion status to closed
