Smaller variant

#3
by viktor-ferenczi - opened

The original blog post says:

Furthermore, in a space-constrained environment, the 70k unused embeddings (corresponding to reserved tokens) could be removed from the input/output embedding matrices. This would reduce the model size by approximately 570M parameters.

I suggest also making such a reduced version available on HF.
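As a rough sanity check of the quoted figure, here is a minimal sketch of the arithmetic. The 70k unused tokens and the two matrices (input and output) come from the quote; the hidden size of 4096 is my assumption and is not stated in this thread, and `removed_params` is just a hypothetical helper name.

```python
def removed_params(num_unused_tokens: int, hidden_size: int, untied: bool = True) -> int:
    """Parameters freed by dropping unused rows from the embedding matrices.

    With untied embeddings, each token has one row in the input matrix
    and one row in the output matrix, so the saving is counted twice.
    """
    matrices = 2 if untied else 1
    return num_unused_tokens * hidden_size * matrices


unused_tokens = 70_000  # from the quoted blog post
hidden_size = 4096      # ASSUMPTION: not stated in this thread

saved = removed_params(unused_tokens, hidden_size)
print(f"{saved / 1e6:.0f}M parameters")  # prints "573M parameters"
```

Under that assumption the result lands close to the "approximately 570M" in the quote.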

I wish, but I'm not sure the checkpoints were released.
