Difference between distilled and the original?

by ghosthamlet - opened

Thanks for this great model.
The original model, https://huggingface.co/facebook/nllb-200-1.3B, has a pytorch_model.bin file of the same size as this distilled version,
so what is the difference between these two models?

As I understand it (from the paper), this is a 1.3B-parameter model distilled from the full 54B NLLB-200 model. It gives better results than the 1.3B dense model (Table 41 in the paper).
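That also explains the identical file sizes: both checkpoints have the same architecture and parameter count, and serialized size depends only on shapes and dtypes, not on how the weights were trained. A minimal sketch of that idea (using NumPy arrays as a stand-in for a pytorch_model.bin state dict; the shapes are arbitrary toy values):

```python
import io
import numpy as np

def serialized_size(weights):
    # Serialize a dict of weight arrays to an in-memory buffer,
    # as a stand-in for saving a pytorch_model.bin checkpoint.
    buf = io.BytesIO()
    np.savez(buf, **weights)
    return buf.getbuffer().nbytes

rng = np.random.default_rng(0)
shape = (256, 256)  # toy layer shape, not the real NLLB dimensions

# Same architecture (same shapes/dtypes), different training -> different values
dense = {"w": rng.normal(size=shape).astype(np.float32)}
distilled = {"w": rng.normal(size=shape).astype(np.float32)}

print(serialized_size(dense) == serialized_size(distilled))  # True
```

So the file size alone cannot distinguish the dense 1.3B model from the distilled one; the difference is in the weight values produced by distillation from the 54B teacher.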

Thanks for the reply.

ghosthamlet changed discussion status to closed
