Is this model's accuracy being compared against a smaller model?

by ultraleow - opened

DistilBERT aims to make BERT smaller and faster while retaining as much of its performance as possible. Specifically, DistilBERT is 40% smaller than the original BERT-base model, runs 60% faster, and retains 97% of its language-understanding performance.

So, is this comparison appropriate?

When we released the model, DistilBERT-SST-2 was the standard sentiment model on Hugging Face. Our benchmark shows that our (larger) model is indeed more accurate, even if inference is somewhat slower. So when choosing a model, consider the trade-off between our more accurate but slower model and the faster but less accurate DistilBERT model, depending on your use case. Since our model is already trained, the higher computational cost of training is no longer relevant to your decision. Hope this helps!
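If you want to check this trade-off on your own data, a minimal sketch with the `transformers` pipeline API is below. The DistilBERT checkpoint ID (`distilbert-base-uncased-finetuned-sst-2-english`) is the real Hub model referenced above; `this-repo/larger-sentiment-model` is a placeholder you would swap for this repo's actual model ID.

```python
from time import perf_counter
from transformers import pipeline

# Real Hub checkpoint for the DistilBERT SST-2 sentiment model mentioned above.
distilbert = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Hypothetical placeholder ID -- replace with this repo's actual model ID.
larger = pipeline("sentiment-analysis", model="this-repo/larger-sentiment-model")

texts = ["I absolutely loved this movie!", "Not great, honestly."]

# Compare predictions and rough wall-clock inference time for each model.
for name, clf in [("distilbert", distilbert), ("larger", larger)]:
    start = perf_counter()
    preds = clf(texts)
    print(f"{name}: {preds} ({perf_counter() - start:.3f}s)")
```

For a fair speed comparison you would want a larger batch of texts and a warm-up run, but even this rough timing makes the accuracy-versus-latency trade-off concrete for your hardware.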

Thanks, I'm now able to understand the reasoning behind it.

ultraleow changed discussion status to closed
