Language &

by agomberto


Just a question: can we use it for fine-tuning, like MiniLMv1? What is the licence?

Is it the one described in this README:

Small and fast pre-trained models for language understanding and generation

***** New June 9, 2021: MiniLM v2 release *****

MiniLM v2: the pre-trained models for the paper entitled "MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers". We generalize deep self-attention distillation in MiniLMv1 by using self-attention relation distillation for task-agnostic compression of pre-trained Transformers. The proposed method eliminates the restriction on the number of student’s attention heads. Our monolingual and multilingual small models distilled from different base and large size teacher models achieve competitive performance.

[Multilingual] Pre-trained Models

| Model | Teacher Model | Speedup | #Param | XNLI (Acc) | MLQA (F1) |
|---|---|---|---|---|---|
| L12xH384 mMiniLMv2 | XLMR-Large | 2.7x | 117M | 72.9 | 64.9 |
| L6xH384 mMiniLMv2 | XLMR-Large | 5.3x | 107M | 69.3 | 59.0 |

We compress XLMR-Large into 12-layer and 6-layer models with a hidden size of 384 and report the zero-shot performance on the XNLI and MLQA test sets.
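The README describes self-attention relation distillation only in words. The sketch below is a minimal, pure-Python illustration of the idea as I understand it from the MiniLMv2 paper, not Microsoft's implementation. The teacher's and the student's vectors of one type (queries, keys or values, concatenated over all attention heads) are split into the same number of relation heads. Each head yields a relation matrix softmax(x·xᵀ/√d_r) over token positions, and the student is trained to match the teacher's relations via KL divergence. Because every relation matrix is seq_len × seq_len, teacher and student can differ in hidden size and attention-head count, which is why the restriction on the student's number of heads disappears. The function names and toy inputs are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def split_heads(x: Matrix, num_heads: int) -> List[Matrix]:
    """Split a [seq_len, hidden] matrix into num_heads matrices of [seq_len, hidden // num_heads]."""
    hidden = len(x[0])
    if hidden % num_heads:
        raise ValueError("hidden size must be divisible by the number of relation heads")
    d = hidden // num_heads
    return [[row[h * d:(h + 1) * d] for row in x] for h in range(num_heads)]


def relation(x: Matrix) -> Matrix:
    """Self-relation softmax(x x^T / sqrt(d_r)) for one relation head: [seq_len, seq_len]."""
    scale = 1.0 / math.sqrt(len(x[0]))
    return [softmax([scale * sum(a * b for a, b in zip(xi, xj)) for xj in x]) for xi in x]


def kl_rows(p: Matrix, q: Matrix) -> float:
    """Mean over positions of KL(p_row || q_row)."""
    total = 0.0
    for pr, qr in zip(p, q):
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(pr, qr) if pi > 0)
    return total / len(p)


def relation_distillation_loss(teacher: Matrix, student: Matrix, num_relation_heads: int) -> float:
    """KL between teacher and student relations for one vector type (e.g. query-query).

    Teacher and student hidden sizes may differ; both are re-split into the
    same number of relation heads, independent of their attention-head counts.
    """
    t_heads = split_heads(teacher, num_relation_heads)
    s_heads = split_heads(student, num_relation_heads)
    losses = [kl_rows(relation(t), relation(s)) for t, s in zip(t_heads, s_heads)]
    return sum(losses) / num_relation_heads
```

For example, a toy teacher with hidden size 8 and a student with hidden size 4 can both be split into 2 relation heads and compared directly. As I read the paper, the full objective applies this to query-query, key-key and value-value relations and sums the three terms; each call here covers one of them.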


Also, is a TensorFlow version planned? If not, is there a way I can convert it from PyTorch to TensorFlow myself and upload it? Thanks :)
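On the conversion question: Hugging Face `transformers` can load a PyTorch checkpoint into the TensorFlow class of the same architecture by passing `from_pt=True`. The following is a minimal sketch, assuming `transformers`, `tensorflow` and `torch` are installed and that the checkpoint's architecture has a TensorFlow implementation in `transformers`. The model id is left as a parameter because the thread does not name the repository, and the function name is hypothetical.

```python
def convert_pt_checkpoint_to_tf(model_id: str, output_dir: str) -> None:
    """Load PyTorch weights into the TF model class and save them in TF format.

    Assumes transformers, tensorflow and torch are installed; they are
    imported inside the function so this module imports without them.
    """
    from transformers import AutoTokenizer, TFAutoModel

    # from_pt=True reads the PyTorch checkpoint and maps its weights
    # onto the TensorFlow implementation of the same architecture.
    tf_model = TFAutoModel.from_pretrained(model_id, from_pt=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Writes the TensorFlow weights plus config.json, and the tokenizer files,
    # so the folder can be loaded later with TFAutoModel.from_pretrained(output_dir).
    tf_model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
```

Calling it as `convert_pt_checkpoint_to_tf("<model-id>", "./tf-model")` produces a local folder that can then be uploaded to a Hub repository, for example with the model's `push_to_hub` method.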
