nreimers/mMiniLMv2-L12-H384-distilled-from-XLMR-Large

Hey,

Just a question: can we fine-tune this model the same way as MiniLMv1? And what is the licence?

Is it the model described in this README?

Small and fast pre-trained models for language understanding and generation

***** New June 9, 2021: MiniLM v2 release *****

MiniLM v2: the pre-trained models for the paper entitled "MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers". We generalize deep self-attention distillation in MiniLMv1 by using self-attention relation distillation for task-agnostic compression of pre-trained Transformers. The proposed method eliminates the restriction on the number of student’s attention heads. Our monolingual and multilingual small models distilled from different base and large size teacher models achieve competitive performance.

[Multilingual] Pre-trained Models

Model	Teacher Model	Speedup	#Param	XNLI (Acc)	MLQA (F1)
L12xH384 mMiniLMv2	XLMR-Large	2.7x	117M	72.9	64.9
L6xH384 mMiniLMv2	XLMR-Large	5.3x	107M	69.3	59.0

We compress XLMR-Large into 12-layer and 6-layer models with 384 hidden size and report the zero-shot performance on the XNLI and MLQA test sets.
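Side note on the table: the 6-layer model is only about 10M parameters smaller than the 12-layer one. A rough back-of-the-envelope count suggests this is because the multilingual embedding table dominates. The sketch below is my own estimate, not something from the README. It assumes the XLM-R vocabulary size (250,002), 514 position embeddings, and a feed-forward width of 4x the hidden size; only the hidden size (384) and the layer counts come from the table.

```python
# Rough parameter count for an XLM-R-style encoder.
# Values marked "assumption" are NOT stated in the README quoted above.
VOCAB_SIZE = 250_002   # assumption: XLM-R SentencePiece vocabulary size
MAX_POSITIONS = 514    # assumption: size of the position-embedding table
HIDDEN = 384           # from the README: hidden size 384
FFN = 4 * HIDDEN       # assumption: feed-forward width is 4x the hidden size


def embedding_params(hidden: int = HIDDEN) -> int:
    """Word, position and token-type embeddings plus one LayerNorm."""
    word = VOCAB_SIZE * hidden
    position = MAX_POSITIONS * hidden
    token_type = 1 * hidden
    layer_norm = 2 * hidden  # scale and bias
    return word + position + token_type + layer_norm


def layer_params(hidden: int = HIDDEN, ffn: int = FFN) -> int:
    """One Transformer layer: self-attention, feed-forward, two LayerNorms."""
    attention = 4 * (hidden * hidden + hidden)  # Q, K, V, output projections with biases
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    layer_norms = 2 * (2 * hidden)
    return attention + feed_forward + layer_norms


def encoder_params(num_layers: int) -> int:
    """Total for the embeddings plus num_layers Transformer layers (no task head)."""
    return embedding_params() + num_layers * layer_params()


for n in (12, 6):
    total = encoder_params(n)
    share = embedding_params() / total
    print(f"L{n}xH{HIDDEN}: {total / 1e6:.1f}M parameters, {share:.0%} in embeddings")
```

Under these assumptions the totals come out close to the 117M and 107M in the table, with most of the parameters sitting in the embeddings, so halving the depth barely changes the model size even though it roughly doubles the speedup.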

Arnault
