distilbert-base-ur-cased

We are sharing smaller versions of distilbert-base-multilingual-cased that handle a custom number of languages.

Our versions produce exactly the same representations as the original model, which preserves the original accuracy.

For more information, please see our paper: Load What You Need: Smaller Versions of Multilingual BERT.

How to use

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Geotrend/distilbert-base-ur-cased")
model = AutoModel.from_pretrained("Geotrend/distilbert-base-ur-cased")
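
The snippet above only loads the tokenizer and model. As a minimal sketch of how the representations might then be used, the example below turns sentences into fixed-size vectors by averaging token representations over non-padding positions. Mean pooling is an assumption made here for illustration, not something the paper or this model card prescribes, and the helper names `mean_pool` and `embed` are hypothetical.

```python
import numpy as np


def mean_pool(hidden_states, attention_mask):
    """Average token vectors per sentence, ignoring padding positions.

    hidden_states: array of shape (batch, seq_len, dim)
    attention_mask: array of shape (batch, seq_len); 1 = real token, 0 = padding
    """
    hidden_states = np.asarray(hidden_states, dtype=np.float32)
    # Add a trailing axis so the mask broadcasts over the hidden dimension.
    mask = np.asarray(attention_mask, dtype=np.float32)[..., None]
    summed = (hidden_states * mask).sum(axis=1)
    # Clip the token count to at least 1 to avoid division by zero.
    counts = np.clip(mask.sum(axis=1), 1.0, None)
    return summed / counts


def embed(sentences, model_name="Geotrend/distilbert-base-ur-cased"):
    """Return one mean-pooled vector per sentence (hypothetical helper).

    Downloads the tokenizer and model on first call.
    """
    # Imported lazily so mean_pool works without torch/transformers installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # disable dropout for deterministic outputs

    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**batch)
    return mean_pool(
        outputs.last_hidden_state.numpy(),
        batch["attention_mask"].numpy(),
    )
```

Calling `embed(["یہ ایک مثال ہے۔"])` would return a NumPy array with one row per input sentence. Other pooling choices, such as taking the first token's vector, are equally possible.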

To generate other smaller versions of multilingual transformers, please visit our GitHub repo.

How to cite

@inproceedings{smallermdistilbert,
  title={Load What You Need: Smaller Versions of Multilingual BERT},
  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
  booktitle={SustaiNLP / EMNLP},
  year={2020}
}

Contact

Please contact amine@geotrend.fr with any questions, feedback, or requests.

Model size: 50.2M params (Safetensors, tensor type F32)