David

Update README.md

6864ed8 almost 3 years ago

	---
	language:
	- es
	thumbnail: "url to a thumbnail used in social sharing"
	tags:
	- tag1
	- tag2
	license: apache-2.0
	datasets:
	- oscar
	metrics:
	- metric1
	- metric2
	---

	# SELECTRA: A Spanish ELECTRA

	SELECTRA is a Spanish pre-trained language model based on [ELECTRA](https://github.com/google-research/electra).
	We release a `small` and `medium` version with the following configuration:

	| Model | Layers | Embedding/Hidden Size | Params | Vocab Size | Max Sequence Length | Cased |
	| --- | --- | --- | --- | --- | --- | --- |
	| SELECTRA small | 12 | 256 | 22M | 50k | 512 | True |
	| SELECTRA medium | 12 | 384 | 41M | 50k | 512 | True |

	SELECTRA small has about 5 times fewer parameters than BETO (22M vs. 110M) but achieves comparable results (see the Metrics section below).

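	## Usage

	The snippet below loads the discriminator and scores each token of a sentence. Higher logits mark tokens that the model considers more likely to have been replaced.
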
	```python
	from transformers import ElectraForPreTraining, ElectraTokenizerFast

	# Load the pre-trained discriminator and its tokenizer from the Hugging Face Hub
	discriminator = ElectraForPreTraining.from_pretrained("Recognai/selectra_small")
	tokenizer = ElectraTokenizerFast.from_pretrained("Recognai/selectra_small")

	sentence_with_fake_token = "Estamos desayunando pan rosa con tomate y aceite de oliva."

	# Encode the sentence and take the per-token logits of the single batch item
	inputs = tokenizer.encode(sentence_with_fake_token, return_tensors="pt")
	logits = discriminator(inputs).logits.tolist()[0]

	# Print tokens and logits side by side, skipping the [CLS] and [SEP] positions
	print("\t".join(tokenizer.tokenize(sentence_with_fake_token)))
	print("\t".join(map(lambda x: str(x)[:4], logits[1:-1])))
	"""Output (the highest logit, 0.19, falls on "rosa"):
	Estamos desayun ##ando pan rosa con tomate y aceite de oliva .
	-3.1 -3.6 -6.9 -3.0 0.19 -4.5 -3.3 -5.1 -5.7 -7.7 -4.4 -4.2
	"""
	```
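The logits above can be turned into per-token decisions. The following is a minimal sketch that reuses the tokens and truncated logits printed in the example output, so it runs without downloading the model. It assumes that a sigmoid over each logit gives the probability that the token was replaced, and that 0.5 is a reasonable cut-off; the threshold and the helper name `flag_replaced` are illustrative choices, not part of the model card.

```python
import math

# Tokens and (truncated) logits copied from the example output above.
tokens = ["Estamos", "desayun", "##ando", "pan", "rosa", "con",
          "tomate", "y", "aceite", "de", "oliva", "."]
logits = [-3.1, -3.6, -6.9, -3.0, 0.19, -4.5,
          -3.3, -5.1, -5.7, -7.7, -4.4, -4.2]


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def flag_replaced(tokens, logits, threshold=0.5):
    """Hypothetical helper: return (token, probability) pairs whose
    'replaced' probability exceeds the (assumed) threshold."""
    scored = [(tok, sigmoid(logit)) for tok, logit in zip(tokens, logits)]
    return [(tok, prob) for tok, prob in scored if prob > threshold]


flagged = flag_replaced(tokens, logits)
for tok, prob in flagged:
    print(f"{tok}\t{prob:.2f}")
```

With these values only `rosa` crosses the threshold, which matches the intuition that it is the out-of-place word in the sentence.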

	- Links to our zero-shot-classifiers

	## Metrics

	We fine-tune our models on four different downstream tasks:

	- [XNLI](https://huggingface.co/datasets/xnli)
	- [PAWS-X](https://huggingface.co/datasets/paws-x)
	- [CoNLL2002 - POS](https://huggingface.co/datasets/conll2002)
	- [CoNLL2002 - NER](https://huggingface.co/datasets/conll2002)

	We provide the mean and standard deviation of 5 fine-tuning runs.

	The metrics are summarized in the following table:


	| Model | CoNLL2002 - POS (acc) | CoNLL2002 - NER (f1) | PAWS-X (acc) | XNLI (acc) | Params |
	| --- | --- | --- | --- | --- | --- |
	| SELECTRA small | 0.9653 ± 0.0007 | 0.863 ± 0.004 | 0.896 ± 0.002 | 0.784 ± 0.002 | 22M |
	| SELECTRA medium | 0.9677 ± 0.0004 | 0.870 ± 0.003 | 0.896 ± 0.002 | 0.804 ± 0.002 | 41M |
	| [mBERT](https://huggingface.co/bert-base-multilingual-cased) | 0.9689 | 0.8616 | 0.8895 | 0.7606 | 178M |
	| [BETO](https://huggingface.co/dccuchile/bert-base-spanish-wwm-cased) | 0.9693 | 0.8596 | 0.8720 | 0.8012 | 110M |
	| [BSC-BNE](https://huggingface.co/BSC-TeMU/roberta-base-bne) | 0.9706 | 0.8764 | 0.8815 | 0.7771 | 125M |
	| [Bertin](https://huggingface.co/bertin-project/bertin-roberta-base-spanish/tree/v1-512) | 0.9697 | 0.8707 | 0.8965 | 0.7843 | 125M |
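For readers who want to report their own fine-tuning results in the same "mean and standard deviation" form, here is a minimal sketch. The scores are hypothetical placeholder values, not results from this model card, and the sketch assumes the sample standard deviation (`statistics.stdev`); the card does not say whether the sample or population variant was used. The helper name `summarize` is also illustrative.

```python
from statistics import mean, stdev

# Hypothetical accuracy scores from 5 fine-tuning runs
# (illustrative values only, NOT results from the model card).
run_scores = [0.90, 0.91, 0.89, 0.92, 0.88]


def summarize(scores):
    """Format scores as 'mean ± sample standard deviation'.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    return f"{mean(scores):.3f} ± {stdev(scores):.3f}"


print(summarize(run_scores))
```

Swapping `stdev` for `statistics.pstdev` would give the population standard deviation instead, which is slightly smaller for the same five runs.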


	## Training

	- Link to our repo

	## Motivation

	Despite the abundance of excellent Spanish language models (BETO, Bertin, etc.), we felt there was still a lack of distilled or compact models with metrics comparable to those of their bigger siblings.

	## Acknowledgment

	This research was supported by the use of the Google TPU Research Cloud (TRC).

	## Authors