
ELECTRICIDAD: The Spanish Electra

Electricidad-base-generator (uncased) is a base-sized, ELECTRA-like model (the generator, in this case) trained on more than 20 GB of the Spanish portion of the OSCAR corpus.

As mentioned in the original paper: ELECTRA is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to the discriminator of a GAN. At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the SQuAD 2.0 dataset.

For a detailed description and experimental results, please refer to the paper ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators.
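To make the real-vs-fake objective concrete, here is a minimal sketch that runs the discriminator side of this training setup over a sentence containing an implausible replaced token and prints a real/fake label per token. It assumes the companion checkpoint mrm8488/electricidad-base-discriminator is available on the Hub; the example sentence is illustrative.

import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

# Assumed companion discriminator checkpoint for this generator
discriminator = ElectraForPreTraining.from_pretrained("mrm8488/electricidad-base-discriminator")
tokenizer = ElectraTokenizerFast.from_pretrained("mrm8488/electricidad-base-discriminator")

# A sentence where one token has been swapped for a plausible fake:
# "el coche está leyendo un libro" ("the car is reading a book")
fake_sentence = "el coche está leyendo un libro"

inputs = tokenizer(fake_sentence, return_tensors="pt")
with torch.no_grad():
    # One logit per token: positive means "replaced/fake"
    logits = discriminator(**inputs).logits.squeeze()

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze())
for token, score in zip(tokens, logits.tolist()):
    print(f"{token:>12} -> {'fake' if score > 0 else 'real'}")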

Quick usage example 🚀

from transformers import pipeline

# Load the generator checkpoint as a fill-mask pipeline
fill_mask = pipeline(
    "fill-mask",
    model="mrm8488/electricidad-base-generator",
    tokenizer="mrm8488/electricidad-base-generator"
)

# "HuggingFace is creating {mask} that the community uses to solve NLP tasks."
print(
    fill_mask(f"HuggingFace está creando {fill_mask.tokenizer.mask_token} que la comunidad usa para resolver tareas de NLP.")
)

# Output: [{'sequence': '[CLS] huggingface esta creando herramientas que la comunidad usa para resolver tareas de nlp. [SEP]', 'score': 0.0896105170249939, 'token': 8760, 'token_str': 'herramientas'}, ...]
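If you prefer not to use the pipeline helper, the same prediction can be reproduced with the lower-level masked-LM API. This is a minimal sketch using AutoModelForMaskedLM; the variable names are illustrative.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/electricidad-base-generator")
model = AutoModelForMaskedLM.from_pretrained("mrm8488/electricidad-base-generator")

text = f"HuggingFace está creando {tokenizer.mask_token} que la comunidad usa para resolver tareas de NLP."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the top-5 candidate tokens
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top5 = logits[0, mask_pos].topk(5).indices[0].tolist()
print(tokenizer.convert_ids_to_tokens(top5))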

Acknowledgments

I thank the 🤗/transformers team for allowing me to train the model (special thanks to Julien Chaumond).

Created by Manuel Romero/@mrm8488

Made with ♥ in Spain
