Query this model
curl -X POST \
-H "Authorization: Bearer YOUR_ORG_OR_USER_API_TOKEN" \
-H "Content-Type: application/json" \
-d '"json encoded string"' \
https://api-inference.huggingface.co/models/funnel-transformer/large-base

Monthly model downloads: 713 (last 30 days)

Frameworks: PyTorch, TensorFlow

Contributed by: funnel-transformer (university) · 1 team member · 10 models

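How to use this model directly from the 🤗/transformers library:

from transformers import AutoTokenizer, AutoModelWithLMHead
tokenizer = AutoTokenizer.from_pretrained("funnel-transformer/large-base")
# Note: AutoModelWithLMHead is deprecated in recent transformers releases. Because this
# checkpoint has no decoder, FunnelBaseModel (see "How to use" below) is the better fit.
model = AutoModelWithLMHead.from_pretrained("funnel-transformer/large-base")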

Funnel Transformer large model (B8-8-8 without decoder)

Pretrained model on English language using an objective similar to ELECTRA's. It was introduced in the paper "Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing" (Dai et al., 2020) and first released in the authors' repository. This model is uncased: it does not make a difference between english and English.
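To make "uncased" concrete, here is a minimal sketch of BERT-style uncased normalization. It assumes the Funnel tokenizer behaves like BERT's uncased WordPiece tokenizer, which lowercases text and typically also strips accents. The function `uncased_normalize` is hypothetical: it only mimics the text-normalization step, not the WordPiece splitting or the real tokenizer's full behavior.

```python
import unicodedata


def uncased_normalize(text: str) -> str:
    """Hypothetical sketch of BERT-style uncased normalization."""
    lowered = text.lower()
    # NFD splits accented characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", lowered)
    # Drop combining marks (Unicode category "Mn"), e.g. the accent in "é".
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(uncased_normalize("english") == uncased_normalize("English"))  # True
print(uncased_normalize("Café"))  # cafe
```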

Disclaimer: The team releasing Funnel Transformer did not write a model card for this model so this model card has been written by the Hugging Face team.

Model description

Funnel Transformer is a Transformer model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on raw text only, with no human labeling of any kind (which is why it can use lots of publicly available data), using an automatic process to generate inputs and labels from that text.

More precisely, a small language model corrupts the input text and serves as a generator of inputs for this model, and the pretraining objective is to predict which tokens are original and which have been replaced, somewhat like GAN training.
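The labeling side of this objective (ELECTRA's replaced token detection) can be sketched in a few lines. This is a toy illustration under stated assumptions: a random draw from a tiny vocabulary stands in for the small generator language model, and `replace_prob` is a hypothetical knob, not a documented training setting. As in ELECTRA, a position where the sampled token happens to equal the original is labeled as original.

```python
import random


def corrupt_and_label(tokens, vocab, replace_prob=0.15, seed=0):
    """Toy replaced-token-detection data: returns (corrupted_tokens, labels).

    A random vocabulary draw stands in for the small generator model (assumption).
    Label 1 means the token differs from the original; 0 means it is original.
    """
    rng = random.Random(seed)
    corrupted = []
    for tok in tokens:
        if rng.random() < replace_prob:
            corrupted.append(rng.choice(vocab))  # hypothetical "generator" sample
        else:
            corrupted.append(tok)
    # A sample identical to the original token still counts as "original" (label 0).
    labels = [int(c != o) for c, o in zip(corrupted, tokens)]
    return corrupted, labels


tokens = "the chef cooked the meal".split()
vocab = ["the", "chef", "cooked", "ate", "meal", "car"]
corrupted, labels = corrupt_and_label(tokens, vocab, replace_prob=0.3, seed=0)
print(corrupted, labels)
```

The discriminator (here, Funnel Transformer) would then be trained to predict `labels` from `corrupted`.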

This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the Funnel Transformer model as inputs.
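A minimal sketch of that feature-based workflow, assuming you have already turned each sentence into one fixed-size vector with this model (for example, the first position of `output.last_hidden_state`; that choice is an assumption, not something this card prescribes). The three-dimensional vectors below are hypothetical stand-ins for real features, and a nearest-centroid rule is swapped in as the simplest possible "standard classifier"; logistic regression or a small MLP would be more typical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def fit_centroids(features, labels):
    """Average the feature vectors belonging to each class label."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to vec."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


# Hypothetical stand-ins for sentence features; real features are much wider.
train_x = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.8], [0.0, 0.8, 0.9]]
train_y = ["positive", "positive", "negative", "negative"]
centroids = fit_centroids(train_x, train_y)
print(predict(centroids, [0.85, 0.15, 0.05]))  # positive
```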

Note: This model does not contain the decoder, so it outputs hidden states whose sequence length is one fourth of the input length. It is well suited to tasks that need a summary of the sentence (like sentence classification), but not to tasks that need one output per input token. In that case, use the large model (the variant that includes the decoder).
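The "one fourth" figure follows from the B8-8-8 layout: three blocks of eight layers, with a stride-2 pooling step between consecutive blocks, so the sequence is halved twice. The arithmetic sketch below assumes each halving rounds up; the exact count in the library can also depend on configuration details such as how the leading [CLS] token is handled, so treat `pooled_length` (a hypothetical helper) as an approximation.

```python
import math


def pooled_length(seq_len: int, num_poolings: int = 2) -> int:
    """Approximate length after repeated stride-2 pooling, rounding up each time."""
    length = seq_len
    for _ in range(num_poolings):
        length = math.ceil(length / 2)
    return length


print(pooled_length(128))  # 32
print(pooled_length(10))   # 3  (10 -> 5 -> 3)
```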

Intended uses & limitations

You can use the raw model to extract a vector representation of a given text, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you.

Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation, you should look at a model like GPT-2.

How to use

Here is how to use this model to get the features of a given text in PyTorch:

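from transformers import FunnelTokenizer, FunnelBaseModel
# Load the tokenizer and the decoder-less (base) model weights.
tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/large-base")
model = FunnelBaseModel.from_pretrained("funnel-transformer/large-base")
text = "Replace me by any text you'd like."
# Tokenize into PyTorch tensors (input IDs, attention mask, etc.).
encoded_input = tokenizer(text, return_tensors='pt')
# output.last_hidden_state is about one fourth as long as the input sequence.
output = model(**encoded_input)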

and in TensorFlow:

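from transformers import FunnelTokenizer, TFFunnelBaseModel
# Load the tokenizer and the decoder-less (base) model weights for TensorFlow.
tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/large-base")
model = TFFunnelBaseModel.from_pretrained("funnel-transformer/large-base")
text = "Replace me by any text you'd like."
# Tokenize into TensorFlow tensors (input IDs, attention mask, etc.).
encoded_input = tokenizer(text, return_tensors='tf')
# output.last_hidden_state is about one fourth as long as the input sequence.
output = model(encoded_input)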

Training data

The Funnel Transformer model was pretrained on:

BibTeX entry and citation info

@misc{dai2020funneltransformer,
    title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
    author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
    year={2020},
    eprint={2006.03236},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}