BERTabaporu: a genre-specific pre-trained model of Portuguese-speaking social media
Sharing the same architecture as [BERT], we trained our model from scratch following the BERT pre-training procedure. It was built from a collection of about 238 million tweets written by over 100 thousand unique Twitter users, comprising over 2.9 billion words in total.
from transformers import AutoTokenizer  # Or BertTokenizer
from transformers import AutoModelForPreTraining  # Or BertForPreTraining, for loading the pre-training heads
from transformers import AutoModel  # Or BertModel, for BERT without the pre-training heads

model = AutoModelForPreTraining.from_pretrained('pablocosta/bertabaporu-base-uncased')
tokenizer = AutoTokenizer.from_pretrained('pablocosta/bertabaporu-base-uncased')
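
As a minimal usage sketch, the snippet below loads the head-less encoder via AutoModel, tokenizes a short example tweet (a made-up input), and retrieves one contextual embedding per token; the printed hidden size of 768 is an assumption consistent with a standard BERT-Base configuration.

import torch
from transformers import AutoTokenizer, AutoModel

# Tokenizer and encoder without the pre-training heads
tokenizer = AutoTokenizer.from_pretrained('pablocosta/bertabaporu-base-uncased')
model = AutoModel.from_pretrained('pablocosta/bertabaporu-base-uncased')

# Hypothetical example tweet
text = "bom dia, twitter!"
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per token, with [CLS] first
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, seq_len, 768])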