Edit model card
YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/models-cards#model-card-metadata)

PhoBERT: Pre-trained language models for Vietnamese

Pre-trained PhoBERT models are the state-of-the-art language models for Vietnamese (Pho, i.e. "Phở", is a popular food in Vietnam):

  • Two PhoBERT versions of "base" and "large" are the first public large-scale monolingual language models pre-trained for Vietnamese. PhoBERT pre-training approach is based on RoBERTa which optimizes the BERT pre-training procedure for more robust performance.
  • PhoBERT outperforms previous monolingual and multilingual approaches, obtaining new state-of-the-art performances on four downstream Vietnamese NLP tasks of Part-of-speech tagging, Dependency parsing, Named-entity recognition and Natural language inference.

The general architecture and experimental results of PhoBERT can be found in our EMNLP-2020 Findings paper:

title     = {{PhoBERT: Pre-trained language models for Vietnamese}},
author    = {Dat Quoc Nguyen and Anh Tuan Nguyen},
journal   = {Findings of EMNLP},
year      = {2020}

Please CITE our paper when PhoBERT is used to help produce published results or is incorporated into other software.

For further information or requests, please go to PhoBERT's homepage!

Downloads last month
Hosted inference API
Mask token: <mask>
This model can be loaded on the Inference API on-demand.