BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Published on Oct 11, 2018

Abstract

AI-generated summary: BERT is a bidirectional Transformer-based model that is pre-trained on unlabeled text and then fine-tuned for a range of NLP tasks, achieving state-of-the-art results on multiple benchmarks.

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7 percentage point absolute improvement), MultiNLI accuracy to 86.7% (4.6 percentage point absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
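
To make "jointly conditioning on both left and right context" concrete, the following minimal sketch contrasts which positions a token may attend to under a left-to-right constraint and under an unconstrained (bidirectional) pattern. It illustrates the visibility pattern only and is not BERT's implementation; the function names `attention_mask` and `visible_context` and the example sentence are hypothetical and do not come from the paper.

```python
def attention_mask(seq_len, bidirectional):
    """Build a seq_len x seq_len visibility matrix.

    mask[i][j] == 1 means position i may attend to position j.
    Left-to-right: only j <= i is visible. Bidirectional: everything is visible.
    """
    return [
        [1 if (bidirectional or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


def visible_context(tokens, position, bidirectional):
    """Return the tokens that the token at `position` is allowed to see."""
    mask = attention_mask(len(tokens), bidirectional)
    return [tok for tok, allowed in zip(tokens, mask[position]) if allowed]


# Illustrative sentence (an assumption, not taken from the paper).
tokens = ["the", "bank", "of", "the", "river"]

# With a left-to-right constraint, "bank" sees only itself and what precedes it.
left_only = visible_context(tokens, 1, bidirectional=False)

# With a bidirectional pattern, "bank" also sees "river" to its right.
both_sides = visible_context(tokens, 1, bidirectional=True)

print(left_only)
print(both_sides)
```

In this toy case the word "bank" can only be disambiguated by "river", which sits to its right, so only the bidirectional pattern exposes the relevant context.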
