Graphcore/lxmert-gqa-uncased

BERT (Bidirectional Encoder Representations from Transformers) is a transformers model which is designed to pretrain bidirectional representations from unlabeled texts. It enables easy and fast fine-tuning for different downstream task such as Sequence Classification, Named Entity Recognition, Question Answering, Multiple Choice and MaskedLM.

It was trained with two objectives in pretraining : Masked language modeling(MLM) and Next sentence prediction(NSP). First, MLM is different from traditional LM which sees the words one after another while BERT allows the model to learn a bidirectional representation. In addition to MLM, NSP is used for jointly pertaining text-pair representations.

It reduces the need of many engineering efforts for building task specific architectures through pre-trained representation. And achieves state-of-the-art performance on a large suite of sentence-level and token-level tasks.

Model description

LXMERT is a transformer model for learning vision-and-language cross-modality representations. It has a Transformer model that has three encoders: object relationship encoder, a language encoder, and a cross-modality encoder. It is pretrained via a combination of masked language modelling, visual-language text alignment, ROI-feature regression, masked visual-attribute modeling, masked visual-object modelling, and visual-question answering objectives. It achieves the state-of-the-art results on VQA anad GQA.

Paper link : LXMERT: Learning Cross-Modality Encoder Representations from Transformers

Intended uses & limitations

This model is a fine-tuned version of unc-nlp/lxmert-base-uncased on the Graphcore/gqa-lxmert dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9326
  • Accuracy: 0.5934

Training and evaluation data

Training procedure

Trained on 16 Graphcore Mk2 IPUs using optimum-graphcore.

Command line:

python examples/question-answering/run_vqa.py \
  --model_name_or_path unc-nlp/lxmert-base-uncased \
  --ipu_config_name Graphcore/lxmert-base-ipu \
  --dataset_name Graphcore/gqa-lxmert \
  --do_train \
  --do_eval \
  --max_seq_length 512 \
  --per_device_train_batch_size 1 \
  --num_train_epochs 4 \
  --dataloader_num_workers 64 \
  --logging_steps 5 \
  --learning_rate 1e-5 \
  --lr_scheduler_type linear \
  --loss_scaling 16384 \
  --weight_decay 0.01 \
  --warmup_ratio 0.1 \
  --output_dir /tmp/gqa/ \
  --dataloader_drop_last \
  --replace_qa_head \
  --pod_type pod16

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: IPU
  • total_train_batch_size: 64
  • total_eval_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 4.0
  • training precision: Mixed Precision

Training results

***** train metrics *****
  "epoch": 4.0,
  "train_loss": 0.6123406731570221,
  "train_runtime": 29986.2288,
  "train_samples": 943000,
  "train_samples_per_second": 125.791,
  "train_steps_per_second": 1.965

***** eval metrics *****
  "eval_accuracy": 0.5933514030612245,
  "eval_loss": 1.9326171875,
  "eval_samples": 12576,

Framework versions

  • Transformers 4.18.0.dev0
  • Pytorch 1.10.0+cpu
  • Datasets 2.0.0
  • Tokenizers 0.11.6
Downloads last month
5
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train Graphcore/lxmert-gqa-uncased

Evaluation results