---
license: apache-2.0
tags:
  - TDA
metrics:
  - accuracy
  - matthews_correlation
model-index:
  - name: bert-base-cased-en-cola_32_3e-05_lr_0.01_decay_balanced
    results: []
datasets:
  - shivkumarganesh/CoLA
language:
  - en
widget:
  - text: The book was by John written.
  - text: The ship sank, but I don't know with what.
  - text: Everyone relies on someone. It's unclear who.
  - text: The book what inspired them was very long.
  - text: I want goes to the store.
  - text: I wonder whom us to trust.
---

# BERT-TDA

Official repository: BERT-TDA

This model is a version of `bert-base-cased` fine-tuned on the CoLA dataset. It achieves the following results on the evaluation set:

- Loss: 0.6809
- Accuracy: 0.8501
- Matthews correlation (MCC): 0.6337

## Features extracted from the Transformer

The features extracted from attention maps include the following:

  1. Topological features are properties of attention graphs. For directed attention graphs, the features include the number of strongly connected components, the number of edges, the number of simple cycles, and the average vertex degree. For undirected graphs, they include the first two Betti numbers (the number of connected components and the number of simple cycles), the matching number, and chordality.

  2. Features derived from barcodes include descriptive statistics of the 0- and 1-dimensional barcodes, reflecting the birth and death of connected components and edges throughout the filtration.

  3. Distance-to-pattern features measure the distance between attention matrices and pre-defined attention pattern matrices, such as attention to the first token ([CLS]) and to the last token ([SEP]) of the sequence, attention to the previous and next tokens, and attention to punctuation marks.
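As a rough illustration of the graph invariants in item 1, the sketch below binarizes a toy attention matrix at an arbitrary threshold and computes the undirected-graph features; the threshold value and the toy matrix are illustrative assumptions, not the repository's exact code.

```python
def undirected_graph_features(attn, threshold=0.1):
    """Binarize a square attention matrix and return
    (components, cycles, edges, average_degree) of the resulting
    undirected graph. The number of independent cycles (first Betti
    number) follows from b1 = E - V + C.
    """
    n = len(attn)
    # Keep undirected edges whose attention weight clears the threshold.
    edges = set()
    for i in range(n):
        for j in range(n):
            if i != j and attn[i][j] >= threshold:
                edges.add((min(i, j), max(i, j)))

    # Union-find to count connected components (zeroth Betti number).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    components = len({find(v) for v in range(n)})
    cycles = len(edges) - n + components  # b1 = E - V + C
    avg_degree = 2 * len(edges) / n
    return components, cycles, len(edges), avg_degree


# Toy 4-token attention map: tokens 0-1-2 form a triangle, token 3 is isolated.
toy = [
    [0.0, 0.5, 0.3, 0.0],
    [0.5, 0.0, 0.4, 0.0],
    [0.3, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(undirected_graph_features(toy))  # → (2, 1, 3, 1.5)
```

In practice these invariants are computed per attention head and per layer, so each input sentence yields a feature vector.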

The computed features and barcodes can be found in the subdirectories of the repository. The `test_sub` features and barcodes were computed on the out-of-domain CoLA test set. Refer to notebooks `4*` and `5*` in the repository to construct the classification pipeline with TDA features.
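The distance-to-pattern features described above can be sketched as a Frobenius distance between an attention matrix and fixed pattern matrices; the pattern set below ("to first token", "to previous token") and the lack of normalization are assumptions for illustration, not the repository's exact definitions.

```python
import math


def pattern_to_first(n):
    """Pattern matrix: every token attends to the first token ([CLS])."""
    return [[1.0 if j == 0 else 0.0 for j in range(n)] for _ in range(n)]


def pattern_to_previous(n):
    """Pattern matrix: every token attends to the previous token
    (the first token attends to itself)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][max(i - 1, 0)] = 1.0
    return m


def frobenius_distance(a, b):
    """Entry-wise Euclidean distance between two matrices."""
    return math.sqrt(
        sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    )


# A 3-token attention map that exactly matches the "previous token" pattern.
attn = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
print(frobenius_distance(attn, pattern_to_previous(3)))  # → 0.0
print(round(frobenius_distance(attn, pattern_to_first(3)), 3))  # → 1.414
```

One such distance per pattern, head, and layer is then appended to the feature vector fed to the classifier.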

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
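For reference, the hyperparameters above map onto the Hugging Face `Trainer` API roughly as follows; the output path is an assumption, and only the values themselves come from this card.

```python
from transformers import TrainingArguments

# Sketch of the training configuration; Adam with betas=(0.9, 0.999) and
# epsilon=1e-08 is the Trainer's default optimizer setting.
training_args = TrainingArguments(
    output_dir="bert-base-cased-en-cola",  # assumed output path
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    weight_decay=0.01,  # matches the 0.01_decay suffix in the model name
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
)
```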

### Framework versions

- Transformers 4.23.0.dev0
- Pytorch 1.12.1+cu113
- Datasets 2.5.1
- Tokenizers 0.13.0