BERT for PLANE classification

This model is a fine-tuned version of bert-base-cased on one of the PLANE dataset splits (no. 2), introduced in Bertolini et al., COLING 2022. It achieves the following results on the evaluation set:

Accuracy: 0.9043

Model description

The model is trained to perform a sequence classification task over phrase-level adjective-noun inferences (e.g., "A red car is a vehicle").

Intended uses & limitations

The model is not intended for lexical entailment (i.e., hypernym detection). It is trained solely to perform a very specific subset of phrase-level entailment, based on adjective-noun phrases. The types of question you can ask the model are therefore limited, and should take one of three forms:

An Adjective-Noun is a Noun (e.g. A red car is a car)
An Adjective-Noun is a Hypernym(Noun) (e.g. A red car is a vehicle)
An Adjective-Noun is an Adjective-Hypernym(Noun) (e.g. A red car is a red vehicle)
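
The three forms above can be generated mechanically from an adjective, a noun, and a hypernym of that noun. The following is a minimal sketch: `plane_queries` and `article` are hypothetical helper names, not part of the PLANE code, and the a/an choice is a simple first-letter heuristic. Whether the training data uses trailing punctuation is not stated here, so none is added.

```python
def article(word: str) -> str:
    """Pick 'a' or 'an' with a first-letter heuristic (not phonetic)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def plane_queries(adjective: str, noun: str, hypernym: str) -> dict:
    """Build the three query forms for one adjective-noun phrase."""
    subject = f"{article(adjective)} {adjective} {noun}"
    subject = subject[0].upper() + subject[1:]  # capitalise the sentence start
    return {
        "noun": f"{subject} is {article(noun)} {noun}",
        "hypernym": f"{subject} is {article(hypernym)} {hypernym}",
        "adjective_hypernym": (
            f"{subject} is {article(adjective)} {adjective} {hypernym}"
        ),
    }


queries = plane_queries("red", "car", "vehicle")
for form, sentence in queries.items():
    print(f"{form}: {sentence}")
```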

Linguistically speaking, adjectives belong to three macro classes (intersective, subsective, and intensional). From a linguistic and logical standpoint, these classes shape the truth value of the three forms above. For instance, since red is an intersective adjective, the three forms are all true. A subsective adjective like small allows just the first two, but not the last; that is, logically speaking, a small car is not a small vehicle.
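
The relation between adjective class and truth value described above can be written as a small lookup table. This is only an illustrative sketch: the form names and the `expected_truth` helper are hypothetical, and the intensional case is left unspecified because this card does not spell out its truth values.

```python
from typing import Optional

# Expected truth value of each form, per adjective class, as stated in the
# paragraph above. The intensional pattern is not given in this card, so it
# is deliberately left as None rather than guessed.
EXPECTED = {
    "intersective": {"noun": True, "hypernym": True, "adjective_hypernym": True},
    "subsective": {"noun": True, "hypernym": True, "adjective_hypernym": False},
    "intensional": None,
}


def expected_truth(adjective_class: str, form: str) -> Optional[bool]:
    """Return the expected truth value, or None if the card does not state it."""
    pattern = EXPECTED[adjective_class]
    if pattern is None:
        return None
    return pattern[form]


# "A small car is a small vehicle" (subsective adjective, third form).
print(expected_truth("subsective", "adjective_hypernym"))
```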

In other words, the model was built to study out-of-distribution compositional generalisation with respect to a very specific set of compositional phenomena.

This places clear limitations on the questions you can ask the model. For instance, if you query the model with a basic (false) hypernym detection task (e.g., A dog is a cat), the model will consider it true.
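
A minimal sketch of how the model might be queried with the Transformers `pipeline` API, keeping to the supported forms. The model ID is assumed from the repository name shown at the end of this card. The label names returned depend on the model's configuration and are not documented here, so the sketch prints them as they come rather than interpreting them.

```python
# Assumed model ID, taken from the repository name on this card.
MODEL_ID = "lorenzoscottb/bert-base-cased-PLANE-ood-2"

# Queries restricted to the three supported adjective-noun forms.
QUERIES = [
    "A red car is a car",
    "A red car is a vehicle",
    "A red car is a red vehicle",
]


def classify(sentences, model_id=MODEL_ID):
    """Run the classifier over a list of sentences.

    transformers is imported lazily so the module loads without it;
    the first call downloads the model weights.
    """
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return classifier(sentences)


def main():
    # Each result is a dict with a 'label' and a 'score' entry.
    for sentence, result in zip(QUERIES, classify(QUERIES)):
        print(f"{sentence!r} -> {result['label']} ({result['score']:.3f})")
```

Calling `main()` runs the three example queries; it requires `transformers`, a PyTorch install, and network access on first use.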

Training and evaluation data

The data used for training and testing, as well as the other splits used for the experiments, are available on the paper's git page. The reported accuracy refers to out-of-distribution evaluation; that is, the model was tested on the text classification task as presented above, but on adjectives and nouns unseen during training.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 1
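
For reference, the hyperparameters above map onto `TrainingArguments` roughly as follows. This is a sketch, not the authors' training script: `output_dir` is a hypothetical path, the batch sizes are assumed to be per-device values, and the `Trainer` default optimizer is an AdamW variant, to which the betas and epsilon listed above are passed.

```python
# Hyperparameters from the list above, keyed by TrainingArguments names.
HYPERPARAMETERS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # assumed per-device
    "per_device_eval_batch_size": 16,   # assumed per-device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
}


def build_training_arguments(output_dir="plane-ood-2"):
    """Build TrainingArguments; output_dir is a hypothetical placeholder."""
    from transformers import TrainingArguments  # lazy import

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```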

Framework versions

Transformers 4.25.1
Pytorch 1.12.1
Datasets 2.5.1
Tokenizers 0.12.1

Cite

If you want to use the model or data in your work, please reference the paper too:

@inproceedings{bertolini-etal-2022-testing,
    title = "Testing Large Language Models on Compositionality and Inference with Phrase-Level Adjective-Noun Entailment",
    author = "Bertolini, Lorenzo  and
      Weeds, Julie  and
      Weir, David",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2022.coling-1.359",
    pages = "4084--4100",
}

lorenzoscottb/bert-base-cased-PLANE-ood-2