license: apache-2.0
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: bert-base-cased-PLANE-ood-2
  results: []
language:
- en
pipeline_tag: text-classification
widget:
- text: A fake smile is a smile
- text: An alleged thief is an alleged criminal
- text: A small cat is an animal
- text: A small cat is a small mammal
datasets:
- lorenzoscottb/PLANE-ood
BERT for PLANE classification
This model is a fine-tuned version of bert-base-cased on one of the PLANE dataset splits (no. 2), introduced in Bertolini et al., COLING 2022. It achieves the following results on the evaluation set:
- Accuracy: 0.9043
Model description
The model is trained to perform a sequence classification task over phrase-level adjective-noun inferences (e.g., "A red car is a vehicle").
Intended uses & limitations
The model is not intended for lexical entailment (i.e., hypernym detection). It is trained solely to perform a very specific subset of phrase-level entailment, based on adjective-noun phrases. The types of question you can ask the model are limited, and should take one of three forms:
- An Adjective-Noun is a Noun (e.g. A red car is a car)
- An Adjective-Noun is a Hypernym(Noun) (e.g. A red car is a vehicle)
- An Adjective-Noun is an Adjective-Hypernym(Noun) (e.g. A red car is a red vehicle)
Linguistically speaking, adjectives belong to three macro classes (intersective, subsective, and intensional). From a linguistic and logical standpoint, these classes shape the truth value of the three forms above. For instance, since red is an intersective adjective, all three forms are true. A subsective adjective like small allows just the first two, but not the last: logically speaking, a small car is not a small vehicle.
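The three query forms, and the truth values described above for intersective and subsective adjectives, can be sketched as follows. This is only an illustration: the function names, the dictionary keys (`AN_is_N`, `AN_is_H`, `AN_is_AH`), and the first-letter article heuristic are assumptions made for this example and are not part of the PLANE code. Intensional adjectives are left out because their truth values are not spelled out here.

```python
from typing import Dict


def article(word: str) -> str:
    """Pick 'a' or 'an' with a simple first-letter heuristic (not phonetic)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def plane_queries(adjective: str, noun: str, hypernym: str) -> Dict[str, str]:
    """Build the three supported sentence forms for one adjective-noun phrase."""
    subject = f"{article(adjective).capitalize()} {adjective} {noun}"
    return {
        # An Adjective-Noun is a Noun
        "AN_is_N": f"{subject} is {article(noun)} {noun}",
        # An Adjective-Noun is a Hypernym(Noun)
        "AN_is_H": f"{subject} is {article(hypernym)} {hypernym}",
        # An Adjective-Noun is an Adjective-Hypernym(Noun)
        "AN_is_AH": f"{subject} is {article(adjective)} {adjective} {hypernym}",
    }


# Truth values per adjective class, as stated in the text above.
EXPECTED = {
    "intersective": {"AN_is_N": True, "AN_is_H": True, "AN_is_AH": True},
    "subsective": {"AN_is_N": True, "AN_is_H": True, "AN_is_AH": False},
}
```

For example, `plane_queries("small", "car", "vehicle")` yields the three sentences for a subsective adjective, of which only the last is expected to be false.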
In other words, the model was built to study out-of-distribution compositional generalisation with respect to a very specific set of compositional phenomena.
This places clear limitations on the questions you can ask the model. For instance, if you query the model with a basic (false) hypernym detection task (e.g., A dog is a cat), the model will consider it true.
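A minimal sketch of querying the model through the Transformers `pipeline` API is given below. The Hub id in `MODEL_ID` is an assumption, inferred from the model name and the dataset namespace; adjust it if the model lives elsewhere. The helper names `pair_predictions` and `classify` are hypothetical, and the label strings returned depend on the model's configuration.

```python
from typing import Dict, List

# Assumed Hub id (inferred from the model name and dataset namespace).
MODEL_ID = "lorenzoscottb/bert-base-cased-PLANE-ood-2"


def pair_predictions(texts: List[str], outputs: List[Dict]) -> List[Dict]:
    """Attach each input sentence to its pipeline output (label and score)."""
    return [
        {"text": text, "label": out["label"], "score": out["score"]}
        for text, out in zip(texts, outputs)
    ]


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[Dict]:
    """Run the text-classification pipeline over a list of sentences."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return pair_predictions(texts, classifier(texts))
```

Calling `classify(["A red car is a vehicle", "A small cat is a small mammal"])` downloads the model on first use. Sentences outside the three supported forms can be passed in the same way, but their predictions should not be trusted.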
Training and evaluation data
The data used for training and testing, as well as the other splits used for the experiments, are available on the paper's git page. The reported accuracy refers to out-of-distribution evaluation; that is, the model was tested on the text classification task presented above, but with adjectives and nouns unseen during training.
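To make the out-of-distribution setting concrete, the sketch below checks that a test set shares no adjectives or nouns with the training set, and computes plain accuracy. It uses hypothetical `(adjective, noun)` pairs rather than the actual PLANE files, and the function names are illustrative; the way the real splits were built may differ in detail.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (adjective, noun)


def vocabulary(examples: Iterable[Pair]) -> Tuple[Set[str], Set[str]]:
    """Collect the adjectives and nouns that occur in a set of examples."""
    examples = list(examples)
    return {adj for adj, _ in examples}, {noun for _, noun in examples}


def is_out_of_distribution(train: Iterable[Pair], test: Iterable[Pair]) -> bool:
    """True if no test adjective or test noun was seen in training."""
    train_adj, train_nouns = vocabulary(train)
    test_adj, test_nouns = vocabulary(test)
    return train_adj.isdisjoint(test_adj) and train_nouns.isdisjoint(test_nouns)


def accuracy(predictions: List[int], references: List[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```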
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
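The values above map onto Transformers `TrainingArguments` roughly as sketched below. Two assumptions are made: that training ran on a single device (so the listed batch sizes are per-device sizes), and that no warmup was used, since none is listed. The `output_dir` value and the helper names are hypothetical. `linear_lr` shows what the linear schedule does under the no-warmup assumption.

```python
# Hyperparameters from the list above, using TrainingArguments field names.
# Assumes single-device training, so batch sizes are per-device sizes.
HPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
}


def build_training_arguments(output_dir: str = "plane-ood-2"):
    """Create TrainingArguments from HPARAMS (output_dir is a placeholder)."""
    # Imported lazily so HPARAMS and linear_lr work without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HPARAMS)


def linear_lr(step: int, total_steps: int,
              base_lr: float = HPARAMS["learning_rate"]) -> float:
    """Learning rate at a given step under linear decay to zero, no warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)
```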
Framework versions
- Transformers 4.25.1
- Pytorch 1.12.1
- Datasets 2.5.1
- Tokenizers 0.12.1
Cite
If you use the model or data in your work, please also cite the paper:
@inproceedings{bertolini-etal-2022-testing,
title = "Testing Large Language Models on Compositionality and Inference with Phrase-Level Adjective-Noun Entailment",
author = "Bertolini, Lorenzo and
Weeds, Julie and
Weir, David",
booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
month = oct,
year = "2022",
address = "Gyeongju, Republic of Korea",
publisher = "International Committee on Computational Linguistics",
url = "https://aclanthology.org/2022.coling-1.359",
pages = "4084--4100",
}