Its goal is to assess meaning preservation between two sentences in a way that correlates highly with human judgments
and sanity checks. For more details, refer to our publicly available article.

> This public version of our model uses the best model trained (whereas, in our article, we present the performance
> results of an average of 10 models) for a more extended period (500 epochs instead of 250). We have since observed
> that the model can further reduce dev loss and increase performance. We have also changed the data augmentation
> technique used in the article for a more robust one that also includes the commutative property of the meaning
> function, namely, Meaning(Sent_a, Sent_b) = Meaning(Sent_b, Sent_a).

- [HuggingFace Model Card](https://huggingface.co/davebulaval/MeaningBERT)

## Sanity Check

Correlation to human judgment is one way to evaluate the quality of a meaning preservation metric. However, it is
inherently subjective, since it uses human judgment as a gold standard, and expensive, since it requires a large
dataset annotated by several humans. As an alternative, we designed two automated tests: evaluating meaning
preservation between identical sentences (which should be 100% preserving) and between unrelated sentences (which
should be 0% preserving). In these tests, the meaning preservation target value is not subjective and does not require
human annotation to be measured. They represent a trivial and minimal threshold that a good automatic meaning
preservation metric should be able to achieve. Namely, a metric should, at a minimum, return a perfect score
(i.e., 100%) when two identical sentences are compared and a null score (i.e., 0%) when two sentences are completely
unrelated.

### Identical Sentences

The first test evaluates meaning preservation between identical sentences. To analyze the metrics' capability to pass
this test, we count the number of times a metric rating is greater than or equal to a threshold value X ∈ [95, 99] and
divide that count by the number of sentences, which gives the ratio of times the metric returns the expected rating.
To account for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use a
threshold value of 100%.

### Unrelated Sentences

Our second test evaluates meaning preservation between a source sentence and an unrelated sentence generated by a large
language model. The idea is to verify that the metric finds a meaning preservation rating of 0 when given a completely
irrelevant sentence mainly composed of irrelevant words (also known as word soup). Since this test's expected rating is
0, we check that the metric rating is lower than or equal to a threshold value X ∈ [1, 5]. Again, to account for
computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use a threshold value of 0%.
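
The following minimal sketch shows how these two pass ratios could be computed from a list of metric ratings. It is
illustrative only: the `pass_ratio` helper and the example rating values are hypothetical, and the thresholds follow
the X ∈ [95, 99] and X ∈ [1, 5] ranges described above.

```python
def pass_ratio(ratings, threshold, expect_high):
    """Share of sentence pairs whose rounded rating clears the sanity-check threshold."""
    if expect_high:
        # Identical-sentences test: the rounded rating should be >= the threshold (e.g., 95).
        passed = sum(1 for rating in ratings if round(rating) >= threshold)
    else:
        # Unrelated-sentences test: the rounded rating should be <= the threshold (e.g., 5).
        passed = sum(1 for rating in ratings if round(rating) <= threshold)
    return passed / len(ratings)

# Hypothetical ratings a metric could return on identical and on unrelated sentence pairs.
identical_ratings = [99.6, 97.2, 100.0, 94.4]
unrelated_ratings = [0.3, 4.7, 12.0]

print(pass_ratio(identical_ratings, threshold=95, expect_high=True))   # 0.75
print(pass_ratio(unrelated_ratings, threshold=5, expect_high=False))   # 0.666...
```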

## Use MeaningBERT

You can use MeaningBERT as a [model](https://huggingface.co/davebulaval/MeaningBERT) that you can retrain or use for
inference with HuggingFace:

```python
# Load the model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("davebulaval/MeaningBERT")
model = AutoModelForSequenceClassification.from_pretrained("davebulaval/MeaningBERT")
```

Or you can use MeaningBERT as a metric for evaluation (no retraining) with HuggingFace, as shown in the example below.

## Code Examples

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("davebulaval/MeaningBERT")
scorer = AutoModelForSequenceClassification.from_pretrained("davebulaval/MeaningBERT")
scorer.eval()

documents = ["He wanted to make them pay.", "This sandwich looks delicious.", "He wants to eat."]
simplifications = ["He wanted to make them pay.", "This sandwich looks delicious.",
                   "Whatever, whenever, this is a sentence."]

# We tokenize the texts as sentence pairs and return PyTorch tensors
tokenize_text = tokenizer(documents, simplifications, truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    # We score each document/simplification pair
    scores = scorer(**tokenize_text)

print(scores.logits.tolist())
```
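
Continuing from the example above, `scores.logits` holds the raw meaning preservation ratings, one per sentence pair.
The short post-processing sketch below is an assumption rather than an official API guarantee: it presumes a single
regression output per pair and clamps values to the 0 to 100 rating scale used by the sanity checks.

```python
# Assumption: one regression output per pair, so logits has shape (batch_size, 1).
# Clamping to [0, 100] mirrors the 0-100 rating scale described above.
ratings = [min(max(pair_logits[0], 0.0), 100.0) for pair_logits in scores.logits.tolist()]
print(ratings)  # one rating per document/simplification pair
```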

------------------

## Cite
Use the following citation to cite MeaningBERT
------------------

## Contributing to MeaningBERT

We welcome user input, whether it regards bugs found in the library or feature propositions! Make sure to have a
look at our [contributing guidelines](https://github.com/GRAAL-Research/MeaningBERT/blob/main/.github/CONTRIBUTING.md)
for more details on this matter.

## License

MeaningBERT is MIT licensed, as found in
the [LICENSE file](https://github.com/GRAAL-Research/risc/blob/main/LICENSE).

------------------