davidheineman committed • 4bec69e
Parent(s): b547a07
Add model

Browse files:
- README.md +44 -0
- checkpoints/epoch=3-step=1460-val_kendall=0.409.ckpt +3 -0
- hparams.yaml +40 -0
README.md CHANGED
@@ -1,3 +1,47 @@
 ---
+language:
+- en
+datasets:
+- simpeval
+tags:
+- simplification
 license: apache-2.0
 ---
+
+This contains the trained checkpoint for LENS-SALSA, as introduced in [**Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA**](https://arxiv.org/abs/2305.14458). For more information, please refer to the [**SALSA repository**](https://github.com/davidheineman/salsa).
+
+```bash
+pip install lens-metric
+```
+
+```python
+from lens import download_model
+from lens.lens_salsa import LENS_SALSA
+
+model_path = download_model("davidheineman/lens-salsa")
+lens_salsa = LENS_SALSA(model_path)
+
+score = lens_salsa.score(
+    complex=[
+        "They are culturally akin to the coastal peoples of Papua New Guinea."
+    ],
+    simple=[
+        "They are culturally similar to the people of Papua New Guinea."
+    ],
+)
+```
+
+## Intended uses
+
+Our model is intended to be used for **reference-free simplification evaluation**. Given a source text and its simplification, it outputs a single score between 0 and 1, where 1 represents a perfect simplification and 0 a random simplification. LENS-SALSA was trained on edit annotations from the SimpEval dataset, which covers manually written simplifications of complex Wikipedia sentences. We have not evaluated our model on non-English languages or non-Wikipedia domains.
+
+## Cite SALSA
+
+If you find our paper, code or data helpful, please consider citing [**our work**](https://arxiv.org/abs/2305.14458):
+
+```tex
+@article{heineman2023dancing,
+  title={Dancing {B}etween {S}uccess and {F}ailure: {E}dit-level {S}implification {E}valuation using {SALSA}},
+  author={Heineman, David and Dou, Yao and Xu, Wei},
+  journal={arXiv preprint arXiv:2305.14458},
+  year={2023}
+}
+```
checkpoints/epoch=3-step=1460-val_kendall=0.409.ckpt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:971a9c705c90bb97fe85e73211aa8ca2beff7e7f438395d2ac86403a4960c0b3
+size 1419010479
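The checkpoint itself is stored via Git LFS, so the file committed here is only a pointer in the LFS v1 format: one `key value` pair per line. As a small illustration (a sketch, not part of this repo), the pointer above can be parsed with a few lines of Python:

```python
# Sketch: parse a git-lfs pointer file (spec v1) into a dict of fields.
# The pointer text below is copied from this commit's checkpoint entry.
POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:971a9c705c90bb97fe85e73211aa8ca2beff7e7f438395d2ac86403a4960c0b3\n"
    "size 1419010479\n"
)

def parse_lfs_pointer(text: str) -> dict:
    """Each pointer line is 'key value'; split on the first space."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = parse_lfs_pointer(POINTER)
print(pointer["size"])  # "1419010479" bytes, i.e. the real checkpoint is ~1.4 GB
```

This is why cloning the repo without `git lfs pull` yields a tiny text file rather than the ~1.4 GB checkpoint.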
hparams.yaml ADDED
@@ -0,0 +1,40 @@
+activations: Tanh
+batch_size: 4
+class_identifier: unified_metric
+continuous_word_labels: false
+dropout: 0.15
+encoder_learning_rate: 1.0e-05
+encoder_model: RoBERTa
+final_activation: null
+hidden_sizes:
+- 384
+initalize_pretrained_unified_weights: true
+input_segments:
+- edit_id_simplified
+- edit_id_original
+keep_embeddings_frozen: true
+layer: mix
+layer_norm: true
+layer_transformation: sparsemax
+layerwise_decay: 0.95
+learning_rate: 3.1e-05
+load_pretrained_weights: true
+loss: mse
+loss_lambda: 0.9
+nr_frozen_epochs: 0.3
+optimizer: AdamW
+pool: avg
+pretrained_model: roberta-large
+score_target: lens_score
+sent_layer: mix
+span_targets:
+- edit_id_simplified
+- edit_id_original
+span_tokens:
+- bad
+warmup_steps: 0
+word_layer: 24
+word_level_training: true
+word_weights:
+- 0.1
+- 0.9
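Since `word_level_training: true` is set, the model optimizes both a sentence-level score (`loss: mse`) and a word-level span objective. Purely as a hypothetical illustration (an assumed reading, not the actual SALSA training code), `loss_lambda` could interpolate between the two objectives, with `word_weights` acting as per-class weights that down-weight the frequent non-edited class:

```python
import math

# Hypothetical sketch only: how loss_lambda (0.9) and word_weights
# ([0.1, 0.9]) from hparams.yaml *could* combine the sentence-level MSE
# with a class-weighted word-level negative log-likelihood.
# The real training code may combine them differently.
LOSS_LAMBDA = 0.9
WORD_WEIGHTS = [0.1, 0.9]  # assumed label order: [not-bad, bad] span tokens

def sentence_loss(pred: float, target: float) -> float:
    """MSE for the sentence-level quality score ('loss: mse')."""
    return (pred - target) ** 2

def word_loss(token_probs: list, token_labels: list) -> float:
    """Class-weighted NLL averaged over tokens (label 1 = 'bad' span)."""
    total = 0.0
    for probs, label in zip(token_probs, token_labels):
        total += -WORD_WEIGHTS[label] * math.log(probs[label])
    return total / len(token_labels)

def combined_loss(sent_pred, sent_target, token_probs, token_labels):
    """Assumed form: interpolate the two objectives with loss_lambda."""
    return ((1 - LOSS_LAMBDA) * sentence_loss(sent_pred, sent_target)
            + LOSS_LAMBDA * word_loss(token_probs, token_labels))
```

Under this (assumed) interpolation, a `loss_lambda` of 0.9 would put most of the training signal on the word-level span predictions rather than the sentence score.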