liamcripwell
committed on
Commit
•
81b7f75
1
Parent(s):
44e11cc
Create README.md
README.md
ADDED
@@ -0,0 +1,45 @@
---
language:
- en
---

# LED_para Document Simplification Model

This is a pretrained version of the document simplification model presented in the EACL 2023 paper "Context-Aware Document Simplification".
It is an end-to-end system based on the Longformer encoder-decoder (LED) that operates at the paragraph level.

## How to use
It is recommended to use the [plan_simp](https://github.com/liamcripwell/plan_simp/tree/main) library to interface with the model.

Here is how to use this model in PyTorch:

```python
from plan_simp.models.bart import load_simplifier

# Load the model, its tokenizer, and the stored hyperparameters
simplifier, tokenizer, hparams = load_simplifier("liamcripwell/ledpara")

# The leading <RL_3> is the reading-level control token prefixed to the input
text = "<RL_3> Turing has an extensive legacy with statues of him and many things named after him, including an annual award for computer science innovations. He appears on the current Bank of England £50 note, which was released on 23 June 2021, to coincide with his birthday. A 2019 BBC series, as voted by the audience, named him the greatest person of the 20th century."
inputs = tokenizer(text, return_tensors="pt")
# Generate with beam search; the loaded model is bound to `simplifier`
outputs = simplifier.generate(**inputs, num_beams=5)
```

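The example input above starts with a `<RL_3>` prefix. As a minimal sketch of preparing several paragraphs the same way, the helper below prepends such a token to each paragraph. The function name `with_reading_level` is hypothetical and not part of plan_simp, and the `<RL_n>` pattern is inferred only from the single example above, so the set of valid levels is an assumption to check against the library.

```python
def with_reading_level(paragraphs, level):
    """Prefix each paragraph with a reading-level token of the form <RL_n>.

    Hypothetical helper: the token format is inferred from the README example.
    """
    return [f"<RL_{level}> {p.strip()}" for p in paragraphs]


# Two shortened paragraphs based on the example text above
paragraphs = [
    "Turing has an extensive legacy with statues of him and many things named after him.",
    "He appears on the current Bank of England £50 note.",
]

batch = with_reading_level(paragraphs, 3)
for item in batch:
    print(item)
```

The resulting list of strings could then be passed to the tokenizer in place of the single `text` string, likely with padding enabled so that the paragraphs can be batched together.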
Generation and evaluation can also be run from the terminal.

```bash
# Generate simplifications for a test set
python plan_simp/scripts/generate.py inference \
  --model_ckpt=liamcripwell/ledpara \
  --test_file=<test_data> \
  --reading_lvl=s_level \
  --out_file=<output_csv>

# Evaluate the generated output
python plan_simp/scripts/eval_simp.py \
  --input_data=newselaauto_docs_test.csv \
  --output_data=test_out_ledpara.csv \
  --x_col=complex_str \
  --r_col=simple_str \
  --y_col=pred \
  --doc_id_col=pair_id \
  --prepro=True \
  --sent_level=True
```
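Before running the evaluation command, it can help to confirm that the CSV files contain the columns named in its flags. The sketch below is a hypothetical pre-flight check, not part of plan_simp. It assumes that `x_col`, `r_col` and `doc_id_col` refer to columns of the input file and `y_col` to a column of the output file, which should be verified against the script itself. The in-memory CSV contents are made-up stand-ins for real files.

```python
import csv
import io

# Column names taken from the eval_simp.py flags; which file each
# belongs to is an assumption
REQUIRED_INPUT_COLS = {"complex_str", "simple_str", "pair_id"}
REQUIRED_OUTPUT_COLS = {"pred"}


def missing_columns(csv_file, required):
    """Return the required column names absent from the CSV header, sorted."""
    header = next(csv.reader(csv_file), [])
    return sorted(required - set(header))


# Hypothetical in-memory stand-ins for the input and output CSV files
input_csv = io.StringIO(
    "pair_id,complex_str,simple_str\n0,A complex text.,A simple text.\n"
)
output_csv = io.StringIO("pair_id,pred\n0,A simple text.\n")

print(missing_columns(input_csv, REQUIRED_INPUT_COLS))
print(missing_columns(output_csv, REQUIRED_OUTPUT_COLS))
```

With real data, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.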