liamcripwell committed on
Commit
81b7f75
1 Parent(s): 44e11cc

Create README.md

  1. README.md +45 -0
README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ language:
+ - en
+ ---
+
+ # LED_para Document Simplification Model
+
+ This is a pretrained version of the document simplification model presented in the EACL 2023 paper "Context-Aware Document Simplification".
+ It is an end-to-end system based on the Longformer Encoder-Decoder (LED) that operates at the paragraph level.
+
+ ## How to use
+ It is recommended to use the [plan_simp](https://github.com/liamcripwell/plan_simp/tree/main) library to interface with the model.
+
+ Here is how to use this model in PyTorch:
+
+ ```python
+ from plan_simp.models.bart import load_simplifier
+
+ # Load the pretrained simplifier, its tokenizer, and the saved hyperparameters.
+ simplifier, tokenizer, hparams = load_simplifier("liamcripwell/ledpara")
+
+ # The leading <RL_3> control token specifies the target reading level.
+ text = "<RL_3> Turing has an extensive legacy with statues of him and many things named after him, including an annual award for computer science innovations. He appears on the current Bank of England £50 note, which was released on 23 June 2021, to coincide with his birthday. A 2019 BBC series, as voted by the audience, named him the greatest person of the 20th century."
+ inputs = tokenizer(text, return_tensors="pt")
+ # Generate with the loaded model (the original snippet called an undefined `model`).
+ outputs = simplifier.generate(**inputs, num_beams=5)
+ ```
+
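The example input above starts with a `<RL_3>` control token. Below is a minimal sketch of a helper that prepends such a token to arbitrary text. It assumes the `<RL_n>` pattern seen in the example carries over to other reading levels. The name `with_reading_level` is hypothetical and not part of plan_simp, and the set of valid levels is not stated in the README, so only a non-negative check is applied.

```python
def with_reading_level(text: str, level: int) -> str:
    """Prepend a reading-level control token such as <RL_3> to the input text.

    Assumption: tokens follow the <RL_n> pattern shown in the README example.
    The README does not list the valid levels, so only negatives are rejected.
    """
    if level < 0:
        raise ValueError("level must be a non-negative integer")
    # Strip surrounding whitespace so the token is followed by exactly one space.
    return f"<RL_{level}> {text.strip()}"


print(with_reading_level("Turing has an extensive legacy.", 3))
```

If the tokenizer returned by `load_simplifier` is a standard Hugging Face tokenizer (an assumption), the generated IDs can likely be turned back into text with `tokenizer.batch_decode(outputs, skip_special_tokens=True)`.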
+ Generation and evaluation can also be run from the terminal.
+
+ ```bash
+ python plan_simp/scripts/generate.py inference \
+   --model_ckpt=liamcripwell/ledpara \
+   --test_file=<test_data> \
+   --reading_lvl=s_level \
+   --out_file=<output_csv>
+
+ python plan_simp/scripts/eval_simp.py \
+   --input_data=newselaauto_docs_test.csv \
+   --output_data=test_out_ledpara.csv \
+   --x_col=complex_str \
+   --r_col=simple_str \
+   --y_col=pred \
+   --doc_id_col=pair_id \
+   --prepro=True \
+   --sent_level=True
+ ```
+
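For scripted runs, the evaluation command from the README can be assembled in Python instead of typed by hand. This is a rough sketch: the flag names and column values are copied from the README's terminal example, while `build_eval_command` is a hypothetical helper, and the script path is assumed to be relative to the plan_simp repository root.

```python
import shlex
import sys
from typing import List


def build_eval_command(input_data: str, output_data: str) -> List[str]:
    """Build the argument list for plan_simp/scripts/eval_simp.py.

    Flag names and default column values mirror the README example;
    nothing here is verified against the script itself.
    """
    options = {
        "input_data": input_data,
        "output_data": output_data,
        "x_col": "complex_str",
        "r_col": "simple_str",
        "y_col": "pred",
        "doc_id_col": "pair_id",
        "prepro": "True",
        "sent_level": "True",
    }
    # Use the current interpreter so the command runs in the active environment.
    cmd = [sys.executable, "plan_simp/scripts/eval_simp.py"]
    cmd += [f"--{name}={value}" for name, value in options.items()]
    return cmd


cmd = build_eval_command("newselaauto_docs_test.csv", "test_out_ledpara.csv")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would execute the evaluation. Building the command as a list avoids shell quoting problems with file names.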