https://github.com/martiansideofthemoon/rankgen

RankGen is a suite of encoder models (100M-1.2B parameters) which map prefixes and generations from any pretrained English language model to a shared vector space. RankGen can be used to rerank multiple full-length samples from an LM, and it can also be incorporated as a scoring function into beam search to significantly improve generation quality (0.85 vs 0.77 MAUVE, 75% preference according to human annotators who are English writers). RankGen can also be used like a dense retriever, and achieves state-of-the-art performance on [literary retrieval](https://relic.cs.umass.edu/leaderboard.html).
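Conceptually, reranking with RankGen reduces to a dot product in the shared vector space: encode the prefix and each candidate continuation, then keep the highest-scoring candidate. A minimal sketch of that scoring step, using stand-in vectors (real embeddings come from the RankGen encoder):

```python
import numpy as np

def rerank(prefix_vec, suffix_vecs):
    """Return candidate indices sorted by RankGen-style score
    (dot product between prefix and candidate suffix embeddings)."""
    scores = suffix_vecs @ prefix_vec  # one score per candidate
    return np.argsort(-scores)        # best-scoring candidate first

# Stand-in vectors for illustration only; real ones come from the encoder.
prefix = np.array([1.0, 0.0, 1.0])
candidates = np.array([
    [0.9, 0.1, 0.8],   # close to the prefix -> high score
    [-1.0, 0.5, 0.0],  # far from the prefix -> low score
])
order = rerank(prefix, candidates)
print(order[0])  # -> 0, the index of the best-scoring continuation
```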
## Setup

**Requirements** (`pip` will install these dependencies for you)

Python 3.7+, `torch` (CUDA recommended), `transformers`

**Installation**

```
python3.7 -m virtualenv rankgen-venv
source rankgen-venv/bin/activate
pip install rankgen
```

Get the data [here](https://drive.google.com/drive/folders/1DRG2ess7fK3apfB-6KoHb_azMuHbsIv4?usp=sharing) and place the folder in the root directory. Alternatively, use `gdown` as shown below:

```
gdown --folder https://drive.google.com/drive/folders/1DRG2ess7fK3apfB-6KoHb_azMuHbsIv4
```

Run the test script to make sure the RankGen checkpoint has loaded correctly:

```
python -m rankgen.test_rankgen_encoder --model_path kalpeshk2011/rankgen-t5-base-all
```

### Expected output

```
0.0009239262409127233
0.0011521980725477804
```

## Using RankGen

Loading RankGen is simple using the HuggingFace APIs (see Method-2 below), but we suggest using [`RankGenEncoder`](https://github.com/martiansideofthemoon/rankgen/blob/master/rankgen/rankgen_encoder.py), which is a small wrapper around the HuggingFace APIs that correctly preprocesses data and handles tokenization automatically. You can either download [our repository](https://github.com/martiansideofthemoon/rankgen) and install the API, or copy the implementation from [below](#rankgenencoder-implementation).
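The main preprocessing the wrapper handles is prepending a marker string to each input before tokenization, so the encoder knows whether it is embedding a prefix or a candidate suffix. A rough sketch of that step; the exact marker strings (`"pre "` and `"suffi "`) are read from the repository's `rankgen_encoder.py` and should be treated as an assumption, so verify against the actual implementation:

```python
def mark_inputs(texts, vectors_type="prefix"):
    """Prepend the RankGen marker string before tokenization.

    Assumption: markers "pre " / "suffi " as used in the repo's
    rankgen_encoder.py; check the wrapper source for the exact values.
    """
    if vectors_type not in ("prefix", "suffix"):
        raise ValueError("vectors_type must be 'prefix' or 'suffix'")
    marker = "pre " if vectors_type == "prefix" else "suffi "
    return [marker + t for t in texts]

print(mark_inputs(["Once upon a time"], vectors_type="suffix"))
# -> ['suffi Once upon a time']
```

Using `RankGenEncoder` directly spares you this bookkeeping, which is why the README recommends it over calling the HuggingFace APIs by hand.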