nicoladecao committed on
Commit 6186793
1 Parent(s): 650ea3e

Update README.md

Files changed (1):
  1. README.md +53 -1
README.md CHANGED
@@ -12,4 +12,56 @@ tags:
  - question-answering
  - fill-mask

- ---
+ ---
+
+ # GENRE
+
+ The GENRE (Generative ENtity REtrieval) system, as presented in [Autoregressive Entity Retrieval](https://arxiv.org/abs/2010.00904), implemented in PyTorch.
+
+ In a nutshell, GENRE uses a sequence-to-sequence approach to entity retrieval (e.g., linking) based on the fine-tuned [BART](https://arxiv.org/abs/1910.13461) architecture. GENRE performs retrieval by generating the unique entity name conditioned on the input text, using constrained beam search to generate only valid identifiers. The model was first released in the [facebookresearch/GENRE](https://github.com/facebookresearch/GENRE) repository using `fairseq` (the `transformers` models were obtained with a conversion script similar to [this one](https://github.com/huggingface/transformers/blob/master/src/transformers/models/bart/convert_bart_original_pytorch_checkpoint_to_pytorch.py)).
+
+ ## BibTeX entry and citation info
+
+ **Please consider citing our works if you use code from this repository.**
+
+ ```bibtex
+ @inproceedings{decao2020autoregressive,
+   title={Autoregressive Entity Retrieval},
+   author={Nicola {De Cao} and Gautier Izacard and Sebastian Riedel and Fabio Petroni},
+   booktitle={International Conference on Learning Representations},
+   url={https://openreview.net/forum?id=5k8F6UU39V},
+   year={2021}
+ }
+ ```
+
+ ## Usage
+
+ Here is an example of generation for Wikipedia page retrieval for open-domain fact-checking:
+
+ ```python
+ import pickle
+
+ from trie import Trie
+ from transformers import BartTokenizer, BartForConditionalGeneration
+
+ # OPTIONAL: load the prefix tree (trie) of valid entity names
+ # with open("kilt_titles_trie_dict.pkl", "rb") as f:
+ #     trie = Trie.load_from_dict(pickle.load(f))
+
+ tokenizer = BartTokenizer.from_pretrained("facebook/genre-kilt")
+ model = BartForConditionalGeneration.from_pretrained("facebook/genre-kilt").eval()
+
+ sentences = ["Einstein was a German physicist."]
+
+ outputs = model.generate(
+     **tokenizer(sentences, return_tensors="pt"),
+     num_beams=5,
+     num_return_sequences=5,
+     # OPTIONAL: constrain beam search to valid entity names via the trie
+     # prefix_allowed_tokens_fn=lambda batch_id, sent: trie.get(sent.tolist()),
+ )
+
+ tokenizer.batch_decode(outputs, skip_special_tokens=True)
+ ```
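Editor's note (not part of the commit): `trie` in the snippet above is not a `transformers` module; it ships with the [facebookresearch/GENRE](https://github.com/facebookresearch/GENRE) repository. For readers without that repository, a minimal stand-in exposing the two members the snippet relies on (`Trie.load_from_dict` and `get`) could look like the following sketch over token-id sequences — an illustrative reimplementation, not the repository's actual class:

```python
class Trie:
    """Minimal prefix trie over token-id sequences (illustrative sketch)."""

    def __init__(self, sequences=()):
        # Nested dicts: each key is a token id, each value the subtrie.
        self.trie_dict = {}
        for seq in sequences:
            self.add(seq)

    def add(self, sequence):
        node = self.trie_dict
        for token_id in sequence:
            node = node.setdefault(token_id, {})

    def get(self, prefix):
        """Token ids allowed after `prefix`; empty list if prefix is invalid."""
        node = self.trie_dict
        for token_id in prefix:
            if token_id not in node:
                return []
            node = node[token_id]
        return list(node.keys())

    @staticmethod
    def load_from_dict(trie_dict):
        """Rebuild a trie from a (pickled) nested dict of token ids."""
        trie = Trie()
        trie.trie_dict = trie_dict
        return trie
```

With such a trie built over the tokenized KILT page titles, `prefix_allowed_tokens_fn=lambda batch_id, sent: trie.get(sent.tolist())` restricts each beam to sequences that decode to an exact Wikipedia page title.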