yulongl committed b460825 (parent: b27819c): Create Readme file

Files changed (1): README.md (added, +44)
# Basic Information

This is the Dr. Decr-large model used in the XOR-TyDi leaderboard task 1 whitebox submission:

https://nlp.cs.washington.edu/xorqa/
The detailed implementation of the model can be found in:

https://arxiv.org/pdf/2112.08185.pdf

Source code to train the model can be found via PrimeQA's IR component:

https://github.com/primeqa/primeqa/tree/main/examples/drdecr
It is a neural IR model built on top of the ColBERT v2 API and is not directly compatible with the Hugging Face API. Its inference results on the XOR dev dataset are:

```
     R@2kt  R@5kt
ko    69.1   75.1
ar    68.0   75.7
bn    81.9   85.2
fi    68.2   73.6
ru    67.1   72.2
ja    63.1   69.7
te    82.8   86.1
Avg   71.4   76.8
```
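The two technical pieces above can be sketched in a few lines: the ColBERT-style "late interaction" (MaxSim) scoring that this family of retrievers is built on, and the R@kt metric ("recall at k thousand tokens") reported in the table. This is an illustrative sketch only, not the PrimeQA/ColBERT implementation; the function names, whitespace tokenization, and substring answer matching are simplifying assumptions.

```python
import numpy as np

def maxsim_score(Q, D):
    # ColBERT late interaction: each query-token embedding (row of Q) is
    # matched to its most similar document-token embedding (row of D), and
    # the per-token maxima are summed. Rows are assumed L2-normalized, so
    # the dot products below are cosine similarities.
    sim = Q @ D.T                     # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

def recall_at_kt(ranked_passages, gold_answers, k=2):
    # R@kt: the fraction of queries whose top-ranked passages, truncated to
    # the first k*1000 tokens, contain a gold answer string. Whitespace
    # tokenization here is a stand-in for the official tokenizer.
    budget = k * 1000
    hits = 0
    for passages, answers in zip(ranked_passages, gold_answers):
        tokens = []
        for passage in passages:
            tokens.extend(passage.split())
            if len(tokens) >= budget:
                break
        text = " ".join(tokens[:budget])
        hits += any(answer in text for answer in answers)
    return 100.0 * hits / len(ranked_passages)
```

With k = 2 and k = 5 this corresponds to the R@2kt and R@5kt columns above, except that the official XOR-TyDi evaluation script defines the exact tokenization and answer matching.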
# Limitations and Bias

This model uses the pre-trained XLM-RoBERTa (large) model and was fine-tuned on the 7 languages of the XOR-TyDi leaderboard; its performance on other languages was not tested.

Since the model was fine-tuned on a large pre-trained language model, XLM-RoBERTa, biases associated with the pre-existing XLM-RoBERTa model may be present in our fine-tuned model, Dr. Decr.
# Citation
```
@article{Li2021_DrDecr,
  doi = {10.48550/ARXIV.2112.08185},
  url = {https://arxiv.org/abs/2112.08185},
  author = {Li, Yulong and Franz, Martin and Sultan, Md Arafat and Iyer, Bhavani and Lee, Young-Suk and Sil, Avirup},
  keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences},
  title = {Learning Cross-Lingual IR from an English Retriever},
  publisher = {arXiv},
  year = {2021}
}
```