firqaaa committed on
Commit 736fd46
1 Parent(s): bf1077f

Update README.md

Files changed (1)
  1. README.md +65 -0
README.md CHANGED
@@ -1,3 +1,68 @@
  ---
+ pipeline_tag: feature-extraction
+ tags:
+ - feature-extraction
+ - transformers
  license: apache-2.0
+ language:
+ - id
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ datasets:
+ - squad_v2
+ - natural_questions
  ---
+ ### indo-dpr-question_encoder-multiset-base
+ <p style="font-size:16px">Indonesian Dense Passage Retrieval (DPR) question encoder, trained on translated SQuAD v2.0 and Natural Questions datasets converted to DPR format.</p>
+
+
+ ### Evaluation
+
+ | Class | Precision | Recall | F1-Score | Support |
+ |-------|-----------|--------|----------|---------|
+ | hard_negative | 0.9961 | 0.9961 | 0.9961 | 384778 |
+ | positive | 0.8783 | 0.8783 | 0.8783 | 12414 |
+
+ | Metric | Value |
+ |--------|-------|
+ | Loss | 0.0220 |
+ | Accuracy | 0.9924 |
+ | Macro Average | 0.9372 |
+ | Weighted Average | 0.9924 |
+ | Accuracy and F1 | 0.9353 |
+ | Average Rank | 0.2194 |
+
+
+ <p style="font-size:16px">Note: These results are from evaluation on the dev set after 27,288 batches.</p>
+
+ ### Usage
+
+ ```python
+ from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer
+
+ tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('firqaaa/indo-dpr-question_encoder-multiset-base')
+ model = DPRQuestionEncoder.from_pretrained('firqaaa/indo-dpr-question_encoder-multiset-base')
+ # "What is the name of the author of the Yu-Gi-Oh manga?"
+ input_ids = tokenizer("Siapa nama pengarang manga Yu-Gi-Oh?", return_tensors='pt')["input_ids"]
+ # pooler_output holds one question embedding per input sequence
+ embeddings = model(input_ids).pooler_output
+ ```
+
+ You can also use it with `haystack` (1.x API) as follows:
+
+ ```python
+ from haystack.nodes import DensePassageRetriever
+ from haystack.document_stores import InMemoryDocumentStore
+
+ retriever = DensePassageRetriever(document_store=InMemoryDocumentStore(),
+                                   query_embedding_model="firqaaa/indo-dpr-question_encoder-multiset-base",
+                                   # DPR normally pairs the question encoder with a separate context (passage)
+                                   # encoder; substitute that checkpoint here if one is available.
+                                   passage_embedding_model="firqaaa/indo-dpr-question_encoder-multiset-base",
+                                   max_seq_len_query=64,
+                                   max_seq_len_passage=256,
+                                   batch_size=16,
+                                   use_gpu=True,
+                                   embed_title=True,
+                                   use_fast_tokenizers=True)
+ ```
+
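The summary rows of the evaluation can be cross-checked against the per-class table. A minimal sketch, assuming the Macro Average and Weighted Average rows are the unweighted and support-weighted means of the per-class scores (precision, recall and F1 coincide for each class here, so the choice of metric does not change the result):

```python
# Per-class scores and support copied from the evaluation table.
# Precision, recall and F1 are identical per class here, so one dict covers all three.
scores = {"hard_negative": 0.9961, "positive": 0.8783}
support = {"hard_negative": 384778, "positive": 12414}

# Macro average: unweighted mean over classes.
macro = sum(scores.values()) / len(scores)

# Weighted average: each class weighted by its number of examples (support).
weighted = sum(scores[c] * support[c] for c in scores) / sum(support.values())

print(f"macro average:    {macro:.4f}")
print(f"weighted average: {weighted:.4f}")
```

These round to the reported 0.9372 and 0.9924. In single-label classification, accuracy equals support-weighted recall (both reduce to correct predictions divided by total examples), which is why the Accuracy row matches the Weighted Average row.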
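In DPR, a question embedding such as the `pooler_output` from the usage snippet is compared with precomputed passage embeddings by inner product, and passages are returned in order of decreasing score. A minimal sketch of that ranking step, using random NumPy vectors as hypothetical stand-ins for real encoder outputs; the `rank_passages` helper and the 768-dimensional size are assumptions for illustration, not part of this model card:

```python
import numpy as np


def rank_passages(question_emb: np.ndarray, passage_embs: np.ndarray, top_k: int = 3) -> list[tuple[int, float]]:
    """Return (passage_index, score) pairs for the top_k passages, best first.

    question_emb: shape (dim,); passage_embs: shape (num_passages, dim).
    """
    scores = passage_embs @ question_emb  # one inner-product score per passage
    order = np.argsort(-scores)[:top_k]   # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]


# Hypothetical stand-ins for encoder outputs (768 dims assumed for a base-size encoder).
rng = np.random.default_rng(0)
passages = rng.normal(size=(5, 768))
# Build a question vector deliberately close to passage 2.
question = passages[2] + 0.1 * rng.normal(size=768)

for idx, score in rank_passages(question, passages):
    print(f"passage {idx}: score {score:.1f}")
```

In practice the passage embeddings would be computed once by a context encoder and stored in an index (for example a Haystack document store), so only the question has to be encoded at query time.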