thingsu committed on
Commit
76812f6
1 Parent(s): 1b13736

Create README.md

  1. README.md +34 -0
README.md ADDED
@@ -0,0 +1,34 @@
+ This is the kykim/bert-kor-base model fine-tuned on the KLUE dataset as a context encoder for dense passage retrieval.
+ Experiment results are available at https://wandb.ai/thingsu/DenseRetrieval
+
+ Corpus : Korean Wikipedia Corpus
+
+ Training Strategy :
+ - Pretrained Model : kykim/bert-kor-base
+ - Inverse Cloze Task : 16 epochs on KorQuAD v1.0 and the KLUE MRC dataset
+ - In-batch Negatives : 12 epochs on the KLUE MRC dataset, with passages randomly sampled from the top 100 sparse retrieval (TF-IDF) results for each query
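The Inverse Cloze Task stage listed above can be illustrated with a minimal sketch: one sentence of a passage is held out as a pseudo-query, and the remaining sentences form the context it should retrieve. The function name `make_ict_pair`, the `keep_prob` parameter, and the pre-split sentence list are assumptions for illustration, not the preprocessing actually used for this model.

```python
import random


def make_ict_pair(sentences, rng, keep_prob=0.1):
    """Build one (pseudo_query, context) pair from a list of sentences.

    keep_prob is the chance of leaving the query sentence inside the
    context (hypothetical default, not taken from this repository).
    """
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to build an ICT pair")
    idx = rng.randrange(len(sentences))
    query = sentences[idx]
    if rng.random() < keep_prob:
        context = list(sentences)  # query sentence stays in the context
    else:
        context = sentences[:idx] + sentences[idx + 1:]  # query sentence removed
    return query, " ".join(context)


# Toy passage; a real pipeline would split Wikipedia passages into sentences.
passage = [
    "Seoul is the capital of South Korea.",
    "The Han River flows through the city.",
    "It hosted the Summer Olympics in 1988.",
]
query, context = make_ict_pair(passage, random.Random(0), keep_prob=0.0)
```

With `keep_prob=0.0` the held-out sentence never appears in its own context, which forces the encoders to match on meaning rather than exact overlap.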
+
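The in-batch negatives objective named in the training strategy can be sketched as follows. For each question, the passage at the same index is the positive and every other passage in the batch acts as a negative. This dependency-free version uses plain lists in place of encoder outputs. The function names are hypothetical, and treating the TF-IDF-sampled passages as extra negatives appended to the batch is an assumption, not something the README confirms.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_negatives_loss(q_vecs, p_vecs):
    """Mean negative log-likelihood of the positive passage per question.

    p_vecs[i] is the positive for q_vecs[i]; any rows beyond len(q_vecs)
    are treated as additional negatives shared by all questions.
    """
    if len(p_vecs) < len(q_vecs):
        raise ValueError("need at least one passage per question")
    losses = []
    for i, q in enumerate(q_vecs):
        scores = [dot(q, p) for p in p_vecs]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        losses.append(log_z - scores[i])
    return sum(losses) / len(losses)


# Toy embeddings standing in for encoder outputs.
q = [[1.0, 0.0], [0.0, 1.0]]
p = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_negatives_loss(q, p)
swapped = in_batch_negatives_loss(q, p[::-1])
with_extra = in_batch_negatives_loss(q, p + [[0.5, 0.5]])
```

The loss is lowest when each question scores highest against its own passage, and adding a sampled negative raises it unless the encoders push that passage away.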
+ I'm not confident that this model will work on other datasets or corpora.
+ ```python
+ from transformers import AutoTokenizer, BertModel, BertPreTrainedModel
+
+ class BertEncoder(BertPreTrainedModel):
+     """BERT encoder that returns the pooled output as the embedding."""
+
+     def __init__(self, config):
+         super(BertEncoder, self).__init__(config)
+
+         self.bert = BertModel(config)
+         self.init_weights()
+
+     def forward(self, input_ids, attention_mask=None, token_type_ids=None):
+         outputs = self.bert(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
+         pooled_output = outputs[1]  # pooler output derived from the [CLS] token
+         return pooled_output
+
+ # The tokenizer comes from the base model the encoders were fine-tuned from
+ model_name = 'kykim/bert-kor-base'
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Question encoder and passage (context) encoder
+ q_encoder = BertEncoder.from_pretrained("thingsu/koDPR_question")
+ p_encoder = BertEncoder.from_pretrained("thingsu/koDPR_context")
+ ```
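Once both encoders are loaded, retrieval reduces to ranking passage embeddings by their similarity to the question embedding. The sketch below shows only that ranking step on plain Python lists, so it runs without downloading the checkpoints. `rank_passages`, `top_k`, and the toy vectors are hypothetical, and dot-product scoring is the usual choice for dense passage retrieval rather than something this README states.

```python
def rank_passages(q_vec, p_vecs, top_k=3):
    """Return (passage_index, score) pairs sorted by descending dot product."""
    scores = [sum(a * b for a, b in zip(q_vec, p)) for p in p_vecs]
    order = sorted(range(len(p_vecs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]


# Toy embeddings standing in for q_encoder / p_encoder outputs.
question_vec = [1.0, 0.0]
passage_vecs = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]
ranking = rank_passages(question_vec, passage_vecs)
```

With the real models, the vectors would presumably come from calls such as `q_encoder(**tokenizer(question, return_tensors="pt"))` and the matching call on `p_encoder` for each passage.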