sangdv committed on
Commit
26d93a5
1 Parent(s): c1d85a2

Update README.md

Files changed (1)
  1. README.md +17 -0
README.md CHANGED
@@ -22,6 +22,23 @@ widget:
22
  # bkai-foundation-models/vietnamese-bi-encoder
23
 
24
  This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
25
+ We train the model on a merged training dataset that consists of:
26
+ - MS MARCO (translated into Vietnamese)
27
+ - SQuAD v2.0 (translated into Vietnamese)
28
+ - 80% of the training set from the Legal Text Retrieval Zalo 2021 challenge
29
+
30
+ We use phobert-base-v2 as the pre-trained backbone.
31
+
32
+ Here are the results on the remaining 20% of the training set from the Legal Text Retrieval Zalo 2021 challenge:
33
+
34
+ | Pretrained Model | Training Datasets | Acc@1 | Acc@10 | Acc@100 | Pre@10 | MRR@10 |
35
+ |-------------------------------|---------------------------------------|:------------:|:-------------:|:--------------:|:-------------:|:-------------:|
36
+ | [Vietnamese-SBERT](https://huggingface.co/keepitreal/vietnamese-sbert) | - | 32.34 | 52.97 | 89.84 | 7.05 | 45.30 |
37
+ | | MS MARCO | 54.06 | 84.69 | 93.75 | 8.33 | 64.56 |
38
+ | PhoBERT-base-v2 | MS MARCO | 47.81 | 77.19 | 92.34 | 7.72 | 58.37 |
39
+ | | MS MARCO + SQuAD v2.0 + 80% Zalo | 73.28 | 93.59 | 98.85 | 9.36 | 80.73 |
40
+
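The README does not define the metric columns in the table above. The sketch below shows how such retrieval metrics are conventionally computed, assuming Acc@k means "at least one relevant document in the top k", Pre@k is the fraction of the top k that is relevant, and MRR@k is the mean reciprocal rank of the first relevant hit within the top k. The function names and the toy `ranked` / `relevant` data are hypothetical and are not taken from the authors' evaluation code.

```python
from typing import Dict, List, Set


def accuracy_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(1 for qid, docs in ranked.items() if set(docs[:k]) & relevant[qid])
    return hits / len(ranked)


def precision_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int) -> float:
    """Mean over queries of (number of relevant documents in the top k) / k."""
    total = 0.0
    for qid, docs in ranked.items():
        total += sum(1 for doc in docs[:k] if doc in relevant[qid]) / k
    return total / len(ranked)


def mrr_at_k(ranked: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int) -> float:
    """Mean reciprocal rank of the first relevant document within the top k."""
    total = 0.0
    for qid, docs in ranked.items():
        for rank, doc in enumerate(docs[:k], start=1):
            if doc in relevant[qid]:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked)


# Hypothetical toy data: ranked document IDs per query and the gold relevant sets.
ranked = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"]}
relevant = {"q1": {"d1"}, "q2": {"d5"}}

print(accuracy_at_k(ranked, relevant, 3))
print(precision_at_k(ranked, relevant, 3))
print(mrr_at_k(ranked, relevant, 3))
```

Under these definitions, a query with a single relevant document can contribute at most 1/k to Pre@k, which would be consistent with the Pre@10 column staying below 10 if the values are percentages.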
41
+ ![Uploading image.png…]()
42
 
43
  <!--- Describe your model here -->
44