Bingsu committed
Commit e9a9cb9
1 Parent(s): 3ed7855

Update README.md

Files changed (1)
  1. README.md +21 -6
README.md CHANGED
@@ -5,14 +5,27 @@ tags:
 - feature-extraction
 - sentence-similarity
 - transformers
-
+language:
+- ko
+license:
+- mit
+widget:
+- source_sentence: "대한민국의 수도는 서울입니다."
+  sentences:
+  - "미국의 수도는 뉴욕이 아닙니다."
+  - "대한민국의 수도 요금은 저렴한 편입니다."
+  - "서울은 대한민국의 수도입니다."
 ---

 # smartmind/roberta-ko-small-tsdae

 This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences & paragraphs to a 256-dimensional dense vector space and can be used for tasks like clustering or semantic search.

-<!--- Describe your model here -->
+Korean RoBERTa small model pretrained with [TSDAE](https://arxiv.org/abs/2104.06979).
+
+The model architecture is identical to [lassl/roberta-ko-small](https://huggingface.co/lassl/roberta-ko-small); the tokenizer is different.
+
+You can use it directly to compute sentence similarity, or fine-tune it for your own task.

 ## Usage (Sentence-Transformers)

@@ -72,16 +85,18 @@ print(sentence_embeddings)

 ## Evaluation Results

-<!--- Describe how your model was evaluated -->
-
-For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name=smartmind/roberta-ko-small-tsdae)
+The model achieves the following scores on the [klue](https://huggingface.co/datasets/klue) STS data. They were measured **without** fine-tuning on this data.

+|split|cosine_pearson|cosine_spearman|euclidean_pearson|euclidean_spearman|manhattan_pearson|manhattan_spearman|dot_pearson|dot_spearman|
+|-----|--------------|---------------|-----------------|------------------|-----------------|------------------|-----------|------------|
+|train|0.8735|0.8676|0.8268|0.8357|0.8248|0.8336|0.8449|0.8383|
+|validation|0.5409|0.5349|0.4786|0.4657|0.4775|0.4625|0.5284|0.5252|


 ## Full Model Architecture
 ```
 SentenceTransformer(
-  (0): Transformer({'max_seq_length': 508, 'do_lower_case': False}) with Transformer model: RobertaModel
+  (0): Transformer({'max_seq_length': 508, 'do_lower_case': False}) with Transformer model: RobertaModel
   (1): Pooling({'word_embedding_dimension': 256, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
 )
 ```
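The hunk above reports klue STS correlations measured without fine-tuning. The exact evaluation script is not in the diff; the sketch below shows how comparable numbers can be obtained with sentence-transformers' `EmbeddingSimilarityEvaluator`. The 0-5 to 0-1 label rescaling is an assumption (Pearson/Spearman correlations are scale-invariant, so it does not change the scores):

```python
# Sketch, not the authors' script: correlation metrics on KLUE STS.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("smartmind/roberta-ko-small-tsdae")
sts = load_dataset("klue", "sts", split="validation")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=sts["sentence1"],
    sentences2=sts["sentence2"],
    # Each KLUE STS label is a dict; take the raw 0-5 score and rescale.
    scores=[label["label"] / 5.0 for label in sts["labels"]],
    name="klue-sts-validation",
)

# Older sentence-transformers versions return the main (cosine Spearman)
# score as a float; v3+ returns a dict with all eight metrics.
print(evaluator(model))
```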
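Per the architecture dump at the end of the diff, the sentence embedding is the `[CLS]` vector (`pooling_mode_cls_token=True`) of a 256-dimensional RobertaModel. A sketch of the equivalent computation with plain transformers, in case the model is loaded outside sentence-transformers:

```python
# Sketch: reproduce CLS pooling with plain transformers.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("smartmind/roberta-ko-small-tsdae")
model = AutoModel.from_pretrained("smartmind/roberta-ko-small-tsdae")

inputs = tokenizer("대한민국의 수도는 서울입니다.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# CLS pooling: the first token's last hidden state is the sentence vector.
embedding = outputs.last_hidden_state[:, 0]
print(embedding.shape)  # torch.Size([1, 256])
```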
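The card states the model was pretrained with TSDAE on top of the lassl/roberta-ko-small architecture. The actual pretraining code is not part of this commit; the sketch below follows the sentence-transformers TSDAE documentation, with the corpus and hyperparameters as placeholders:

```python
# Sketch following the sentence-transformers TSDAE example; not the authors'
# training script. Corpus and hyperparameters are placeholders.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, datasets, losses, models

model_name = "lassl/roberta-ko-small"  # same architecture, per the card

# Build an encoder with CLS pooling, matching the architecture dump above.
word_embedding_model = models.Transformer(model_name)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(), "cls"
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# TSDAE trains on unlabeled sentences; replace with a large Korean corpus.
train_sentences = ["대한민국의 수도는 서울입니다.", "서울은 대한민국의 수도입니다."]
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)

# Denoising autoencoder loss with tied encoder/decoder weights.
train_loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path=model_name, tie_encoder_decoder=True
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    weight_decay=0,
    scheduler="constantlr",
    optimizer_params={"lr": 3e-5},
    show_progress_bar=True,
)
```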