Sentence Similarity · sentence-transformers · PyTorch · Transformers · Japanese · luke · feature-extraction
akiFQC committed on
Commit 008d110
1 Parent(s): 6040316

small update (readme)

Files changed (2)
  1. README.md +2 -0
  2. README_JA.md +11 -0
README.md CHANGED

```diff
@@ -22,6 +22,8 @@ datasets:
 
 # GLuCoSE (General Luke-based COntrastive Sentence Embedding)-base-Japanese
 
+[日本語のREADME/Japanese README](https://huggingface.co/pkshatech/GLuCoSE-base-ja)
+
 GLuCoSE (General LUke-based COntrastive Sentence Embedding, "glucose") is a Japanese text embedding model based on [LUKE](https://github.com/studio-ousia/luke). In order to create a general-purpose, user-friendly Japanese text embedding model, GLuCoSE has been trained on a mix of web data and various datasets associated with natural language inference and search. This model is not only suitable for sentence vector similarity tasks but also for semantic search tasks.
 - Maximum token count: 512
 - Output dimension: 768
```
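
The README above presents GLuCoSE as a sentence-embedding model for similarity tasks. Below is a minimal sketch of that use case, assuming the model loads through the sentence-transformers library listed in the page tags; the model id is taken from the README link above, and the example sentences are illustrative placeholders, not from the model card.

```python
from sentence_transformers import SentenceTransformer, util

# Load GLuCoSE via sentence-transformers (the library named in the page tags).
# The model id matches the README link above.
model = SentenceTransformer("pkshatech/GLuCoSE-base-ja")

# Illustrative placeholder sentences ("The weather is nice today." /
# "Today's weather is clear."); any Japanese text up to 512 tokens works.
sentences = [
    "今日は良い天気です。",
    "今日の天気は快晴です。",
]

# encode() returns one 768-dimensional vector per sentence
# (the output dimension stated in the README).
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the two sentence vectors.
print(util.cos_sim(embeddings[0], embeddings[1]))
```

Since encode() returns plain vectors, any cosine-similarity routine would do; util.cos_sim is simply the library's convenience helper.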
README_JA.md CHANGED

```diff
@@ -8,10 +8,21 @@ tags:
 - feature-extraction
 - sentence-transformers
 inference: false
+datasets:
+- mc4
+- clips/mqa
+- shunk031/JGLUE
+- paws-x
+- hpprc/janli
+- MoritzLaurer/multilingual-NLI-26lang-2mil7
+- castorini/mr-tydi
+- hpprc/jsick
 ---
 
 # GLuCoSE (General Luke-based COntrastive Sentence Embedding)-base-Japanese
 
+[English README/英語のREADME](https://huggingface.co/pkshatech/GLuCoSE-base-ja)
+
 GLuCoSE (General LUke-based COntrastive Sentence Embedding, "glucose") is a Japanese text embedding model based on [LUKE](https://github.com/studio-ousia/luke). Aiming to be a general-purpose, easy-to-use text embedding model, it has been trained on a mix of web data and multiple datasets for natural language inference and retrieval. It is suitable not only for sentence-vector similarity tasks but also for semantic search tasks.
 - Maximum token count: 512
 - Output dimension: 768
```
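
Both READMEs also call out semantic search. The following is a hedged sketch of that workflow using sentence-transformers' semantic_search utility; the corpus and query strings are invented for the example and are not part of the model card.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("pkshatech/GLuCoSE-base-ja")

# Illustrative placeholder corpus: a sentence about GLuCoSE, one about Tokyo,
# and one about LUKE.
corpus = [
    "GLuCoSEはLUKEをベースにした日本語のテキスト埋め込みモデルです。",
    "東京は日本の首都です。",
    "LUKEは知識拡張型のTransformerモデルです。",
]
query = "日本語の文埋め込みモデル"  # "Japanese sentence embedding model"

# Embed the corpus once, then rank it against the query by cosine similarity.
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:  # results for the first (and only) query
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```

For a real search application, corpus embeddings would typically be computed once and cached or indexed, since only the query needs to be embedded at request time.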