Files changed (1)
  1. README.md +6 -27
README.md CHANGED
@@ -8,31 +8,18 @@ tags:
  - transformers
  license: apache-2.0
  widget:
- - source_sentence: "那个人很开心"
- sentences:
- - "那个人非常开心"
- - "那只猫很开心"
- - "那个人在吃东西"
+ source_sentence: "那个人很开心"
+ sentences:
+ - 那个人非常开心
+ - 那只猫很开心
+ - 那个人在吃东西
  ---

  # Chinese Sentence BERT

  ## Model description

- This is the sentence embedding model pre-trained by [UER-py](https://github.com/dbiir/UER-py/), which is introduced in [this paper](https://arxiv.org/abs/1909.05658). Besides, the model could also be pre-trained by [TencentPretrain](https://github.com/Tencent/TencentPretrain) introduced in [this paper](https://arxiv.org/abs/2212.06385), which inherits UER-py to support models with parameters above one billion, and extends it to a multimodal pre-training framework.
-
- ## How to use
-
- You can use this model to extract sentence embeddings for sentence similarity task. We use cosine distance to calculate the embedding similarity here:
-
- ```python
- >>> from sentence_transformers import SentenceTransformer
- >>> model = SentenceTransformer('uer/sbert-base-chinese-nli')
- >>> sentences = ['那个人很开心', '那个人非常开心']
- >>> sentence_embeddings = model.encode(sentences)
- >>> from sklearn.metrics.pairwise import paired_cosine_distances
- >>> cosine_score = 1 - paired_cosine_distances([sentence_embeddings[0]],[sentence_embeddings[1]])
- ```
+ This is the sentence embedding model pre-trained by [UER-py](https://github.com/dbiir/UER-py/), which is introduced in [this paper](https://arxiv.org/abs/1909.05658).

  ## Training data

@@ -68,7 +55,6 @@ python3 scripts/convert_sbert_from_uer_to_huggingface.py --input_model_path mode
  journal={arXiv preprint arXiv:1908.10084},
  year={2019}
  }
-
  @article{zhao2019uer,
  title={UER: An Open-Source Toolkit for Pre-training Models},
  author={Zhao, Zhe and Chen, Hui and Zhang, Jinbin and Zhao, Xin and Liu, Tao and Lu, Wei and Chen, Xi and Deng, Haotang and Ju, Qi and Du, Xiaoyong},
@@ -76,11 +62,4 @@ python3 scripts/convert_sbert_from_uer_to_huggingface.py --input_model_path mode
  pages={241},
  year={2019}
  }
-
- @article{zhao2023tencentpretrain,
- title={TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities},
- author={Zhao, Zhe and Li, Yudong and Hou, Cheng and Zhao, Jing and others},
- journal={ACL 2023},
- pages={217},
- year={2023}
  ```
 
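For reference, the usage snippet that this change removes should still run against the published checkpoint. Below is a minimal, runnable sketch of the same cosine-similarity computation, assuming `sentence-transformers` and `scikit-learn` are installed:

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import paired_cosine_distances

# Load the Chinese Sentence BERT checkpoint from the Hugging Face Hub.
model = SentenceTransformer('uer/sbert-base-chinese-nli')

# "That person is very happy" / "That person is extremely happy"
sentences = ['那个人很开心', '那个人非常开心']
embeddings = model.encode(sentences)

# Cosine similarity is 1 minus the paired cosine distance between the two embeddings.
cosine_score = 1 - paired_cosine_distances([embeddings[0]], [embeddings[1]])
print(cosine_score)
```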