---
pipeline_tag: sentence-similarity
license: apache-2.0
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
---

# kornwtp/ConGen-WangchanBERT-Small

This is a [ConGen](https://github.com/KornWtp/ConGen) model. It maps sentences to a 128-dimensional dense vector space and can be used for tasks such as semantic search.

## Usage

Using this model is easy once you have [ConGen](https://github.com/KornWtp/ConGen) installed:

```
pip install -U git+https://github.com/KornWtp/ConGen.git
```

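Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

# Two Thai example sentences, roughly:
# "A group of men play football on the beach" and
# "A group of boys are playing football on the beach".
sentences = ["กลุ่มผู้ชายเล่นฟุตบอลบนชายหาด", "กลุ่มเด็กชายกำลังเล่นฟุตบอลบนชายหาด"]

# Load the model by name, then encode each sentence into a dense vector.
model = SentenceTransformer('kornwtp/ConGen-WangchanBERT-Small')
embeddings = model.encode(sentences)
print(embeddings)
```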
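
For semantic search, the embeddings are typically compared with cosine similarity. The following is a minimal sketch of that idea, not part of ConGen itself: the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the toy 3-dimensional vectors stand in for the real 128-dimensional embeddings so the example runs without downloading the model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], corpus: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption: placeholders for real sentence embeddings).
query = [1.0, 0.0, 0.0]
corpus = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
ranking = rank_by_similarity(query, corpus)
print(ranking)
```

With the real model, the same helpers should accept the rows returned by `model.encode`, for example `rank_by_similarity(embeddings[0], embeddings[1:])`, since they only iterate over the vector elements.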

## Evaluation Results

For an automated evaluation of this model, see the *Thai Sentence Embeddings Benchmark*: [Semantic Textual Similarity](https://github.com/KornWtp/ConGen#thai-semantic-textual-similarity-benchmark)

## Citing & Authors

```bibtex
@inproceedings{limkonchotiwat-etal-2022-congen,
    title = "{ConGen}: Unsupervised Control and Generalization Distillation For Sentence Representation",
    author = "Limkonchotiwat, Peerat and
      Ponwitayarat, Wuttikorn and
      Lowphansirikul, Lalita and
      Udomcharoenchaikit, Can and
      Chuangsuwanich, Ekapol and
      Nutanong, Sarana",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    year = "2022",
    publisher = "Association for Computational Linguistics",
}
```