kornwtp/ConGen-simcse-model-roberta-base-thai
This is a ConGen model. It maps sentences to a 768-dimensional dense vector space and can be used for tasks such as semantic search.
Usage
Using this model is straightforward once ConGen is installed:
pip install -U git+https://github.com/KornWtp/ConGen.git
Then you can use the model like this:
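from sentence_transformers import SentenceTransformer
# Two Thai sentences with closely related meanings
sentences = ["กลุ่มผู้ชายเล่นฟุตบอลบนชายหาด", "กลุ่มเด็กชายกำลังเล่นฟุตบอลบนชายหาด"]
# Load the model from the Hugging Face Hub
model = SentenceTransformer('kornwtp/ConGen-simcse-model-roberta-base-thai')
# Encode each sentence into one 768-dimensional vector
embeddings = model.encode(sentences)
print(embeddings)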
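For semantic search, the embeddings are usually compared by cosine similarity. The following is a minimal sketch of that step using NumPy only. The helper name cosine_similarity_matrix and the toy vectors are illustrative assumptions and are not part of ConGen; in practice, the array returned by model.encode(sentences) would take the place of toy_embeddings.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine similarity of the row vectors.

    Hypothetical helper for illustration; not part of ConGen.
    """
    embeddings = np.asarray(embeddings, dtype=np.float64)
    # L2-normalize each row; clip the norm to avoid division by zero
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / np.clip(norms, 1e-12, None)
    # Dot products of unit vectors are cosine similarities
    return normalized @ normalized.T


# Stand-in vectors (assumption) so the sketch runs without downloading the model.
# Replace with: toy_embeddings = model.encode(sentences)
toy_embeddings = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.0],
])

scores = cosine_similarity_matrix(toy_embeddings)
print(scores.round(3))
```

Entry [i, j] of scores is the similarity between sentence i and sentence j. With real embeddings, sentences that are close in meaning, such as the two in the example above, would be expected to score higher than unrelated ones.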
Evaluation Results
For an automated evaluation of this model, see the Thai Sentence Embeddings Benchmark: Semantic Textual Similarity
Citing & Authors
@inproceedings{limkonchotiwat-etal-2022-congen,
title = "{ConGen}: Unsupervised Control and Generalization Distillation For Sentence Representation",
author = "Limkonchotiwat, Peerat and
Ponwitayarat, Wuttikorn and
Lowphansirikul, Lalita and
Udomcharoenchaikit, Can and
Chuangsuwanich, Ekapol and
Nutanong, Sarana",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
year = "2022",
publisher = "Association for Computational Linguistics",
}