---
language: ko
tags:
- roberta
- sentence-transformers
datasets:
- klue
---
# KLUE RoBERTa base model for Sentence Embeddings
This is the `sentence-klue-roberta-base` model. The [sentence-transformers](https://github.com/UKPLab/sentence-transformers) library lets you train and use Transformer models for generating sentence and text embeddings.

The model is described in the paper [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084).
## Usage (Sentence-Transformers)
Using this model is easy once you have [sentence-transformers](https://github.com/UKPLab/sentence-transformers) installed:

```
pip install -U sentence-transformers
```
Then you can use the model like this:
```python
import torch
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Huffon/sentence-klue-roberta-base")

docs = [
    "1992년 7월 8일 손흥민은 강원도 춘천시 후평동에서 아버지 손웅정과 어머니 길은자의 차남으로 태어나 그곳에서 자랐다.",
    "형은 손흥윤이다.",
    "춘천 부안초등학교를 졸업했고, 춘천 후평중학교에 입학한 후 2학년때 원주 육민관중학교 축구부에 들어가기 위해 전학하여 졸업하였으며, 2008년 당시 FC 서울의 U-18팀이었던 동북고등학교 축구부에서 선수 활동 중 대한축구협회 우수선수 해외유학 프로젝트에 선발되어 2008년 8월 독일 분데스리가의 함부르크 유소년팀에 입단하였다.",
    "함부르크 유스팀 주전 공격수로 2008년 6월 네덜란드에서 열린 4개국 경기에서 4게임에 출전, 3골을 터뜨렸다.",
    "1년간의 유학 후 2009년 8월 한국으로 돌아온 후 10월에 개막한 FIFA U-17 월드컵에 출전하여 3골을 터트리며 한국을 8강으로 이끌었다.",
    "그해 11월 함부르크의 정식 유소년팀 선수 계약을 체결하였으며 독일 U-19 리그 4경기 2골을 넣고 2군 리그에 출전을 시작했다.",
    "독일 U-19 리그에서 손흥민은 11경기 6골, 2부 리그에서는 6경기 1골을 넣으며 재능을 인정받아 2010년 6월 17세의 나이로 함부르크의 1군 팀 훈련에 참가, 프리시즌 활약으로 함부르크와 정식 계약을 한 후 10월 18세에 함부르크 1군 소속으로 독일 분데스리가에 데뷔하였다.",
]
# Encode the corpus and the query into dense sentence embeddings.
document_embeddings = model.encode(docs)

query = "손흥민은 어린 나이에 유럽에 진출하였다."
query_embedding = model.encode(query)

# Rank the documents by cosine similarity to the query and keep the top-k results.
top_k = min(5, len(docs))
cos_scores = util.pytorch_cos_sim(query_embedding, document_embeddings)[0]
top_results = torch.topk(cos_scores, k=top_k)

print(f"입력 문장: {query}")
print(f"<입력 문장과 유사한 {top_k} 개의 문장>")

for i, (score, idx) in enumerate(zip(top_results[0], top_results[1])):
    print(f"{i+1}: {docs[idx]} (유사도: {score:.4f})")
```