Korean Embedding Models
Collection
Public Korean embedding models with benchmark results and model cards • 2 items
How to use hyunseop/bge-m3-ko with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("hyunseop/bge-m3-ko")
sentences = [
"That is a happy person",
"That is a happy dog",
"That is a very happy person",
"Today is a sunny day"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

H100 fine-tuned Korean embedding model based on dragonkue/BGE-m3-ko.
Of the two uploaded models, this one is the better fit for the local AutoRAG corpus. Compared with qwen3-embedding-h100, it is the stronger model to highlight for the domain-specific benchmark.
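For retrieval over a corpus, the similarity scores from the usage snippet can be turned into a ranked list of documents. The following is a minimal sketch, not the pipeline used for the benchmarks here: `top_k` and `search` are hypothetical helper names, and whether this model expects any query prefix is not stated on this page, so none is applied.

```python
from typing import List, Sequence, Tuple


def top_k(scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


def search(query: str, documents: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Embed one query and a list of documents, then rank documents by similarity."""
    # Imported lazily so top_k can be used without the library installed.
    from sentence_transformers import SentenceTransformer

    # Assumption: the model is loaded per call for brevity; reuse it in real code.
    model = SentenceTransformer("hyunseop/bge-m3-ko")
    query_emb = model.encode([query])          # shape: (1, dim)
    doc_embs = model.encode(list(documents))   # shape: (n_docs, dim)
    # similarity() returns a (1, n_docs) tensor; take the single query row.
    scores = model.similarity(query_emb, doc_embs)[0].tolist()
    return [(documents[i], score) for i, score in top_k(scores, k)]
```

Calling `search("...", corpus, k=5)` downloads the model on first use; `top_k` alone works on any list of scores.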
dragonkue/BGE-m3-ko
111677_20260506_114341_both_2gpu
MIRACLRetrieval
9c09abc13478308c27598f350e31d8f06b9b5481

| cutoff | Precision | Recall | F1 | mAP | mRR | NDCG |
|---|---|---|---|---|---|---|
| @1 | 0.39906 | 0.25093 | 0.30812 | 0.25093 | 0.399061 | 0.39906 |
| @3 | 0.26604 | 0.43788 | 0.33099 | 0.34631 | 0.50313 | 0.43568 |
| @5 | 0.20282 | 0.52623 | 0.29279 | 0.37510 | 0.523787 | 0.45984 |
| @10 | 0.13380 | 0.64972 | 0.22190 | 0.40375 | 0.537201 | 0.50233 |
| @20 | 0.08052 | 0.72989 | 0.14504 | 0.41724 | 0.540598 | 0.53096 |
| @100 | 0.02272 | 0.90287 | 0.04432 | 0.43030 | 0.543259 | 0.57932 |
| @1000 | 0.00254 | 0.98881 | 0.00507 | 0.43149 | 0.543447 | 0.59457 |
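
The columns in the table are standard ranked-retrieval metrics evaluated at a cutoff k. The sketch below shows one common way to compute them for a single query with binary relevance; the benchmark tools behind these numbers may use different definitions in detail. The function names are illustrative, and the choice to normalize average precision by the total number of relevant documents is an assumption (it is consistent with mAP equaling Recall in the @1 row).

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the k cutoff slots occupied by relevant documents."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant documents that appear in the top k."""
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1 / rank of the first relevant document within the top k, else 0."""
    for rank, d in enumerate(ranked[:k], start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Sum of precision at each relevant hit, divided by the number of relevant docs."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked[:k], start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance DCG over the top k, normalized by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, d in enumerate(ranked[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0
```

The reported values are averages over all queries. The F1 column appears to be computed from the averaged precision and recall: `f1(0.39906, 0.25093)` rounds to 0.30812, which matches the @1 row.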

output/111677_20260506_114341_both_2gpu/bge
benchmark_results/autorag_benchmark.json
benchmark_results/miracl_benchmark.txt
benchmark_results/mteb/MIRACLRetrieval.json

rtfin-qwen3-embedding-h100

Behind dragonkue/snowflake-arctic-embed-l-v2.0-ko and rtfin-qwen3-embedding-h100 on the public Korean retrieval leaderboard, but still a strong local-domain model.