1. Overview
A Korean text-embedding model for the BC Card domain, built by LoRA fine-tuning Qwen/Qwen3-Embedding-0.6B on BC Card in-domain data (personal / merchant / corporate / VIP). It is intended as the retriever (bi-encoder) stage of a BC Card RAG pipeline.
On a held-out in-domain test set it improves NDCG@10 by 8.2% and Accuracy@1 by 11.3% relative to the base model.
This repository ships the LoRA adapter. Loading it pulls the base model (Qwen/Qwen3-Embedding-0.6B) and applies the adapter on top. For a base-free, self-contained artifact (e.g. for vLLM / TEI serving), use a merged build instead.
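The difference between applying the adapter at load time and using a merged build can be illustrated with a toy LoRA layer. This is a minimal sketch, assuming the standard LoRA formulation (output = W x + (alpha / r) · B A x); the matrices, sizes, and helper names are made up for illustration and are not taken from this model.

```python
# Toy illustration of "adapter on top of base" vs. "merged" weights.
# Assumes the standard LoRA update W' = W + (alpha / r) * B @ A.
# All numbers and helper names here are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

r, alpha = 2, 4                      # toy rank and scaling factor
scale = alpha / r

W = [[1.0, 0.0, 2.0],                # frozen base weight, shape (2, 3)
     [0.0, 1.0, 0.0]]
A = [[0.5, 0.0, 0.0],                # LoRA down-projection, shape (r, 3)
     [0.0, 0.5, 0.0]]
B = [[1.0, 0.0],                     # LoRA up-projection, shape (2, r)
     [0.0, 1.0]]
x = [1.0, 2.0, 3.0]

# Adapter applied at run time: base path plus scaled low-rank path.
base_out = matvec(W, x)
lora_out = matvec(B, matvec(A, x))
adapter_output = [b + scale * l for b, l in zip(base_out, lora_out)]

# Merged build: fold the low-rank update into the weight once.
BA = matmul(B, A)
W_merged = [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]
merged_output = matvec(W_merged, x)

print(adapter_output, merged_output)
```

Both paths produce the same output, which is why a merged build can be served without the adapter machinery while the adapter repository stays small.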
1.1. TL;DR
- Base model: Qwen/Qwen3-Embedding-0.6B (28 layers, hidden size 1024, last-token pooling, instruction-aware)
- Domain / Language: Finance (BC Card: personal / merchant / corporate / VIP) / Korean
- Task: Query-document retrieval (QA search, document similarity), RAG retriever
- Method: PEFT (LoRA) + Multiple Negatives Ranking (contrastive)
- Embedding dimension: 1024 · Max sequence length: 1024 · Similarity: cosine (outputs are L2-normalized)
- Intended use
  - In-house BC Card-domain RAG retriever (Top-K candidate retrieval)
  - QA search, document-similarity scoring
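The pooling and similarity entries in the list above can be made concrete with a small sketch. It assumes right-padded inputs and uses toy 2-dimensional vectors; the helper names are hypothetical and this is not the model's actual implementation.

```python
import math

def last_token_pool(token_states, attention_mask):
    """Return the hidden state of the last non-padding token.

    token_states: list of per-token vectors.
    attention_mask: 1 for real tokens, 0 for padding (right padding assumed).
    """
    last_index = sum(attention_mask) - 1
    return token_states[last_index]

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy 2-dimensional "hidden states" (the real model uses 1024 dimensions).
states = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]   # third position is padding
mask = [1, 1, 0]

emb = l2_normalize(last_token_pool(states, mask))
other = l2_normalize([4.0, 3.0])

print(emb)
print(dot(emb, other))  # equals cosine similarity because both vectors have unit length
```

Because the outputs are L2-normalized, a plain dot product and cosine similarity give the same ranking, which is convenient for vector indexes that only support inner product.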
1.2. Usage
Install sentence-transformers and peft (required to apply the LoRA adapter); loading also downloads the base model Qwen/Qwen3-Embedding-0.6B on first use.
pip install -U sentence-transformers peft
Queries use an instruction prompt; documents use none (matching how the model was trained).
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("BCCard/MoAI-Embedding-0.6B")
queries = ["BCμΉ΄λ μ°νλΉλ μ΄λ»κ² λλμ?"]
documents = [
"BCμΉ΄λ μ°νλΉλ μΉ΄λ μ’
λ₯μ νν ꡬμ±μ λ°λΌ λ€λ₯΄κ² μ±
μ λ©λλ€ ...",
"μΉ΄λ λΆμ€ μ κ³ λ κ³ κ°μΌν° λλ μ±μμ μ¦μ κ°λ₯ν©λλ€ ...",
]
# `prompt_name` selects the prompt stored in the model config
q_emb = model.encode(queries, prompt_name="query") # query instruction auto-applied
d_emb = model.encode(documents, prompt_name="document") # document side (no instruction)
scores = model.similarity(q_emb, d_emb) # cosine; rank documents by score
print(scores)
- Query prompt (instruction): Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:
- Document prompt: none
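To show what the two prompts amount to, here is a minimal sketch that builds the text actually fed to the encoder. It assumes the prompt is simply prepended to the input string; `build_input` and the sample texts are hypothetical and only illustrate the asymmetry between queries and documents.

```python
# Prompt strings as listed above. The helper below is illustrative only;
# it assumes the prompt is prepended to the raw text before tokenization.
QUERY_PROMPT = (
    "Instruct: Given a web search query, retrieve relevant passages that answer the query\n"
    "Query:"
)
DOCUMENT_PROMPT = ""  # documents are encoded without an instruction

def build_input(text, is_query):
    """Prepend the query instruction to queries; leave documents unchanged."""
    prompt = QUERY_PROMPT if is_query else DOCUMENT_PROMPT
    return prompt + text

print(build_input("What is the BC Card annual fee?", is_query=True))
print(build_input("Annual fees depend on the card type ...", is_query=False))
```

Encoding documents with the query instruction (or queries without it) departs from the training setup, so indexing and query code should keep the two paths separate.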
1.3. Training Data
| Dataset | Role | Size |
|---|---|---|
| BCAI-Finance-Kor-Embedding-Triplet | Training (anchor / positive / negative) | 43,394 triplets (train) |
| BCAI-Finance-Kor-Embedding-Pair | Corpus pool / evaluation | 36,281 unique chunks |
- Sources: BC Card financial QA (BCAI) + website crawl + synthetic data (chunking + multi-query generation)
- Triplets are constructed via hard-negative mining over the unified corpus.
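Hard-negative mining can be sketched as follows. This is one common variant (take the highest-scoring chunk that is not the labeled positive) and is an assumption, not a description of the exact procedure used for this dataset; the chunk ids and vectors are invented.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mine_hard_negative(query_vec, positive_id, corpus):
    """Pick the highest-scoring corpus chunk that is not the labeled positive.

    corpus: dict mapping chunk id -> embedding (assumed L2-normalized).
    """
    candidates = [
        (dot(query_vec, vec), chunk_id)
        for chunk_id, vec in corpus.items()
        if chunk_id != positive_id
    ]
    _score, chunk_id = max(candidates)
    return chunk_id

# Hypothetical corpus of three chunks with toy 2-dimensional embeddings.
corpus = {
    "fee_policy": [1.0, 0.0],
    "fee_waiver": [0.8, 0.6],    # similar topic, but not the labeled answer
    "lost_card": [0.0, 1.0],
}
query = [1.0, 0.0]

triplet = ("query", "fee_policy", mine_hard_negative(query, "fee_policy", corpus))
print(triplet)
```

A negative that is close to the query but still wrong gives the contrastive loss a more informative signal than a randomly sampled chunk.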
1.4. Training Procedure
| Item | Value |
|---|---|
| Method | LoRA (PEFT) |
| LoRA | r=64, alpha=128, dropout=0.05, targets = q,k,v,o,gate,up,down_proj |
| Loss | CachedMultipleNegativesRankingLoss (in-batch negatives) |
| Batch | per-device 256 (DDP) → 511 in-batch negatives per rank |
| LR / scheduler | 1e-4 / cosine, warmup_ratio 0.1, weight_decay 0.01 |
| Epochs | 3, early stopping; best checkpoint selected by validation NDCG@10 |
| Precision | bf16, gradient checkpointing |
| Hardware | 6× NVIDIA L40S (DDP) |
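The loss and the "511 in-batch negatives" figure in the table can be illustrated with a plain-Python sketch of a multiple-negatives ranking loss over triplets. It assumes that each anchor is scored against every positive and every hard negative in its batch, and it ignores the gradient caching that the Cached variant adds; `scale` and the toy vectors are hypothetical.

```python
import math

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i should rank positive i first.

    Candidates for every anchor are all positives plus all hard negatives
    in the batch. Embeddings are assumed L2-normalized; `scale` is a
    hypothetical multiplier applied to the similarities.
    """
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * sum(a * c for a, c in zip(anchor, cand)) for cand in candidates]
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]          # -log softmax at the true positive
    return total / len(anchors)

def negatives_per_anchor(batch_size):
    """Other anchors' positives plus every hard negative in the batch."""
    return (batch_size - 1) + batch_size

anchors   = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[0.6, 0.8], [0.8, 0.6]]

print(mnr_loss(anchors, positives, negatives))
print(negatives_per_anchor(256))   # 511
```

Under this reading, a per-device batch of 256 triplets yields 255 other positives plus 256 hard negatives for each anchor, which matches the 511 stated in the table.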
2. Evaluation
2.1. Training
Trained for up to 3 epochs with early stopping and a cosine schedule. Training loss decreases steadily, while validation NDCG@10 climbs early and then plateaus; the best checkpoint is taken at the peak. Curves (loss / learning rate / validation NDCG@10) are logged to Weights & Biases.
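For readers unfamiliar with the learning-rate curve, the following is a minimal sketch of a cosine schedule with linear warmup, using the base rate 1e-4 and warmup ratio 0.1 from the training table. The total step count is hypothetical, and the exact scheduler implementation used in training may differ in details.

```python
import math

def lr_at(step, total_steps, base_lr=1e-4, warmup_ratio=0.1):
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000   # hypothetical number of optimizer steps
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, total))
```

The rate rises linearly over the first 10% of steps, peaks at the base rate, and then decays smoothly, which is consistent with validation NDCG@10 improving quickly early on and flattening later.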
2.2. In-domain Retrieval Benchmark
(1) Setup
- Queries: 1,000 (held-out test split) · Corpus: 36,281 unique chunks
- Protocol: binary-relevance information retrieval; the same evaluator used during training
- Metrics: NDCG@10 (primary), MRR@10, Recall@{1,10}, Accuracy@1, MAP@10
- Models compared: base (Qwen3-Embedding-0.6B, no fine-tuning) vs. v1 (r32 / lr2e-4 / 4ep) vs. v2 (r64 / lr1e-4 / 3ep, released)
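The metrics listed above can be sketched for a single query under binary relevance. These are the usual textbook definitions, written as an assumption about what the evaluator computes rather than a copy of it; the chunk ids are invented.

```python
import math

def metrics_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance metrics for one query.

    ranked_ids: retrieved chunk ids, best first.
    relevant_ids: set of chunk ids labeled relevant for the query.
    """
    top = ranked_ids[:k]
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top]

    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    ideal = sum(1 / math.log2(rank + 2) for rank in range(min(len(relevant_ids), k)))
    first_hit = next((rank for rank, h in enumerate(hits) if h), None)
    # Precision measured at each rank where a relevant chunk appears.
    precisions = [sum(hits[: rank + 1]) / (rank + 1) for rank, h in enumerate(hits) if h]

    return {
        "ndcg": dcg / ideal if ideal else 0.0,
        "mrr": 1 / (first_hit + 1) if first_hit is not None else 0.0,
        "recall": sum(hits) / len(relevant_ids) if relevant_ids else 0.0,
        "accuracy@1": float(bool(hits) and hits[0] == 1),
        "map": sum(precisions) / min(len(relevant_ids), k) if relevant_ids else 0.0,
    }

result = metrics_at_k(["c7", "c2", "c9"], {"c2", "c4"}, k=10)
print(result)
```

Under these definitions Accuracy@1 only asks whether the top result is relevant, while Recall@1 divides by the number of relevant chunks, so the two diverge whenever a query has more than one relevant chunk. Corpus-level scores are the mean of the per-query values.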
(2) Test set
| Metric | base (Qwen3-0.6B) | v1 (r32/2e-4/4ep) | v2 (r64/1e-4/3ep) | v2 Δ vs base |
|---|---|---|---|---|
| NDCG@10 | 0.6186 | 0.6665 | 0.6695 | +0.051 (+8.2%) |
| MRR@10 | 0.6449 | 0.6993 | 0.7060 | +0.061 (+9.5%) |
| Recall@10 | 0.7046 | 0.7512 | 0.7508 | +0.046 (+6.6%) |
| Recall@1 | 0.4730 | 0.5221 | 0.5293 | +0.056 (+11.9%) |
| Accuracy@1 | 0.5560 | 0.6080 | 0.6190 | +0.063 (+11.3%) |
| MAP@10 | 0.5652 | 0.6131 | 0.6171 | +0.052 (+9.2%) |
v2 is the released model: it scores best on every metric except Recall@10, where it is on par with v1 (0.7508 vs. 0.7512). Fine-tuning lifts in-domain retrieval by roughly 7% to 12% relative to the base model, with the largest gains on top-rank precision (Accuracy@1, Recall@1).
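The delta column of the test-set table is an absolute difference followed by a relative change in percent. The short sketch below recomputes it from the base and v2 columns; the helper name `deltas` is only illustrative.

```python
# (base, v2) scores copied from the test-set table above.
scores = {
    "NDCG@10":    (0.6186, 0.6695),
    "MRR@10":     (0.6449, 0.7060),
    "Recall@10":  (0.7046, 0.7508),
    "Recall@1":   (0.4730, 0.5293),
    "Accuracy@1": (0.5560, 0.6190),
    "MAP@10":     (0.5652, 0.6171),
}

def deltas(base, tuned):
    """Return the absolute change and the relative change in percent."""
    absolute = tuned - base
    return absolute, 100.0 * absolute / base

for name, (base, tuned) in scores.items():
    absolute, relative = deltas(base, tuned)
    print(f"{name:11} {absolute:+.3f} ({relative:+.1f}%)")
```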
2.3. Limitations
- Domain-specific: tuned for BC Card Korean financial text; out-of-domain or non-Korean performance is not guaranteed.
- Re-ranking recommended: as a 0.6B bi-encoder, it favors recall/throughput over fine-grained precision.
  - Recommended pipeline: Bi-Encoder (this model) Top-K → Cross-Encoder re-ranking.
- Sequence length: inputs are truncated at 1,024 tokens; content past that limit is not encoded, so very long documents should be chunked before indexing.
- Exact-value matching: fine-grained numeric/tabular facts (fees, rates, dates, terms) are not reliably distinguished by dense similarity alone; pair with lexical (BM25) retrieval or a re-ranker when exactness matters.
- Retrieval only: this is an embedding model, not a generator; it ranks passages and does not produce answers.
- Synthetic data influence: part of the training set is LLM-synthesized (chunking + multi-query), which may carry the generator's stylistic/coverage biases.
- PII: personal/card information was masked during preprocessing, but the model performs no PII protection at inference; apply your own masking/filtering on inputs and outputs.
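For the sequence-length limitation, one simple way to chunk long documents before indexing is a sliding window over tokens. This is a sketch under stated assumptions: the overlap value is a hypothetical default, and whitespace-style placeholder tokens stand in for the model tokenizer's output.

```python
def chunk_tokens(tokens, max_len=1024, overlap=128):
    """Split a token list into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that a passage straddling
    a boundary is less likely to be cut in half. `overlap` is a hypothetical
    default, not a value recommended by the model authors.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break   # the last window already reaches the end of the document
    return chunks

# Placeholder tokens stand in for real tokenizer output.
tokens = [f"t{i}" for i in range(2500)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

In practice the window should be measured with the model's own tokenizer, leaving a little headroom for special tokens, so that no chunk is silently truncated at 1,024 tokens.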
3. Future Work
- Data quality improvement & re-training
  - Human-annotation labeling
  - More rigorous hard-negative mining (iterative, mined with this model)
  - Broader/higher-quality data (incl. general financial corpora)
- System-level
  - Cross-Encoder re-ranker for precision
  - HyDE / dynamic instruction injection at query time
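The planned system-level re-ranker would sit behind the bi-encoder in a two-stage pipeline. The sketch below shows the control flow only: `toy_rerank_score` is a hypothetical stand-in for a cross-encoder, and the texts and embeddings are invented.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_then_rerank(query_vec, query_text, corpus, rerank_score, top_k=2):
    """Stage 1: rank all chunks by embedding similarity and keep top_k.
    Stage 2: re-order only those candidates with a more expensive scorer.

    corpus: list of (text, embedding) pairs.
    rerank_score: callable(query_text, doc_text) -> float.
    """
    by_similarity = sorted(corpus, key=lambda item: dot(query_vec, item[1]), reverse=True)
    candidates = [text for text, _ in by_similarity[:top_k]]
    return sorted(candidates, key=lambda text: rerank_score(query_text, text), reverse=True)

def toy_rerank_score(query_text, doc_text):
    """Hypothetical stand-in for a cross-encoder: counts shared words."""
    return len(set(query_text.lower().split()) & set(doc_text.lower().split()))

corpus = [
    ("card benefits overview", [0.9, 0.1]),
    ("annual fee by card type", [0.8, 0.2]),
    ("report a lost card", [0.1, 0.9]),
]
ranked = retrieve_then_rerank([1.0, 0.0], "annual fee", corpus, toy_rerank_score)
print(ranked)
```

The expensive scorer only sees the Top-K candidates, so its cost is independent of corpus size while it can still correct ordering mistakes made by the embedding stage.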
4. Meta Info
4.1. Citation
@misc{bccard2026moaiembedding,
title = {MoAI-Embedding-0.6B: A BC Card-Domain Korean Text Embedding Model},
author = {BC Card},
year = {2026},
howpublished = {\url{https://huggingface.co/BCCard/MoAI-Embedding-0.6B}},
note = {LoRA fine-tune of Qwen3-Embedding-0.6B for BC Card-domain Korean retrieval}
}
- Corpus dataset: BCCard/BCAI-Finance-Kor-Embedding-Pair