
๐ŸŠ DPR-KO

1. Intro

This is a Korean DPR model (Question Encoder).
It was trained with new code that is entirely separate from Facebook's DPR code.
It can be used for dense-vector-based semantic search.
Questions are encoded with the Question Encoder, and passages with the Context Encoder.

2. Experiment settings

  • ๋ฒ ์ด์Šค ๋ชจ๋ธ: klue/bert-base
  • ๋ฐ์ดํ„ฐ ์…‹: KorQuad v1
  • ์œ„ํ‚ค ๋คํ”„: kowiki-latest-pages-articles.xml.bz2 (2024/07/23)
  • ์ฒญํฌ ๋‹น ๋ฌธ์žฅ: 5
  • ์ „์ฒด ์ฒญํฌ: ์•ฝ 160 ๋งŒ
  • BM25 ๊ฐ€์ค‘์น˜: 0.3
  • 1 A100 GPU
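
Two of these settings, the chunk size and the BM25 weight, can be illustrated with a short sketch. Both functions are hypothetical: the card does not say whether chunks overlap (non-overlapping chunks are assumed here), nor how the BM25 score is combined with the dense score (a weighted sum is assumed here). The actual procedure is in the author's GitHub repository.

```python
from typing import List


def chunk_sentences(sentences: List[str], sentences_per_chunk: int = 5) -> List[str]:
    """Group consecutive sentences into non-overlapping chunks (assumed scheme)."""
    if sentences_per_chunk < 1:
        raise ValueError("sentences_per_chunk must be at least 1")
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def hybrid_score(dense_score: float, bm25_score: float, bm25_weight: float = 0.3) -> float:
    """Assumed combination: dense score plus the weighted BM25 score."""
    return dense_score + bm25_weight * bm25_score


# Twelve placeholder sentences yield chunks of 5, 5, and 2 sentences.
sentences = [f"Sentence {n}." for n in range(1, 13)]
chunks = chunk_sentences(sentences)
print(len(chunks))

# Combine a hypothetical dense score with a hypothetical BM25 score.
combined = hybrid_score(dense_score=10.0, bm25_score=20.0)
print(combined)
```

If the two scores live on very different scales, they would normally be normalized before being combined; that step is omitted here because the card does not describe it.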

3. Performance

| (%) | BM25 (w/o DPR-KO) | DPR-KO (w/o BM25) | DPR-KO (with BM25) |
|---|---|---|---|
| Top1 Acc | 36.25 | 48.98 | 71.16 |
| Top5 Acc | 51.61 | 71.16 | 86.75 |
| Top10 Acc | 57.34 | 77.05 | 90.28 |
| Top20 Acc | 62.40 | 82.09 | 92.66 |
| Top50 Acc | 68.46 | 87.03 | 94.86 |
| Top100 Acc | 72.48 | 90.23 | 96.02 |

※ The BM25 model was fit on the full text of Korean Wikipedia.
※ For details on the training and evaluation procedure, please refer to the GitHub repository.

Citing

@article{lim2019korquad1,
  title={{KorQuAD1.0}: Korean {QA} Dataset for Machine Reading Comprehension},
  author={Lim, Seungyoung and Kim, Myungji and Lee, Jooyoul},
  journal={arXiv preprint arXiv:1909.07005},
  year={2019}
}
@article{karpukhin2020dense,
  title={Dense Passage Retrieval for Open-Domain Question Answering},
  author={Vladimir Karpukhin and Barlas Oğuz and Sewon Min and Patrick Lewis and Ledell Wu and Sergey Edunov and Danqi Chen and Wen-tau Yih},
  journal={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2020}
}
@misc{park2021klue,
      title={KLUE: Korean Language Understanding Evaluation},
      author={Sungjoon Park and Jihyung Moon and Sungdong Kim and Won Ik Cho and Jiyoon Han and Jangwon Park and Chisung Song and Junseong Kim and Yongsook Song and Taehwan Oh and Joohong Lee and Juhyun Oh and Sungwon Lyu and Younghoon Jeong and Inkwon Lee and Sangwoo Seo and Dongjun Lee and Hyunwoo Kim and Myeonghwa Lee and Seongbo Jang and Seungwon Do and Sunkyoung Kim and Kyungtae Lim and Jongwon Lee and Kyumin Park and Jamin Shin and Seonghyun Kim and Lucy Park and Alice Oh and Jungwoo Ha and Kyunghyun Cho},
      year={2021},
      eprint={2105.09680},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Model size: 111M params (Safetensors, tensor type F32)