🍊 DPR-KO

1. Intro

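This is the question encoder of a Korean DPR (Dense Passage Retrieval) model.
It was trained with newly written code that is entirely separate from Facebook's DPR code.
It can be used for semantic search based on dense vectors.
Encode questions with the Question Encoder and passages with the Context Encoder.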
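The two-encoder setup described above can be sketched as follows. This is a minimal illustration, not the author's reference usage: it assumes both checkpoints load through `transformers.AutoModel`, that the `[CLS]` hidden state serves as the embedding, and that relevance is scored by dot product. The helper names `encode` and `rank_by_dot_product` are hypothetical; check the GitHub repository for the pooling and scoring actually used.

```python
from typing import List

QUESTION_MODEL = "snumin44/biencoder-ko-bert-question"
CONTEXT_MODEL = "snumin44/biencoder-ko-bert-context"


def encode(texts: List[str], model_id: str) -> List[List[float]]:
    """Encode texts into dense vectors with a Hugging Face BERT encoder.

    Assumption: the [CLS] hidden state is used as the embedding.
    """
    # Imported lazily so the ranking helper below works without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)
    # Take the first token ([CLS]) of every sequence as its vector.
    return output.last_hidden_state[:, 0].tolist()


def rank_by_dot_product(question_vec: List[float],
                        context_vecs: List[List[float]]) -> List[int]:
    """Return context indices sorted by descending dot-product score."""
    scores = [sum(q * c for q, c in zip(question_vec, vec)) for vec in context_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Usage (downloads both checkpoints, so it is left commented out):
# question_vec = encode(["<your question>"], QUESTION_MODEL)[0]
# context_vecs = encode(["<passage 1>", "<passage 2>"], CONTEXT_MODEL)
# print(rank_by_dot_product(question_vec, context_vecs))
```

Questions and passages go through different checkpoints, so passage vectors can be computed once and stored, and only the question is encoded at query time.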

Github: https://github.com/millet04/DPR-KO
Original Code: https://github.com/facebookresearch/DPR/tree/main
Context Encoder: https://huggingface.co/snumin44/biencoder-ko-bert-context

2. Experiment settings

Base model: klue/bert-base
Dataset: KorQuAD v1.0
Wiki dump: kowiki-latest-pages-articles.xml.bz2 (2024/07/23)
Sentences per chunk: 5
Total chunks: about 1.6 million
BM25 weight: 0.3
GPU: 1 × A100
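To make the "Sentences per chunk: 5" setting concrete, here is a rough sketch of fixed-size sentence chunking. The regex splitter and the non-overlapping grouping are assumptions for illustration only; the repository may use a dedicated Korean sentence splitter and different boundary handling. `split_sentences` and `chunk_sentences` are hypothetical names.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(sentences: List[str], per_chunk: int = 5) -> List[str]:
    """Join consecutive sentences into non-overlapping chunks."""
    return [
        " ".join(sentences[i:i + per_chunk])
        for i in range(0, len(sentences), per_chunk)
    ]
```

With this grouping, the last chunk of an article may hold fewer than five sentences.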

3. Performance

(%)	BM25 (w/o DPR-KO)	DPR-KO (w/o BM25)	DPR-KO (with BM25)
Top1 Acc	36.25	48.98	71.16
Top5 Acc	51.61	71.16	86.75
Top10 Acc	57.34	77.05	90.28
Top20 Acc	62.40	82.09	92.66
Top50 Acc	68.46	87.03	94.86
Top100 Acc	72.48	90.23	96.02

※ The BM25 model was fit on the full text of Korean Wikipedia.
※ For details on training and evaluation, please refer to the GitHub repository.
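As a rough guide to reading the table, the sketch below shows one way a BM25 weight such as 0.3 could be combined with dense scores, and how a Top-k accuracy figure can be computed. Several things here are assumptions rather than the repository's method: the min-max normalization, the convex combination `(1 - w) * dense + w * bm25`, and judging a hit by gold chunk ID instead of answer-string match. All function names are hypothetical.

```python
from typing import Dict, List, Set


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(dense: Dict[str, float],
                bm25: Dict[str, float],
                bm25_weight: float = 0.3) -> List[str]:
    """Rank chunk IDs by a weighted sum of normalized dense and BM25 scores.

    Assumption: convex combination; a chunk missing from one retriever scores 0 there.
    """
    dense_n, bm25_n = min_max(dense), min_max(bm25)
    docs = set(dense_n) | set(bm25_n)
    combined = {
        doc: (1 - bm25_weight) * dense_n.get(doc, 0.0)
        + bm25_weight * bm25_n.get(doc, 0.0)
        for doc in docs
    }
    # Sort by descending score, breaking ties by chunk ID for determinism.
    return sorted(combined, key=lambda d: (-combined[d], d))


def top_k_accuracy(rankings: List[List[str]],
                   gold: List[Set[str]],
                   k: int) -> float:
    """Percentage of questions with at least one gold chunk in the top k."""
    hits = sum(
        1 for ranked, answers in zip(rankings, gold)
        if answers & set(ranked[:k])
    )
    return 100.0 * hits / len(rankings)
```

Normalizing before mixing matters because raw BM25 scores and dot products live on different scales; without it, the weight would not have a stable meaning.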

Citing

@article{lim2019korquad1,
  title={KorQuAD1.0: Korean QA Dataset for Machine Reading Comprehension},
  author={Lim, Seungyoung and Kim, Myungji and Lee, Jooyoul},
  journal={arXiv preprint arXiv:1909.07005},
  year={2019}
}
@article{karpukhin2020dense,
  title={Dense Passage Retrieval for Open-Domain Question Answering},
  author={Vladimir Karpukhin and Barlas Oğuz and Sewon Min and Patrick Lewis and Ledell Wu and Sergey Edunov and Danqi Chen and Wen-tau Yih},
  journal={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2020}
}
@misc{park2021klue,
      title={KLUE: Korean Language Understanding Evaluation},
      author={Sungjoon Park and Jihyung Moon and Sungdong Kim and Won Ik Cho and Jiyoon Han and Jangwon Park and Chisung Song and Junseong Kim and Yongsook Song and Taehwan Oh and Joohong Lee and Juhyun Oh and Sungwon Lyu and Younghoon Jeong and Inkwon Lee and Sangwoo Seo and Dongjun Lee and Hyunwoo Kim and Myeonghwa Lee and Seongbo Jang and Seungwon Do and Sunkyoung Kim and Kyungtae Lim and Jongwon Lee and Kyumin Park and Jamin Shin and Seonghyun Kim and Lucy Park and Alice Oh and Jungwoo Ha and Kyunghyun Cho},
      year={2021},
      eprint={2105.09680},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
