This model repository presents "TinyPubMedBERT", a distilled version of PubMedBERT (Gu et al., 2021).
TinyPubMedBERT provides the initial weights for training [dmis-lab/KAZU-NER-module-distil-v1.0](https://huggingface.co/dmis-lab/KAZU-NER-module-distil-v1.0), which is used in the initial release of the KAZU (Korea University and AstraZeneca) framework.
The model has 4 Transformer layers and was distilled following the method introduced in the TinyBERT paper (Jiao et al., 2020).
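
Because the model is intended as initial weights for NER training, a minimal loading sketch with the Hugging Face `transformers` library may be useful. The Hub ID `dmis-lab/TinyPubMedBERT-v1.0` is an assumption inferred from the repository name, and the label set and the helper names `label_maps` and `load_for_ner` are hypothetical placeholders, not the KAZU label scheme or API.

```python
# Assumed Hub ID, inferred from the repository name; adjust if it differs.
MODEL_ID = "dmis-lab/TinyPubMedBERT-v1.0"

# Hypothetical BIO label set for illustration only (not the KAZU labels).
LABELS = ["O", "B-Entity", "I-Entity"]


def label_maps(labels):
    """Build the id<->label dictionaries that a token-classification config expects."""
    id2label = dict(enumerate(labels))
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id


def load_for_ner(model_id=MODEL_ID, labels=LABELS):
    """Load the tokenizer and the distilled encoder with a new, untrained NER head."""
    # Imported lazily so this module can be read without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    id2label, label2id = label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(
        model_id,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

Calling `load_for_ner()` downloads the checkpoint and returns a tokenizer and model. The classification head is randomly initialized, so the model needs fine-tuning on labeled NER data before its predictions are meaningful. Inspecting `model.config.num_hidden_layers` afterwards is a quick way to check the layer count against the description above.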
* For the framework, please visit https://github.com/AstraZeneca/KAZU
* For details about the model, please see our paper, **Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework** (EMNLP 2022 Industry Track).
More details to be announced soon.
### Citation info
Joint-first authorship of **Richard Jackson** (AstraZeneca) and **WonJin Yoon** (Korea University).
<br>Please cite: (Full citation info will be announced soon)
```
@inproceedings{YoonAndJackson2022BiomedicalNER,
title={Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework},
author={Wonjin Yoon and Richard Jackson and Elliot Ford and Vladimir Poroshin and Jaewoo Kang},
booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year={2022}
}
```
This model builds on resources from the PubMedBERT and TinyBERT papers.
* Gu, Yu, et al. "Domain-specific language model pretraining for biomedical natural language processing." ACM Transactions on Computing for Healthcare (HEALTH) 3.1 (2021): 1-23.
* Jiao, Xiaoqi, et al. "TinyBERT: Distilling BERT for Natural Language Understanding." Findings of the Association for Computational Linguistics: EMNLP 2020. 2020.
### Contact Information
For help or issues using the code or model (NER module of KAZU) in this repository, please contact WonJin Yoon (wonjin.info (at) gmail.com) or submit a GitHub issue.