---
license: mit
datasets:
- COGNANO/VHHCorpus-2M
library_name: transformers
tags:
- biology
- protein
- antibody
- VHH
---

## VHHBERT

VHHBERT is a RoBERTa-based model pre-trained on two million VHH sequences in [VHHCorpus-2M](https://huggingface.co/datasets/COGNANO/VHHCorpus-2M).
VHHBERT has the same model parameters as RoBERTa-base, except that it uses positional embeddings with a length of 185 to cover the maximum sequence length of 179 in VHHCorpus-2M.

Further details on VHHBERT are described in our paper "[A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models](https://arxiv.org/abs/2405.18749)."

## Usage

The model and tokenizer can be loaded using the `transformers` library.

```python
from transformers import BertTokenizer, RobertaModel

# Load the VHHBERT tokenizer and pre-trained weights from the Hugging Face Hub.
tokenizer = BertTokenizer.from_pretrained("COGNANO/VHHBERT")
model = RobertaModel.from_pretrained("COGNANO/VHHBERT")
```

## Links

- Pre-training Corpus: https://huggingface.co/datasets/COGNANO/VHHCorpus-2M
- Code: https://github.com/cognano/AVIDa-SARS-CoV-2
- Paper: https://arxiv.org/abs/2405.18749

## Citation

If you use VHHBERT in your research, please cite the following paper.

```bibtex
@inproceedings{tsuruta2024sars,
  title={A {SARS}-{C}o{V}-2 Interaction Dataset and {VHH} Sequence Corpus for Antibody Language Models},
  author={Hirofumi Tsuruta and Hiroyuki Yamazaki and Ryota Maeda and Ryotaro Tamura and Akihiro Imura},
  booktitle={Advances in Neural Information Processing Systems 37},
  year={2024}
}
```
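
## Example: embedding a sequence

As a follow-up to the loading snippet in the Usage section, the sketch below shows one possible way to turn a single VHH sequence into a fixed-size vector. It rests on assumptions that are not stated in this model card: that the tokenizer expects residues separated by spaces (as some other BERT-style protein models do), and that mean pooling over the last hidden states is an acceptable summary. The helper names `space_residues` and `embed_sequence` are hypothetical and are not part of the VHHBERT release.

```python
def space_residues(sequence: str) -> str:
    """Insert a space between residues, e.g. "QVQ" -> "Q V Q".

    ASSUMPTION: the VHHBERT vocabulary is per-residue and the tokenizer
    splits on whitespace. Check the tokenizer output before relying on this.
    """
    return " ".join(sequence.strip())


def embed_sequence(sequence: str, model_name: str = "COGNANO/VHHBERT"):
    """Return a mean-pooled embedding for one sequence (hypothetical helper)."""
    # Imported here so space_residues() works without the heavy dependencies.
    import torch
    from transformers import BertTokenizer, RobertaModel

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = RobertaModel.from_pretrained(model_name)
    model.eval()  # disable dropout for inference

    inputs = tokenizer(space_residues(sequence), return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # last_hidden_state has shape (1, num_tokens, hidden_size).
    # Mean pooling here averages over all tokens, special tokens included.
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)
```

Calling `embed_sequence(seq)` with a VHH amino-acid string downloads the weights on first use and returns a one-dimensional tensor. Because the positional embeddings were sized for the corpus maximum of 179 residues, longer inputs would need to be truncated or rejected before tokenization.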