---
language:
- en
datasets:
- pubmed
- chemical patent
- cooking recipe
---

## ProcBERT

ProcBERT is a pre-trained language model specifically designed for procedural text. It was pre-trained on a large-scale procedural corpus (PubMed articles, chemical patents, and cooking recipes) containing over 12B tokens, and it shows strong performance on downstream procedural-text tasks. More details can be found in the following [paper](https://arxiv.org/abs/2109.04711):

```bibtex
@inproceedings{bai-etal-2021-pre,
    title = "Pre-train or Annotate? Domain Adaptation with a Constrained Budget",
    author = "Bai, Fan and
      Ritter, Alan and
      Xu, Wei",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
}
```

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("fbaigt/procbert")
model = AutoModelForTokenClassification.from_pretrained("fbaigt/procbert")
```

More usage details can be found [here](https://github.com/bflashcp3f/ProcBERT).
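As a quick sanity check, the sketch below runs the loaded checkpoint on a single procedural sentence and prints a label index for each sub-word token. This is only an illustration: the example sentence is made up, and the token-classification head is randomly initialized until the model is fine-tuned on a labeled procedural dataset, so the predicted indices are not meaningful out of the box.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("fbaigt/procbert")
model = AutoModelForTokenClassification.from_pretrained("fbaigt/procbert")
model.eval()

# A made-up procedural sentence, used purely for illustration.
sentence = "Centrifuge the sample at 12,000 g for 10 minutes."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, num_labels)

# Map each sub-word token to its highest-scoring label index.
# Until the classification head is fine-tuned, these indices carry no meaning.
pred_ids = logits.argmax(dim=-1).squeeze(0).tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze(0))
for token, pred in zip(tokens, pred_ids):
    print(token, pred)
```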