arazd/MIReAD

@@ -13,8 +13,8 @@ widget:
 - text: "Tissue-based diagnostics and research is incessantly evolving with the development of new molecular tools. It has long been realized that immunohistochemistry can add an important new level of information on top of morphology and that protein expression patterns in a cancer may yield crucial diagnostic and prognostic information. We have generated an immunohistochemistry-based map of protein expression profiles in normal tissues, cancer and cell lines."
   example_title: "Journal prediction"
 ---
-This is the finetuned model presented in **MIReAD: a simple method for learning high-quality representations from
-scientific documents (ACL 2023)**.
 We trained MIReAD on >500,000 PubMed and arXiv abstracts across over 2,000 journal classes. MIReAD was initialized with SciBERT weights and finetuned to predict journal class based on the abstract and title of the paper. MIReAD uses SciBERT's tokenizer.
@@ -52,7 +52,18 @@ logits = out.logits
 out = model.bert(**inputs)
 # IMPORTANT: use [CLS] token representation as document-level representation (hence, 0th idx)
 feature = out.last_hidden_state[:, 0, :]
 ```
 You can access our PubMed and arXiv abstracts and journal labels data here: [https://huggingface.co/datasets/brainchalov/pubmed_arxiv_abstracts_data](https://huggingface.co/datasets/brainchalov/pubmed_arxiv_abstracts_data).

 - text: "Tissue-based diagnostics and research is incessantly evolving with the development of new molecular tools. It has long been realized that immunohistochemistry can add an important new level of information on top of morphology and that protein expression patterns in a cancer may yield crucial diagnostic and prognostic information. We have generated an immunohistochemistry-based map of protein expression profiles in normal tissues, cancer and cell lines."
   example_title: "Journal prediction"
 ---
+This is the finetuned model presented in **[MIReAD: simple method for learning high-quality representations from
+scientific documents (ACL 2023)](https://arxiv.org/abs/2305.04177)**.
 We trained MIReAD on >500,000 PubMed and arXiv abstracts across over 2,000 journal classes. MIReAD was initialized with SciBERT weights and finetuned to predict journal class based on the abstract and title of the paper. MIReAD uses SciBERT's tokenizer.
 out = model.bert(**inputs)
 # IMPORTANT: use [CLS] token representation as document-level representation (hence, 0th idx)
 feature = out.last_hidden_state[:, 0, :]
 ```
 You can access our PubMed and arXiv abstracts and journal labels data here: [https://huggingface.co/datasets/brainchalov/pubmed_arxiv_abstracts_data](https://huggingface.co/datasets/brainchalov/pubmed_arxiv_abstracts_data).
+Check out our GitHub repo: [https://github.com/arazd/miread](https://github.com/arazd/miread).
+To cite this work:
+```bibtex
+@inproceedings{razdaibiedina2023miread,
+   title={MIReAD: simple method for learning high-quality representations from scientific documents},
+   author={Razdaibiedina, Anastasia and Brechalov, Alexander},
+   booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics},
+   year={2023}
+}
+```