@@ -1,89 +1,56 @@
 ---
-license: mit
 tags:
-- generated_from_trainer
-metrics:
-- precision
-- recall
-- f1
-model-index:
-- name: scientific-challenges-and-directions
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# scientific-challenges-and-directions
-This model is a fine-tuned version of [microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 1.1956
-- Precision: 0.7405
-- Recall: 0.6573
-- F1: 0.6964
-- Precision Prob: 0.8163
-- Recall Prob: 0.7080
-- F1 Prob: 0.7583
-- Precision Dir: 0.6167
-- Recall Dir: 0.5692
-- F1 Dir: 0.5920
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 8
-- eval_batch_size: 4
-- seed: 2
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 500
-- num_epochs: 20
-### Training results
-| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Precision Prob | Recall Prob | F1 Prob | Precision Dir | Recall Dir | F1 Dir |
-|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------------:|:-----------:|:-------:|:-------------:|:----------:|:------:|
-| 0.6464        | 1.0   | 144  | 0.5840          | 0.6389    | 0.1292 | 0.2150 | 0.6389         | 0.2035      | 0.3087  | 0.0           | 0.0        | 0.0    |
-| 0.555         | 2.0   | 288  | 0.4280          | 0.776     | 0.5449 | 0.6403 | 0.8072         | 0.5929      | 0.6837  | 0.7143        | 0.4615     | 0.5607 |
-| 0.2925        | 3.0   | 432  | 0.4080          | 0.7302    | 0.7753 | 0.7520 | 0.8142         | 0.8142      | 0.8142  | 0.6053        | 0.7077     | 0.6525 |
-| 0.2313        | 4.0   | 576  | 0.5124          | 0.7929    | 0.6236 | 0.6981 | 0.8367         | 0.7257      | 0.7773  | 0.6905        | 0.4462     | 0.5421 |
-| 0.1277        | 5.0   | 720  | 0.6727          | 0.7326    | 0.7079 | 0.7200 | 0.8119         | 0.7257      | 0.7664  | 0.6197        | 0.6769     | 0.6471 |
-| 0.0916        | 6.0   | 864  | 0.7179          | 0.75      | 0.7079 | 0.7283 | 0.7623         | 0.8230      | 0.7915  | 0.7174        | 0.5077     | 0.5946 |
-| 0.0454        | 7.0   | 1008 | 0.8098          | 0.7578    | 0.6854 | 0.7198 | 0.8526         | 0.7168      | 0.7788  | 0.6212        | 0.6308     | 0.6260 |
-| 0.0234        | 8.0   | 1152 | 0.9168          | 0.7616    | 0.6461 | 0.6991 | 0.8571         | 0.6903      | 0.7647  | 0.6167        | 0.5692     | 0.5920 |
-| 0.0085        | 9.0   | 1296 | 0.9727          | 0.7703    | 0.6404 | 0.6994 | 0.8298         | 0.6903      | 0.7536  | 0.6667        | 0.5538     | 0.6050 |
-| 0.0042        | 10.0  | 1440 | 1.0478          | 0.7484    | 0.6517 | 0.6967 | 0.8182         | 0.7168      | 0.7642  | 0.625         | 0.5385     | 0.5785 |
-| 0.0032        | 11.0  | 1584 | 1.0905          | 0.7484    | 0.6517 | 0.6967 | 0.8229         | 0.6991      | 0.7560  | 0.6271        | 0.5692     | 0.5968 |
-| 0.001         | 12.0  | 1728 | 1.1107          | 0.7312    | 0.6573 | 0.6923 | 0.7864         | 0.7168      | 0.7500  | 0.6316        | 0.5538     | 0.5902 |
-| 0.0009        | 13.0  | 1872 | 1.1301          | 0.7239    | 0.6629 | 0.6921 | 0.7885         | 0.7257      | 0.7558  | 0.6102        | 0.5538     | 0.5806 |
-| 0.0008        | 14.0  | 2016 | 1.1767          | 0.7108    | 0.6629 | 0.6860 | 0.7664         | 0.7257      | 0.7455  | 0.6102        | 0.5538     | 0.5806 |
-| 0.0007        | 15.0  | 2160 | 1.1690          | 0.7284    | 0.6629 | 0.6941 | 0.8163         | 0.7080      | 0.7583  | 0.5938        | 0.5846     | 0.5891 |
-| 0.0012        | 16.0  | 2304 | 1.1943          | 0.7202    | 0.6798 | 0.6994 | 0.7778         | 0.7434      | 0.7602  | 0.6167        | 0.5692     | 0.5920 |
-| 0.0007        | 17.0  | 2448 | 1.1806          | 0.7160    | 0.6798 | 0.6974 | 0.7706         | 0.7434      | 0.7568  | 0.6167        | 0.5692     | 0.5920 |
-| 0.0006        | 18.0  | 2592 | 1.1881          | 0.7273    | 0.6742 | 0.6997 | 0.7905         | 0.7345      | 0.7615  | 0.6167        | 0.5692     | 0.5920 |
-| 0.0055        | 19.0  | 2736 | 1.1952          | 0.7301    | 0.6685 | 0.6979 | 0.7961         | 0.7257      | 0.7593  | 0.6167        | 0.5692     | 0.5920 |
-| 0.0005        | 20.0  | 2880 | 1.1956          | 0.7405    | 0.6573 | 0.6964 | 0.8163         | 0.7080      | 0.7583  | 0.6167        | 0.5692     | 0.5920 |
-### Framework versions
-- Transformers 4.15.0
-- Pytorch 1.10.0+cu111
-- Datasets 1.17.0
-- Tokenizers 0.10.3

 ---
+language:
+- en
 tags:
+- text-classification
+widget:
+- text: "severe atypical cases of pneumonia emerged and quickly spread worldwide.."
+  example_title: "challenge"
+- text: "we speculate that studying IL-6 will be beneficial."
+  example_title: "direction"
+- text: "in future studies, both PRRs should be tested as the cause for multiple deaths."
+  example_title: "both"
+- text: "IbMADS1-transformed potatoes exhibited tuber morphogenesis in the fibrous roots."
+  example_title: "neither"
 ---
+# Scientific challenges and directions
+We present a novel resource to help scientists and medical professionals discover challenges and potential directions across scientific literature, focusing on a broad corpus pertaining to the COVID-19 pandemic and related historical research. At a high level, our labels are defined as follows:
+* **Challenge**: A sentence mentioning a problem, difficulty, flaw, limitation, failure, lack of clarity, or knowledge gap.
+* **Research direction**: A sentence mentioning suggestions or needs for further research, hypotheses, speculations, indications or hints that an issue is worthy of exploration.
+This repository contains a version of [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext?text=%5BMASK%5D+is+a+tumor+suppressor+gene.) fine-tuned on the proprietary dataset described in our paper: [A Search Engine for Discovery of Scientific Challenges and Directions](https://arxiv.org/abs/2108.13751). Also, check out [our search engine](https://challenges.apps.allenai.org/)!
+* Please cite our paper if you use our datasets or models in your project. See the [BibTeX](#citation).
+* Feel free to [email us](#contact-us).
+## Annotated datasets and model
+The train, validation, and test CSV files can be downloaded directly from our [repository](https://github.com/Dan-La/scientific-challenges-and-directions) or from Hugging Face Datasets.
+## Example notebook & Search Engine
+We include an example notebook that uses the model for inference. See `Inference_Notebook.ipynb` in our [repository](https://github.com/Dan-La/scientific-challenges-and-directions).
+## Citation
+If using our dataset and models, please cite:
+```
+@misc{lahav2021search,
+      title={A Search Engine for Discovery of Scientific Challenges and Directions},
+      author={Dan Lahav and Jon Saad Falcon and Bailey Kuehl and Sophie Johnson and Sravanthi Parasa and Noam Shomron and Duen Horng Chau and Diyi Yang and Eric Horvitz and Daniel S. Weld and Tom Hope},
+      year={2021},
+      eprint={2108.13751},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
+```
+## Contact us
+Please don't hesitate to reach out.
+**Email:** `lahav@mail.tau.ac.il`, `tomh@allenai.org`.