DanL committed on
Commit
cd2d839
1 Parent(s): 6698e0a

Update README.md

  1. README.md +38 -71
README.md CHANGED
@@ -1,89 +1,56 @@
  ---
- license: mit
  tags:
- - generated_from_trainer
- metrics:
- - precision
- - recall
- - f1
- model-index:
- - name: scientific-challenges-and-directions
-   results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # scientific-challenges-and-directions

- This model is a fine-tuned version of [microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext) on the None dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.1956
- - Precision: 0.7405
- - Recall: 0.6573
- - F1: 0.6964
- - Precision Prob: 0.8163
- - Recall Prob: 0.7080
- - F1 Prob: 0.7583
- - Precision Dir: 0.6167
- - Recall Dir: 0.5692
- - F1 Dir: 0.5920

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 8
- - eval_batch_size: 4
- - seed: 2
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 500
- - num_epochs: 20

- ### Training results
 
- | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Precision Prob | Recall Prob | F1 Prob | Precision Dir | Recall Dir | F1 Dir |
- |:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------------:|:-----------:|:-------:|:-------------:|:----------:|:------:|
- | 0.6464 | 1.0 | 144 | 0.5840 | 0.6389 | 0.1292 | 0.2150 | 0.6389 | 0.2035 | 0.3087 | 0.0 | 0.0 | 0.0 |
- | 0.555 | 2.0 | 288 | 0.4280 | 0.776 | 0.5449 | 0.6403 | 0.8072 | 0.5929 | 0.6837 | 0.7143 | 0.4615 | 0.5607 |
- | 0.2925 | 3.0 | 432 | 0.4080 | 0.7302 | 0.7753 | 0.7520 | 0.8142 | 0.8142 | 0.8142 | 0.6053 | 0.7077 | 0.6525 |
- | 0.2313 | 4.0 | 576 | 0.5124 | 0.7929 | 0.6236 | 0.6981 | 0.8367 | 0.7257 | 0.7773 | 0.6905 | 0.4462 | 0.5421 |
- | 0.1277 | 5.0 | 720 | 0.6727 | 0.7326 | 0.7079 | 0.7200 | 0.8119 | 0.7257 | 0.7664 | 0.6197 | 0.6769 | 0.6471 |
- | 0.0916 | 6.0 | 864 | 0.7179 | 0.75 | 0.7079 | 0.7283 | 0.7623 | 0.8230 | 0.7915 | 0.7174 | 0.5077 | 0.5946 |
- | 0.0454 | 7.0 | 1008 | 0.8098 | 0.7578 | 0.6854 | 0.7198 | 0.8526 | 0.7168 | 0.7788 | 0.6212 | 0.6308 | 0.6260 |
- | 0.0234 | 8.0 | 1152 | 0.9168 | 0.7616 | 0.6461 | 0.6991 | 0.8571 | 0.6903 | 0.7647 | 0.6167 | 0.5692 | 0.5920 |
- | 0.0085 | 9.0 | 1296 | 0.9727 | 0.7703 | 0.6404 | 0.6994 | 0.8298 | 0.6903 | 0.7536 | 0.6667 | 0.5538 | 0.6050 |
- | 0.0042 | 10.0 | 1440 | 1.0478 | 0.7484 | 0.6517 | 0.6967 | 0.8182 | 0.7168 | 0.7642 | 0.625 | 0.5385 | 0.5785 |
- | 0.0032 | 11.0 | 1584 | 1.0905 | 0.7484 | 0.6517 | 0.6967 | 0.8229 | 0.6991 | 0.7560 | 0.6271 | 0.5692 | 0.5968 |
- | 0.001 | 12.0 | 1728 | 1.1107 | 0.7312 | 0.6573 | 0.6923 | 0.7864 | 0.7168 | 0.7500 | 0.6316 | 0.5538 | 0.5902 |
- | 0.0009 | 13.0 | 1872 | 1.1301 | 0.7239 | 0.6629 | 0.6921 | 0.7885 | 0.7257 | 0.7558 | 0.6102 | 0.5538 | 0.5806 |
- | 0.0008 | 14.0 | 2016 | 1.1767 | 0.7108 | 0.6629 | 0.6860 | 0.7664 | 0.7257 | 0.7455 | 0.6102 | 0.5538 | 0.5806 |
- | 0.0007 | 15.0 | 2160 | 1.1690 | 0.7284 | 0.6629 | 0.6941 | 0.8163 | 0.7080 | 0.7583 | 0.5938 | 0.5846 | 0.5891 |
- | 0.0012 | 16.0 | 2304 | 1.1943 | 0.7202 | 0.6798 | 0.6994 | 0.7778 | 0.7434 | 0.7602 | 0.6167 | 0.5692 | 0.5920 |
- | 0.0007 | 17.0 | 2448 | 1.1806 | 0.7160 | 0.6798 | 0.6974 | 0.7706 | 0.7434 | 0.7568 | 0.6167 | 0.5692 | 0.5920 |
- | 0.0006 | 18.0 | 2592 | 1.1881 | 0.7273 | 0.6742 | 0.6997 | 0.7905 | 0.7345 | 0.7615 | 0.6167 | 0.5692 | 0.5920 |
- | 0.0055 | 19.0 | 2736 | 1.1952 | 0.7301 | 0.6685 | 0.6979 | 0.7961 | 0.7257 | 0.7593 | 0.6167 | 0.5692 | 0.5920 |
- | 0.0005 | 20.0 | 2880 | 1.1956 | 0.7405 | 0.6573 | 0.6964 | 0.8163 | 0.7080 | 0.7583 | 0.6167 | 0.5692 | 0.5920 |

-
- ### Framework versions
-
- - Transformers 4.15.0
- - Pytorch 1.10.0+cu111
- - Datasets 1.17.0
- - Tokenizers 0.10.3
  ---
+ language:
+ - en
  tags:
+ - text-classification
+ widget:
+ - text: "severe atypical cases of pneumonia emerged and quickly spread worldwide.."
+   example_title: "challenge"
+ - text: "we speculate that studying IL-6 will be beneficial."
+   example_title: "direction"
+ - text: "in future studies, both PRRs should be tested as the cause for multiple deaths."
+   example_title: "both"
+ - text: "IbMADS1-transformed potatoes exhibited tuber morphogenesis in the fibrous roots."
+   example_title: "neither"
  ---
 
+ # Scientific challenges and directions

+ We present a novel resource to help scientists and medical professionals discover challenges and potential directions across scientific literature, focusing on a broad corpus pertaining to the COVID-19 pandemic and related historical research. At a high level, our labels are defined as follows:

+ * **Challenge**: A sentence mentioning a problem, difficulty, flaw, limitation, failure, lack of clarity, or knowledge gap.
+ * **Research direction**: A sentence mentioning suggestions or needs for further research, hypotheses, speculations, indications or hints that an issue is worthy of exploration.
 
+ This repository contains a version of the [PubMedBERT](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext?text=%5BMASK%5D+is+a+tumor+suppressor+gene.) model fine-tuned on the proprietary dataset described in our paper: [A Search Engine for Discovery of Scientific Challenges and Directions](https://arxiv.org/abs/2108.13751). Also, check out [our search engine](https://challenges.apps.allenai.org/)!


+ * Please cite our paper if you use our datasets or models in your project. See the [BibTeX](#citation).
+ * Feel free to [email us](#contact-us).
 
+ ## Annotated datasets and model
+ The train, test, and validation CSVs can be downloaded directly from our [repository](https://github.com/Dan-La/scientific-challenges-and-directions) or from Hugging Face Datasets.

+ ## Example notebook & Search Engine
+ We include an example notebook that uses the model for inference. See `Inference_Notebook.ipynb` in our [repository](https://github.com/Dan-La/scientific-challenges-and-directions).
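
For readers who want a quick look before opening the notebook, here is a minimal inference sketch. It is not taken from the commit or from `Inference_Notebook.ipynb`, and it rests on several assumptions: the Hub repository id `DanL/scientific-challenges-and-directions`, a two-logit multi-label head decoded with a sigmoid and a 0.5 threshold (suggested by the "both" and "neither" widget examples), and the label order `challenge`, `direction`. Check the model's `config.id2label` and the notebook before relying on any of these.

```python
import math

# Assumed label order; verify against the model's config.id2label.
LABELS = ("challenge", "direction")


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits, threshold=0.5):
    """Return {label: score} for each label whose sigmoid score meets the threshold.

    An empty dict corresponds to the "neither" case, two entries to "both".
    """
    scores = [sigmoid(x) for x in logits]
    return {label: s for label, s in zip(LABELS, scores) if s >= threshold}


def classify(sentences, model_id="DanL/scientific-challenges-and-directions"):
    """Score sentences with the fine-tuned model (needs `torch` and `transformers`).

    The default model_id is an assumption, not confirmed by the README.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [decode_multilabel(row.tolist()) for row in logits]
```

Calling `classify(["we speculate that studying IL-6 will be beneficial."])` downloads the weights on first use. The decoding step is kept separate in `decode_multilabel` so the threshold can be tuned without reloading the model.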
 
+ ## Citation

+ If using our dataset and models, please cite:

+ ```
+ @misc{lahav2021search,
+ title={A Search Engine for Discovery of Scientific Challenges and Directions},
+ author={Dan Lahav and Jon Saad Falcon and Bailey Kuehl and Sophie Johnson and Sravanthi Parasa and Noam Shomron and Duen Horng Chau and Diyi Yang and Eric Horvitz and Daniel S. Weld and Tom Hope},
+ year={2021},
+ eprint={2108.13751},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
 
+ ## Contact us

+ Please don't hesitate to reach out.

+ **Email:** `lahav@mail.tau.ac.il`, `tomh@allenai.org`.