Commit 9b80546 by QizhiPei
1 parent: 1cb5181

Create README.md

Files changed (1): README.md (+46 -0)
README.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ license: mit
+ datasets:
+ - QizhiPei/BioT5_finetune_dataset
+ language:
+ - en
+ ---
+ ## Example Usage
+ ```python
+ from transformers import T5Tokenizer, T5ForConditionalGeneration
+
+ def add_prefix_to_amino_acids(protein_sequence):  # e.g. "MQ" -> "<p>M<p>Q"
+     amino_acids = list(protein_sequence)
+     prefixed_amino_acids = ['<p>' + aa for aa in amino_acids]
+     new_sequence = ''.join(prefixed_amino_acids)
+     return new_sequence
+
+ tokenizer = T5Tokenizer.from_pretrained("QizhiPei/biot5-base-dti-human", model_max_length=512)
+ model = T5ForConditionalGeneration.from_pretrained('QizhiPei/biot5-base-dti-human')
+
+ task_definition = 'Definition: Drug target interaction prediction task (a binary classification task) for the human dataset. If the given molecule and protein can interact with each other, indicate via "Yes". Otherwise, response via "No".\n\n'
+ selfies_input = '[C][/C][=C][Branch1][C][\\C][C][=Branch1][C][=O][O]'  # molecule as a SELFIES string
+ protein_input = 'MQALRVSQALIRSFSSTARNRFQNRVREKQKLFQEDNDIPLYLKGGIVDNILYRVTMTLCLGGTVYSLYSLGWASFPRN'
+ protein_input = add_prefix_to_amino_acids(protein_input)
+ task_input = f'Now complete the following example -\nInput: Molecule: <bom>{selfies_input}<eom>\nProtein: <bop>{protein_input}<eop>\nOutput: '
+
+ model_input = task_definition + task_input
+ input_ids = tokenizer(model_input, return_tensors="pt").input_ids
+
+ generation_config = model.generation_config
+ generation_config.max_length = 8  # the expected answer is a short "Yes" or "No"
+ generation_config.num_beams = 1  # a single beam, i.e. no beam search
+
+ outputs = model.generate(input_ids, generation_config=generation_config)
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## References
+ For more information, please refer to our paper and GitHub repository.
+
+ Paper: [BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations](https://arxiv.org/abs/2310.07276)
+
+ GitHub: [BioT5](https://github.com/QizhiPei/BioT5)
+
+ Authors: *Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, and Rui Yan*
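
The prompt assembly in the README's Example Usage can be factored into small helpers when scoring many molecule/protein pairs. The sketch below is not part of the commit: `build_dti_prompt`, `parse_dti_answer`, and `TASK_DEFINITION` are hypothetical names introduced here for illustration. The prompt wording is copied verbatim from the README (including "response via"), on the assumption that the model expects that exact text. The parser assumes the decoded output is exactly "Yes" or "No" and returns `None` for anything else.

```python
from typing import Optional

# Task definition copied verbatim from the README; the wording is left
# unchanged on the assumption that the model was fine-tuned on this text.
TASK_DEFINITION = (
    'Definition: Drug target interaction prediction task (a binary classification task) '
    'for the human dataset. If the given molecule and protein can interact with each other, '
    'indicate via "Yes". Otherwise, response via "No".\n\n'
)


def add_prefix_to_amino_acids(protein_sequence: str) -> str:
    """Prefix every residue with <p>, as in the README example."""
    return "".join("<p>" + aa for aa in protein_sequence)


def build_dti_prompt(selfies: str, protein: str) -> str:
    """Hypothetical helper: build the full model input for one pair."""
    task_input = (
        "Now complete the following example -\n"
        f"Input: Molecule: <bom>{selfies}<eom>\n"
        f"Protein: <bop>{add_prefix_to_amino_acids(protein)}<eop>\n"
        "Output: "
    )
    return TASK_DEFINITION + task_input


def parse_dti_answer(decoded: str) -> Optional[bool]:
    """Hypothetical helper: map decoded text to True/False, or None if unexpected."""
    answer = decoded.strip().lower()
    if answer == "yes":
        return True
    if answer == "no":
        return False
    return None
```

The string returned by `build_dti_prompt` would take the place of `model_input` in the README code, and `parse_dti_answer` would be applied to the result of `tokenizer.decode(...)`.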