---
tags:
- protein language model
datasets:
- IEDB
---

# TransHLA model

`TransHLA` is a tool designed to discern whether a peptide will be recognized by HLA as an epitope. It is the first tool capable of directly identifying peptides as epitopes without requiring HLA alleles as input. Because epitopes differ in length, we trained two models: TransHLA_I, used to detect HLA class I epitopes, and TransHLA_II, used to detect HLA class II epitopes.

## Model description

`TransHLA` is a hybrid transformer model that combines a transformer encoder module with a deep CNN module. It is trained on pretrained sequence embeddings from `ESM2` together with contact-map structural features. It can serve as a preliminary screen ahead of the currently popular tools that predict binding affinity for specific HLA-epitope pairs.
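
For intuition, here is a minimal sketch of how ESM2 embeddings and contact maps can be extracted with `fair-esm`. This illustrates the kind of inputs described above, not TransHLA's internal pipeline; the peptide name and sequence are placeholders:

```python
import torch
import esm

# Load the pretrained ESM2 checkpoint (the same family used by TransHLA).
esm_model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
esm_model.eval()

# A placeholder (name, sequence) pair for illustration.
data = [("example_peptide", "EDSAIVTPSR")]
_, _, tokens = batch_converter(data)

with torch.no_grad():
    results = esm_model(tokens, repr_layers=[33], return_contacts=True)

embeddings = results["representations"][33]  # per-residue sequence embeddings
contact_map = results["contacts"]            # predicted residue contact map
```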

## Intended uses

Because peptide lengths vary, TransHLA is divided into TransHLA_I and TransHLA_II, which identify epitopes presented by HLA class I and class II molecules, respectively. TransHLA_I is designed for shorter peptides of 8 to 14 amino acids, while TransHLA_II targets longer peptides of 13 to 21 amino acids.

The output consists of two parts. The first indicates whether the peptide is an epitope: a two-column tensor in which each row holds two probabilities that sum to 1. If the value in the second column is greater than or equal to 0.5, the peptide is classified as an epitope; otherwise it is considered a normal peptide (see the sketch below). The second output is the sequence embedding generated by the model.

For both models, we provide separate tutorials in this file to facilitate ease of use.
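As a minimal sketch of reading the first output (assuming `outputs` is the `[batch, 2]` probability tensor described above; the variable names are illustrative):

```python
# Column 1 is the epitope probability; each row sums to 1.
epitope_probability = outputs[:, 1]
is_epitope = epitope_probability >= 0.5  # True -> epitope, False -> normal peptide
print(is_epitope)
```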

### How to use

First, install the required packages: `pytorch`, `transformers`, and `fair-esm`:
```
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install transformers
pip install fair-esm
```
Note that the `--index-url` above installs CUDA 11.8 builds of PyTorch; CPU-only users can install `torch` from the default index instead.
Here is how to use the TransHLA_I model to predict whether a peptide is an epitope:

```python
from transformers import AutoTokenizer
from transformers import AutoModel
import torch


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using {device} device")


def pad_inner_lists_to_length(outer_list, target_length=16):
    # Pad each tokenized peptide with the ESM2 padding token id (1)
    # so that every sequence reaches the model's fixed input length.
    for inner_list in outer_list:
        padding_length = target_length - len(inner_list)
        if padding_length > 0:
            inner_list.extend([1] * padding_length)
    return outer_list


tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = AutoModel.from_pretrained("SkywalkerLu/TransHLA_I", trust_remote_code=True)
model.to(device)
model.eval()  # disable dropout for inference

peptide_examples = ['EDSAIVTPSR', 'SVWEPAKAKYVFR']
peptide_encoding = tokenizer(peptide_examples)['input_ids']
peptide_encoding = pad_inner_lists_to_length(peptide_encoding)
print(peptide_encoding)

peptide_encoding = torch.tensor(peptide_encoding)

outputs, representations = model(peptide_encoding.to(device))
print(outputs)
print(representations)
```
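Since TransHLA_I only supports peptides of 8 to 14 residues, it can help to filter inputs before tokenization. A small, purely illustrative helper:

```python
def filter_by_length(peptides, min_len=8, max_len=14):
    # Keep only peptides within TransHLA_I's supported length range;
    # use min_len=13, max_len=21 for TransHLA_II.
    return [p for p in peptides if min_len <= len(p) <= max_len]

print(filter_by_length(['EDSAIVTPSR', 'TOOLONGPEPTIDEEXAMPLEXYZ']))  # -> ['EDSAIVTPSR']
```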
And here is how to use the TransHLA_II model:

```python
from transformers import AutoTokenizer
from transformers import AutoModel
import torch


def pad_inner_lists_to_length(outer_list, target_length=23):
    # Pad each tokenized peptide with the ESM2 padding token id (1)
    # up to the longer fixed input length used for HLA class II peptides.
    for inner_list in outer_list:
        padding_length = target_length - len(inner_list)
        if padding_length > 0:
            inner_list.extend([1] * padding_length)
    return outer_list


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using {device} device")
    tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
    model = AutoModel.from_pretrained("SkywalkerLu/TransHLA_II", trust_remote_code=True)
    model.to(device)
    model.eval()  # disable dropout for inference
    peptide_examples = ['KMIYSYSSHAASSL', 'ARGDFFRATSRLTTDFG']
    peptide_encoding = tokenizer(peptide_examples)['input_ids']
    peptide_encoding = pad_inner_lists_to_length(peptide_encoding)
    peptide_encoding = torch.tensor(peptide_encoding)
    outputs, representations = model(peptide_encoding.to(device))
    print(outputs)
    print(representations)
```
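The returned `representations` can also be reused as generic peptide features downstream. A sketch, assuming the embedding is a float tensor whose first dimension is the batch (the exact shape depends on the model's output):

```python
import torch.nn.functional as F

# Flatten everything after the batch dimension so the two peptides
# can be compared directly, then measure cosine similarity.
flat = representations.flatten(1)
similarity = F.cosine_similarity(flat[0:1], flat[1:2])
print(similarity)
```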