|
Git commit: 53803a3acd9c7e1115233fff458d2226d7fd0c87 |
|
PyTorch CUDA version: 10.2 |
|
Parameter datasets: ['VDJdb', 'PIRD'] |
|
Parameter |
|
Parameter /home//jamesz//projects//model_configs/ |
|
Parameter |
|
Parameter |
|
Parameter |
|
Parameter |
|
Parameter |
|
Parameter |
|
Parameter |
|
Parameter |
|
Filtering |
|
VDJdb: dropping |
|
VDJdb: dropping |
|
PIRD /TRB instances: Counter({: 46483, : 4019, : 637}) |
|
PIRD: 0.1655 of data labelled with antigen sequence |
|
PIRD: Removing 95 entries with non-amino-acid residues |
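
The filtering step logged above could look roughly like the following. This is a minimal sketch, not the pipeline's actual code: the function name `drop_non_amino_acid` and the allowed alphabet (the 20 standard one-letter codes) are assumptions, and the real run may accept additional residue codes.

```python
# Assumption: only the 20 standard one-letter amino acid codes are accepted.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def drop_non_amino_acid(sequences):
    """Return (kept, n_removed), dropping empty sequences and any sequence
    containing a character outside the allowed alphabet."""
    kept = [s for s in sequences if s and set(s) <= STANDARD_AMINO_ACIDS]
    return kept, len(sequences) - len(kept)


# Hypothetical input: the second and third entries contain invalid characters.
kept, n_removed = drop_non_amino_acid(["CASSQDRGPANEQFF", "CASS*DRG", "CASSX1F"])
print(f"Removing {n_removed} entries with non-amino-acid residues")
```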
|
Creating self supervised dataset with 98225 sequences |
|
Maximum sequence length: 45 |
|
Example of tokenized input: CASSQDRGPANEQFF -> [25, 9, 13, 5, 5, 8, 3, 0, 11, 12, 13, 7, 4, 8, 18, 18, 24] |
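
The ids in the example above are consistent with a character-level vocabulary in which a start token (25) and an end token (24) wrap the per-residue ids. The sketch below reproduces that one example. Only the ids visible in the log line (R=0, D=3, E=4, S=5, N=7, Q=8, C=9, G=11, P=12, A=13, F=18, end=24, start=25) are confirmed; the order of the remaining residues, the special-token symbols, and the `tokenize` name are assumptions.

```python
# Assumed residue order; only the ids seen in the logged example are confirmed.
AMINO_ACIDS = "RHKDESTNQCUGPAVILMFYW"
# Assumed special tokens: pad, mask, unknown, separator, classifier.
SPECIAL_TOKENS = ["$", ".", "?", "|", "*"]

VOCAB = {tok: i for i, tok in enumerate(list(AMINO_ACIDS) + SPECIAL_TOKENS)}
UNK_ID, SEP_ID, CLS_ID = VOCAB["?"], VOCAB["|"], VOCAB["*"]


def tokenize(seq):
    """Map each residue to its id, with the start and end token ids added."""
    return [CLS_ID] + [VOCAB.get(aa, UNK_ID) for aa in seq] + [SEP_ID]


print(tokenize("CASSQDRGPANEQFF"))
```

With a maximum sequence length of 45, shorter inputs would presumably be padded up to a fixed length before batching, but the log does not show the padded form.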
|
Split test with 9822 examples |
|
Split train with 88403 examples |
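
The two split sizes add up to the 98225 sequences reported earlier (9822 + 88403) and match a 10% held-out fraction rounded down. A minimal sketch of that arithmetic follows; the 0.1 fraction and the `split_sizes` helper are inferred from the numbers, not taken from the run's configuration.

```python
def split_sizes(n_total, test_fraction=0.1):
    """Return (n_train, n_test), rounding the test size down.

    test_fraction=0.1 is an assumption inferred from the logged counts.
    """
    n_test = int(n_total * test_fraction)
    return n_total - n_test, n_test


n_train, n_test = split_sizes(98225)
print(f"Split test with {n_test} examples")
print(f"Split train with {n_train} examples")
```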
|
Loading vanilla BERT model |
|
|