It is released along with [this paper](google.com).

## Intended uses & limitations

This model is designed for auto-regressively generating CDR3 \\(\beta\\) sequences against a pMHC of interest. This means the model assumes a plausible pMHC is provided as input: we have not tested the model on peptide-MHC pairs where the binding affinity is low, and we do not expect the model to adjust its predictions in such cases. This model is intended for academic purposes and should not be used in a clinical setting.
### How to use

You can use this model directly for conditional CDR3 \\(\beta\\) generation. The following is a minimal sketch: the checkpoint ID, source formatting, and generation settings are illustrative assumptions, not the card's exact values.

```python
import re

from transformers import AutoTokenizer, T5ForConditionalGeneration

# Placeholder checkpoint ID: substitute this card's repository.
checkpoint = "<this-repo-id>"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Source string for the pMHC of interest: peptide plus MHC pseudo-sequence.
# The exact template is assumed; see "Example Input" under Finetuning.
peptide = "AVFDRKSDAK"
mhc_pseudo = "<34-residue NetMHCpan pseudo-sequence for HLA-A*11:01>"
inputs = tokenizer(f"{peptide} {mhc_pseudo}", return_tensors="pt")

# Sample candidate CDR3b sequences for this pMHC.
outputs = model.generate(**inputs, max_new_tokens=25,
                         do_sample=True, num_return_sequences=10)

# Strip bracketed special tokens (e.g. "[CLS]") from the decoded strings.
cdr3b_sequences = [re.sub(r'\[.*\]', '', x)
                   for x in tokenizer.batch_decode(outputs)]
print(cdr3b_sequences)
# e.g. [...,
#       'CASSLGTGGNQPQHF']
```

This model can also be used for unconditional generation of CDR3 \\(\beta\\) sequences. Again a minimal sketch; generating from an empty source string is an assumption here:

```python
import re

from transformers import AutoTokenizer, T5ForConditionalGeneration

checkpoint = "<this-repo-id>"  # placeholder checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Sample CDR3b sequences without conditioning on a pMHC.
inputs = tokenizer("", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=25,
                         do_sample=True, num_return_sequences=10)
cdr3b_sequences = [re.sub(r'\[.*\]', '', x)
                   for x in tokenizer.batch_decode(outputs)]
```

The model was trained on a corpus of ~330k TCR:peptide-pseudosequence pairs taken from [VDJdb](https://vdjdb.cdr3.net/).

### Preprocessing

All amino acid sequences and V/J gene names were standardized using the `tidytcells` package (see [here](https://pmc.ncbi.nlm.nih.gov/articles/PMC10634431/)). MHC allele information was standardized using [`mhcgnomes`](https://pypi.org/project/mhcgnomes/) before being mapped to the MHC pseudo-sequence as defined in [NetMHCpan](https://pmc.ncbi.nlm.nih.gov/articles/PMC3319061/).
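
For illustration, a minimal sketch of this standardization step; this is not the repo's preprocessing code, and the specific calls and inputs are assumptions:

```python
import tidytcells as tt
import mhcgnomes

# Standardize a TCR V gene symbol and a CDR3b junction sequence
# (hypothetical inputs).
v_gene = tt.tr.standardize("TRBV20-1")
cdr3b = tt.junction.standardize("CASSLGTGGNQPQHF")

# Normalize an MHC allele name.
allele = mhcgnomes.parse("HLA-A*11:01")
print(v_gene, cdr3b, allele.to_string())

# Mapping the normalized allele to its NetMHCpan pseudo-sequence would use
# a lookup table from NetMHCpan (not shown here).
```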
### Pre-training

The pre-training objective masks `mlm_probability` tokens, grouped into spans of size `max_span_length`.
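
For illustration, one way such span masking can be implemented; this is a sketch, not the repo's actual collator, and the defaults are assumptions:

```python
import numpy as np

def span_mask(num_tokens, mlm_probability=0.15, max_span_length=3, seed=None):
    """Boolean mask covering ~mlm_probability of positions in short spans."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(num_tokens, dtype=bool)
    target = int(round(num_tokens * mlm_probability))
    while mask.sum() < target:
        span = rng.integers(1, max_span_length + 1)  # span length in [1, max]
        start = rng.integers(0, num_tokens)          # random start position
        mask[start:start + span] = True
    return mask
```
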
### Finetuning

TCRT5 was finetuned on peptide-pseudosequence -> CDR3 \\(\beta\\) source:target pairs using the canonical cross-entropy loss:
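
$$
\mathcal{L} = CE(\mathbf{y}, \hat{\mathbf{y}}) = - \sum_{i=1}^n \mathbf{y}_i \log \hat{\mathbf{y}}_i = - \sum_{i=1}^n \sum_{j=1}^k y_{ij} \log p_\theta (y_{ij} \mid \mathbf{x})
$$

where \\(n\\) is the target sequence length and \\(k\\) is the vocabulary size.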
```
Example Input:
...
```

## Results
This fine-tuned model achieves the following results on conditional CDR3 \\(\beta\\) generation for our validation set of the top-20 peptide-MHCs with the most abundant known TCRs (in alphabetical order):
1. AVFDRKSDAK_A*11:01
2. CRVRLCCYVL_C*07:02