Christina Theodoris committed
Commit cf0d7d4
1 Parent(s): f91f132

add readthedocs link to model card
README.md CHANGED
@@ -5,7 +5,8 @@ license: apache-2.0
 # Geneformer
 Geneformer is a foundation transformer model pretrained on a large-scale corpus of ~30 million single cell transcriptomes to enable context-aware predictions in settings with limited data in network biology.
 
-See [our manuscript](https://rdcu.be/ddrx0) for details.
+- See [our manuscript](https://rdcu.be/ddrx0) for details.
+- See [geneformer.readthedocs.io](https://geneformer.readthedocs.io) for documentation.
 
 # Model Description
 Geneformer is a foundation transformer model pretrained on [Genecorpus-30M](https://huggingface.co/datasets/ctheodoris/Genecorpus-30M), a pretraining corpus comprised of ~30 million single cell transcriptomes from a broad range of human tissues. We excluded cells with high mutational burdens (e.g. malignant cells and immortalized cell lines) that could lead to substantial network rewiring without companion genome sequencing to facilitate interpretation. Each single cell’s transcriptome is presented to the model as a rank value encoding where genes are ranked by their expression in that cell normalized by their expression across the entire Genecorpus-30M. The rank value encoding provides a nonparametric representation of that cell’s transcriptome and takes advantage of the many observations of each gene’s expression across Genecorpus-30M to prioritize genes that distinguish cell state. Specifically, this method will deprioritize ubiquitously highly-expressed housekeeping genes by normalizing them to a lower rank. Conversely, genes such as transcription factors that may be lowly expressed when they are expressed but highly distinguish cell state will move to a higher rank within the encoding. Furthermore, this rank-based approach may be more robust against technical artifacts that may systematically bias the absolute transcript counts value while the overall relative ranking of genes within each cell remains more stable.
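The rank value encoding described in the model card above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the repository's actual tokenizer: the function name and the use of per-gene corpus medians as the normalization factor are assumptions made for the sketch.

```python
def rank_value_encode(counts, corpus_medians, gene_ids):
    """Return gene ids ordered by corpus-normalized expression, highest first."""
    # Normalize each gene's count in this cell by its typical expression
    # across the corpus: ubiquitously highly-expressed housekeeping genes
    # sink in rank, while lowly expressed but state-distinguishing genes
    # (e.g. transcription factors) rise.
    normalized = [
        (count / median, gene)
        for count, median, gene in zip(counts, corpus_medians, gene_ids)
        if count > 0  # unexpressed genes are omitted from the encoding
    ]
    # The rank value encoding is the sequence of gene ids sorted by
    # descending normalized expression.
    normalized.sort(key=lambda pair: -pair[0])
    return [gene for _, gene in normalized]
```

For example, a housekeeping gene with a high raw count but an even higher corpus-wide median ranks below a transcription factor whose modest count greatly exceeds its corpus-wide median.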
docs/source/_static/css/custom.css CHANGED
@@ -26,8 +26,8 @@
 
 /* class object */
 .sig.sig-object {
-  padding: 5px 5px 5px 8px;
-  background-color: #e6e6e6;
+  padding: 5px 5px 5px 5px;
+  background-color: #ececec;
   border-style: solid;
   border-color: black;
   border-width: 1px 0;
@@ -35,6 +35,6 @@
 
 /* parameter object */
 dt {
-  padding: 5px 5px 5px 8px;
-  background-color: #ebebeb;
+  padding: 5px 5px 5px 5px;
+  background-color: #ececec;
 }
docs/source/_static/gf_logo.png CHANGED
docs/source/about.rst CHANGED
@@ -8,7 +8,7 @@ Model Description
 
 In `our manuscript <https://rdcu.be/ddrx0>`_, we report results for the 6 layer Geneformer model pretrained on Genecorpus-30M. We additionally provide within the repository a 12 layer Geneformer model, scaled up with retained width:depth aspect ratio, also pretrained on Genecorpus-30M.
 
-Both the 6 and 12 layer Geneformer models were pretrained in June 2021.
+Both the `6 <https://huggingface.co/ctheodoris/Geneformer/blob/main/pytorch_model.bin>`_ and `12 <https://huggingface.co/ctheodoris/Geneformer/blob/main/geneformer-12L-30M/pytorch_model.bin>`_ layer Geneformer models were pretrained in June 2021.
 
 Application
 -----------