ctheodoris committed
Commit e820326
Parent: badcca6

Update README.md


Update model card to include extracting and plotting cell embeddings as an available function.

Files changed (1): README.md (+35 -3)

README.md (updated excerpt):

We detail applications and results in [our manuscript](https://rdcu.be/ddrx0).

During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the model’s attention weights in a completely self-supervised manner. Fine-tuning Geneformer towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modeling with limited patient data, Geneformer identified candidate therapeutic targets. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.

In [our manuscript](https://rdcu.be/ddrx0), we report results for the 6-layer Geneformer model pretrained on Genecorpus-30M. We additionally provide within this repository a 12-layer Geneformer model, scaled up while retaining the width-to-depth aspect ratio and also pretrained on Genecorpus-30M.
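
The checkpoints are standard Hugging Face Transformers models, so they can be loaded directly. A minimal sketch, assuming the default checkpoint at the repository root (the masked-language-model class matches the BERT-style architecture used for pretraining):

```python
# Minimal sketch: load the pretrained Geneformer checkpoint with Transformers.
# Assumes the default (6-layer) weights at the repository root; confirm the
# location of the 12-layer variant in the repository file list.
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("ctheodoris/Geneformer")
print(model.config.num_hidden_layers)  # layer count of the loaded checkpoint
```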

# Application
The pretrained Geneformer model can be used directly for zero-shot learning, for example for in silico perturbation analysis, or by fine-tuning towards the relevant downstream task, such as gene or cell state classification.
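
For the fine-tuning route, a hedged sketch of attaching a classification head to the pretrained checkpoint follows; the toy dataset stands in for real tokenized single-cell data, and the label count, paths, and training settings are illustrative assumptions (see the fine-tuning examples for realistic pipelines):

```python
# Sketch: fine-tune Geneformer for a cell state classification task.
# The tiny in-memory dataset below is a placeholder for data produced by the
# repository's tokenizer; all hyperparameters here are illustrative.
from datasets import Dataset
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

train_ds = Dataset.from_dict({
    "input_ids": [[101, 5, 7, 9, 102]] * 8,   # dummy token ids
    "attention_mask": [[1, 1, 1, 1, 1]] * 8,
    "labels": [0, 1, 2, 0, 1, 2, 0, 1],       # e.g., three cell states
})

model = BertForSequenceClassification.from_pretrained(
    "ctheodoris/Geneformer", num_labels=3
)

args = TrainingArguments(
    output_dir="geneformer_finetuned",  # illustrative output path
    learning_rate=5e-5,                 # worth tuning per task (see note at the end)
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```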

Example applications demonstrated in [our manuscript](https://rdcu.be/ddrx0) include:

*Fine-tuning*:
- transcription factor dosage sensitivity
- chromatin dynamics (bivalently marked promoters)
- transcription factor regulatory range
- gene network centrality
- transcription factor targets
- cell type annotation
- batch integration
- cell state classification across differentiation
- disease classification
- in silico perturbation to determine disease-driving genes
- in silico treatment to determine candidate therapeutic targets

*Zero-shot learning*:
- batch integration
- gene context specificity
- in silico reprogramming
- in silico differentiation
- in silico perturbation to determine impact on cell state
- in silico perturbation to determine transcription factor targets
- in silico perturbation to determine transcription factor cooperativity

# Installation
In addition to the pretrained model, this repository contains functions for tokenizing and collating data specific to single-cell transcriptomics, pretraining the model, fine-tuning the model, extracting and plotting cell embeddings, and performing in silico perturbation with either the pretrained or fine-tuned models. To install:

```bash
git clone https://huggingface.co/ctheodoris/Geneformer
cd Geneformer
pip install .
```
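
After installation, the tokenizer can be driven roughly as follows. This is a sketch based on the tokenizing example; the class name comes from the package, but the argument names and paths here are illustrative assumptions to verify against the examples directory:

```python
# Sketch: tokenize single-cell transcriptomes into Geneformer's rank value
# encodings. Argument names and paths are illustrative; confirm against the
# tokenizing example in the examples directory.
from geneformer import TranscriptomeTokenizer

tk = TranscriptomeTokenizer(
    {"cell_type": "cell_type"},  # per-cell metadata to carry into the dataset
    nproc=4,
)
tk.tokenize_data(
    "data/loom_dir",    # directory of input .loom files (assumed layout)
    "data/tokenized",   # output directory
    "my_dataset",       # output prefix
)
```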

For usage, see [examples](https://huggingface.co/ctheodoris/Geneformer/tree/main/examples) for:
- tokenizing transcriptomes
- pretraining
- hyperparameter tuning
- fine-tuning
- extracting and plotting cell embeddings (see the first sketch below)
- in silico perturbation (see the second sketch below)
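
For extracting and plotting cell embeddings, the package provides an embedding extractor; the sketch below mirrors the shape of the example notebook, but the constructor arguments and paths are assumptions to check against the current examples:

```python
# Sketch: extract cell embeddings from a pretrained (or fine-tuned) model and
# plot them. Argument names and paths are illustrative assumptions; see the
# cell embedding example in the examples directory for the real interface.
from geneformer import EmbExtractor

embex = EmbExtractor(
    model_type="Pretrained",   # or a fine-tuned classifier
    emb_mode="cell",           # cell-level embeddings
    emb_layer=-1,              # which hidden layer to read out
    max_ncells=1000,
    nproc=4,
)
embs = embex.extract_embs(
    "ctheodoris/Geneformer",              # model directory or hub id (assumed)
    "data/tokenized/my_dataset.dataset",  # tokenized input (assumed path)
    "results/embs",                       # output directory
    "my_embs",                            # output prefix
)
embex.plot_embs(
    embs=embs,
    plot_style="umap",                    # e.g., UMAP of the cell embeddings
    output_directory="results/embs",
    output_prefix="emb_plot",
)
```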
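
For in silico perturbation, the perturber and its statistics companion can be invoked roughly as below; the class names exist in the package, while the arguments shown are illustrative assumptions (the perturbation examples document the real interface):

```python
# Sketch: in silico deletion of genes across cells, followed by aggregate
# statistics. Arguments and paths are illustrative assumptions; see the
# in silico perturbation examples for the real interface.
from geneformer import InSilicoPerturber, InSilicoPerturberStats

isp = InSilicoPerturber(
    perturb_type="delete",    # remove gene tokens from each cell's encoding
    genes_to_perturb="all",
    model_type="Pretrained",
    emb_mode="cell",
    nproc=4,
)
isp.perturb_data(
    "ctheodoris/Geneformer",              # pretrained or fine-tuned model
    "data/tokenized/my_dataset.dataset",  # tokenized input (assumed path)
    "results/isp",                        # output directory
    "isp_run",                            # output prefix
)

ispstats = InSilicoPerturberStats(mode="aggregate_data")  # illustrative mode
ispstats.get_stats("results/isp", None, "results/isp_stats", "isp_run")
```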

Please note that the fine-tuning examples are meant to be generally applicable; the input datasets and labels will vary depending on the downstream task. Example input files for a few of the downstream tasks demonstrated in the manuscript are located within the [example_input_files directory](https://huggingface.co/datasets/ctheodoris/Genecorpus-30M/tree/main/example_input_files) in the dataset repository, but these represent only a few example fine-tuning applications.

Please note that GPU resources are required for efficient usage of Geneformer. Additionally, we strongly recommend tuning hyperparameters (e.g., maximum learning rate, learning schedule, number of layers to freeze) for each downstream fine-tuning application, as this can significantly boost predictive potential in the downstream task.
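
As one example of the layer-freezing hyperparameter, the sketch below freezes the embedding layer and the first two encoder layers before fine-tuning; how many layers to freeze is task-dependent and worth searching over (the label count and layer slice are illustrative):

```python
# Sketch: freeze the embeddings and the first two encoder layers of the model
# before fine-tuning; the number of frozen layers is a tunable hyperparameter.
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "ctheodoris/Geneformer", num_labels=2
)

for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[:2]:
    for param in layer.parameters():
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable}/{total} parameters")
```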