canergen committed · Commit 8454617 · verified · 1 Parent(s): e3e563c

Upload README.md with huggingface_hub

  1. README.md +319 -0
README.md ADDED
@@ -0,0 +1,319 @@
+ ---
+ library_name: scvi-tools
+ license: cc-by-4.0
+ tags:
+ - biology
+ - genomics
+ - single-cell
+ - model_cls_name:TOTALVI
+ - scvi_version:1.2.0
+ - anndata_version:0.11.1
+ - modality:rna
+ - modality:protein
+ - tissue:thymus
+ - annotated:True
+ ---
+
+
+ TotalVI is a variational inference model for single-cell RNA-seq and protein data that can
+ learn an underlying latent space, integrate technical batches, impute dropouts,
+ and predict protein expression from gene expression alone, or impute missing proteins from gene
+ expression together with measurements for a subset of proteins.
+ The learned low-dimensional latent representation of the data can be used for visualization and
+ clustering.
+
+ TotalVI takes as input an scRNA-seq gene expression matrix and a protein expression matrix, with
+ cells as rows and genes or proteins as columns.
+ We provide an extensive [user guide](https://docs.scvi-tools.org/en/1.2.0/user_guide/models/totalvi.html).
+
+ - See our original manuscript for further details of the model:
+ [TotalVI manuscript](https://www.nature.com/articles/s41592-020-01050-x).
+ - See our manuscript on [scvi-hub](https://www.biorxiv.org/content/10.1101/2024.03.01.582887v2)
+ for how to leverage pre-trained models.
+
+ This model can be used for fine-tuning on new data using our Arches framework:
+ [Arches tutorial](https://docs.scvi-tools.org/en/1.0.0/tutorials/notebooks/scarches_scvi_tools.html).
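
Before fine-tuning, the pre-trained model has to be fetched from the Hub. The following is a minimal sketch, assuming scvi-tools 1.2.x and its `scvi.hub.HubModel.pull_from_huggingface_hub` class method. The helper name `pull_pretrained` and the repository id are hypothetical placeholders, not values taken from this model card.

```python
def pull_pretrained(repo_name, cache_dir=None):
    """Download a scvi-hub model repository from Hugging Face.

    Assumes scvi-tools 1.2.x is installed and network access is available.
    """
    # Imported lazily so this helper can be defined without scvi-tools installed.
    from scvi.hub import HubModel

    return HubModel.pull_from_huggingface_hub(repo_name=repo_name, cache_dir=cache_dir)


# Hypothetical placeholder: substitute the repository id this model card belongs to.
REPO_NAME = "<user>/<repo>"

# Uncomment once REPO_NAME points at a real repository:
# hub_model = pull_pretrained(REPO_NAME)
# print(hub_model)
```

Because this card reports that no training data URL was provided and the data is not minified, loading the model object itself may additionally require supplying compatible data.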
+
+
+ # Model Description
+
+ CITE-seq was used to measure RNA and surface proteins in thymocytes from wild-type and T cell lineage-restricted mice to generate a comprehensive timeline of cell state for each T cell lineage.
+
+ # Metrics
+
+ We provide here key performance metrics for the uploaded model, if provided by the data uploader.
+
+ <details>
+ <summary><strong>Coefficient of variation</strong></summary>
+
+ The cell-wise coefficient of variation summarizes how well variation between different cells is
+ preserved by the generated model expression. Below a squared Pearson correlation coefficient of
+ 0.4, we recommend not using the generated data for downstream analysis, although the latent
+ space might still be useful for analysis.
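
To make the metric concrete, here is a small sketch of how a cell-wise coefficient of variation and its squared Pearson correlation could be computed. It illustrates the idea only: the exact procedure scvi-tools uses (for example, how generated expression is sampled) is not described in this card, and the toy matrices and function names are assumptions.

```python
from statistics import mean, pstdev


def cell_wise_cv(matrix):
    """Coefficient of variation (std / mean) of each row, i.e. of each cell."""
    return [pstdev(row) / mean(row) for row in matrix]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Toy data (hypothetical): 4 cells x 3 genes, observed counts vs. model-generated counts.
observed = [[1, 0, 5], [2, 2, 2], [0, 3, 9], [4, 1, 1]]
generated = [[1, 1, 4], [2, 3, 2], [1, 2, 8], [3, 1, 2]]

cv_observed = cell_wise_cv(observed)
cv_generated = cell_wise_cv(generated)

# Squared Pearson correlation between the two CV vectors, compared with the 0.4 cut-off.
r_squared = pearson(cv_observed, cv_generated) ** 2
usable = r_squared >= 0.4
```

The gene-wise variant would apply the same computation to columns instead of rows.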
+
+ **Cell-wise Coefficient of Variation**:
+
+ Modality: rna
+
+ | Metric | Training Value | Validation Value |
+ |-------------------------|----------------|------------------|
+ | Mean Absolute Error | 0.57 | 0.57 |
+ | Pearson Correlation | 0.76 | 0.75 |
+ | Spearman Correlation | 0.83 | 0.83 |
+ | R² (R-Squared) | -0.10 | -0.09 |
+
+ Modality: protein
+
+ | Metric | Training Value | Validation Value |
+ |-------------------------|----------------|------------------|
+ | Mean Absolute Error | 0.32 | 0.32 |
+ | Pearson Correlation | 0.53 | 0.53 |
+ | Spearman Correlation | 0.78 | 0.78 |
+ | R² (R-Squared) | -1.45 | -1.43 |
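
As a worked check, the reported Pearson correlations can be squared and compared with the 0.4 cut-off quoted above. This assumes the cut-off refers to the square of the "Pearson Correlation" row in these tables, which the card does not state explicitly.

```python
THRESHOLD = 0.4  # squared-Pearson cut-off quoted in the text

# Pearson correlations copied from the cell-wise tables: (training, validation).
reported = {"rna": (0.76, 0.75), "protein": (0.53, 0.53)}

# Square each value and round to two decimals for readability.
squared = {
    modality: tuple(round(r ** 2, 2) for r in values)
    for modality, values in reported.items()
}

# True when both the training and validation values fall below the cut-off.
below_threshold = {
    modality: all(v < THRESHOLD for v in values)
    for modality, values in squared.items()
}
```

Under that reading, the RNA values sit above the cut-off and the protein values below it, so generated protein expression would warrant more caution than generated RNA expression.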
+
+
+
+ The gene-wise coefficient of variation summarizes how well variation between different genes is
+ preserved by the generated model expression. This value is usually quite high.
+
+ **Gene-wise Coefficient of Variation**:
+
+ Modality: rna
+
+ | Metric | Training Value |
+ |-------------------------|----------------|
+ | Mean Absolute Error | 26.96 |
+ | Pearson Correlation | 0.95 |
+ | Spearman Correlation | 0.99 |
+ | R² (R-Squared) | -0.25 |
+
+ Modality: protein
+
+ | Metric | Training Value |
+ |-------------------------|----------------|
+ | Mean Absolute Error | 4.29 |
+ | Pearson Correlation | 0.42 |
+ | Spearman Correlation | 0.73 |
+ | R² (R-Squared) | -6.32 |
+
+
+
+ </details>
+
+ <details>
+ <summary><strong>Differential expression metric</strong></summary>
+
+ The differential expression metric provides a summary of the differential expression analysis
+ between cell types or input clusters. For each cell type, we report the F1-score, the Pearson
+ and Spearman correlation coefficients of the log-fold changes, and the area under the
+ precision-recall curve (AUPRC), with differential expression computed using the Wilcoxon
+ rank-sum test.
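
Two of these quantities can be illustrated with a short sketch: an F1-score for the overlap between two sets of top differentially expressed genes, and a mean absolute error between log-fold changes. The precise definitions used by scvi-tools are not given in this card, so the functions and toy inputs below are assumptions for illustration.

```python
def gene_f1(reference_genes, generated_genes):
    """F1-score of the overlap between two sets of top DE genes."""
    ref, gen = set(reference_genes), set(generated_genes)
    hits = len(ref & gen)
    if hits == 0:
        return 0.0
    precision = hits / len(gen)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


def lfc_mae(reference_lfc, generated_lfc):
    """Mean absolute error between per-gene log-fold changes (shared genes only)."""
    shared = reference_lfc.keys() & generated_lfc.keys()
    return sum(abs(reference_lfc[g] - generated_lfc[g]) for g in shared) / len(shared)


# Hypothetical toy inputs: top DE genes and log-fold changes for one cell type.
ref_top = ["CD3E", "CD4", "CD8A", "IL7R"]
gen_top = ["CD3E", "CD4", "CCR7", "IL7R"]
ref_lfc = {"CD3E": 2.0, "CD4": 1.5, "CD8A": -0.5}
gen_lfc = {"CD3E": 1.0, "CD4": 1.5, "CD8A": 0.5}

f1 = gene_f1(ref_top, gen_top)
mae = lfc_mae(ref_lfc, gen_lfc)
```

In the tables, `gene_f1` and `lfc_mae` are reported per cell type, which is why cell types with few cells tend to show noisier values.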
+
+ **Differential expression**:
+
+ Modality: rna
+
+ | Index | gene_f1 | lfc_mae | lfc_pearson | lfc_spearman | roc_auc | pr_auc | n_cells |
+ | --- | --- | --- | --- | --- | --- | --- | --- |
+ | CD14-positive monocyte | 0.94 | 2.13 | 0.59 | 0.91 | 0.09 | 0.02 | 120843.00 |
+ | CD16-positive, CD56-dim natural killer cell, human | 0.95 | 2.34 | 0.46 | 0.83 | 0.09 | 0.02 | 92848.00 |
+ | naive thymus-derived CD4-positive, alpha-beta T cell | 0.90 | 2.75 | 0.39 | 0.76 | 0.09 | 0.02 | 63096.00 |
+ | effector CD8-positive, alpha-beta T cell | 0.87 | 3.49 | 0.39 | 0.73 | 0.07 | 0.02 | 53534.00 |
+ | central memory CD4-positive, alpha-beta T cell | 0.94 | 2.52 | 0.38 | 0.75 | 0.06 | 0.02 | 49904.00 |
+ | naive B cell | 0.93 | 3.21 | 0.43 | 0.73 | 0.08 | 0.02 | 44136.00 |
+ | naive thymus-derived CD8-positive, alpha-beta T cell | 0.92 | 3.47 | 0.39 | 0.65 | 0.06 | 0.02 | 31175.00 |
+ | mature NK T cell | 0.91 | 3.70 | 0.42 | 0.63 | 0.04 | 0.01 | 21673.00 |
+ | effector memory CD8-positive, alpha-beta T cell | 0.83 | 4.48 | 0.36 | 0.55 | 0.07 | 0.02 | 18917.00 |
+ | T-helper 22 cell | 0.90 | 3.98 | 0.42 | 0.61 | 0.06 | 0.02 | 18379.00 |
+ | gamma-delta T cell | 0.86 | 4.59 | 0.38 | 0.50 | 0.05 | 0.01 | 15942.00 |
+ | platelet | 0.89 | 4.43 | 0.51 | 0.66 | 0.06 | 0.02 | 15847.00 |
+ | T follicular helper cell | 0.93 | 4.43 | 0.42 | 0.54 | 0.06 | 0.02 | 13608.00 |
+ | mucosal invariant T cell | 0.87 | 5.03 | 0.40 | 0.45 | 0.05 | 0.02 | 10992.00 |
+ | CD16-negative, CD56-bright natural killer cell, human | 0.85 | 5.21 | 0.40 | 0.46 | 0.05 | 0.02 | 10442.00 |
+ | class switched memory B cell | 0.89 | 5.24 | 0.44 | 0.49 | 0.08 | 0.02 | 7244.00 |
+ | immature B cell | 0.89 | 5.50 | 0.46 | 0.46 | 0.09 | 0.02 | 5238.00 |
+ | natural killer cell | 0.89 | 5.36 | 0.44 | 0.43 | 0.09 | 0.02 | 4963.00 |
+ | plasmacytoid dendritic cell | 0.91 | 5.18 | 0.46 | 0.46 | 0.05 | 0.02 | 4612.00 |
+ | CD14-low, CD16-positive monocyte | 0.91 | 4.32 | 0.58 | 0.59 | 0.10 | 0.02 | 4140.00 |
+ | plasmablast | 0.82 | 5.05 | 0.51 | 0.54 | 0.10 | 0.02 | 4121.00 |
+ | IgG plasma cell | 0.69 | 5.20 | 0.49 | 0.51 | 0.14 | 0.01 | 3527.00 |
+ | dendritic cell, human | 0.84 | 5.18 | 0.43 | 0.43 | 0.62 | 0.20 | 3357.00 |
+ | unswitched memory B cell | 0.92 | 5.41 | 0.46 | 0.43 | 0.18 | 0.02 | 3285.00 |
+ | myeloid dendritic cell | 0.82 | 5.29 | 0.49 | 0.47 | 0.14 | 0.02 | 3243.00 |
+ | B cell | 0.86 | 5.34 | 0.48 | 0.44 | 0.20 | 0.02 | 3024.00 |
+ | IgA plasma cell | 0.70 | 5.21 | 0.50 | 0.49 | 0.13 | 0.02 | 2699.00 |
+ | effector memory CD4-positive, alpha-beta T cell | 0.89 | 5.33 | 0.46 | 0.38 | 0.13 | 0.02 | 2634.00 |
+ | malignant cell | 0.93 | 5.74 | 0.44 | 0.40 | 0.32 | 0.02 | 2291.00 |
+ | CD34-positive, CD38-negative hematopoietic stem cell | 0.78 | 5.51 | 0.48 | 0.49 | 0.11 | 0.02 | 2238.00 |
+ | erythrocyte | 0.79 | 5.67 | 0.40 | 0.25 | 0.62 | 0.33 | 2232.00 |
+ | CD8-positive, alpha-beta T cell | 0.84 | 5.01 | 0.51 | 0.43 | 0.35 | 0.03 | 1355.00 |
+ | IgM plasma cell | 0.78 | 4.62 | 0.54 | 0.51 | 0.28 | 0.02 | 1163.00 |
+ | ILC1, human | 0.81 | 4.58 | 0.52 | 0.45 | 0.39 | 0.03 | 776.00 |
+ | erythroid progenitor cell, mammalian | 0.71 | 5.58 | 0.50 | 0.46 | 0.34 | 0.02 | 773.00 |
+ | monocyte | 0.88 | 4.17 | 0.57 | 0.51 | 0.37 | 0.02 | 649.00 |
+ | CD4-positive, alpha-beta T cell | 0.82 | 4.63 | 0.54 | 0.49 | 0.33 | 0.02 | 624.00 |
+ | dendritic cell | 0.76 | 5.16 | 0.50 | 0.42 | 0.45 | 0.03 | 585.00 |
+ | T-helper 1 cell | 0.83 | 3.91 | 0.58 | 0.54 | 0.29 | 0.02 | 481.00 |
+ | regulatory T cell | 0.76 | 3.91 | 0.56 | 0.53 | 0.32 | 0.02 | 329.00 |
+ | hematopoietic precursor cell | 0.70 | 4.55 | 0.58 | 0.54 | 0.26 | 0.02 | 180.00 |
+ | group 2 innate lymphoid cell, human | 0.65 | 3.00 | 0.59 | 0.63 | 0.27 | 0.02 | 93.00 |
+ | T-helper 2 cell | 0.67 | 2.79 | 0.61 | 0.67 | 0.19 | 0.02 | 55.00 |
+ | myeloid lineage restricted progenitor cell | 0.53 | 3.95 | 0.52 | 0.56 | 0.32 | 0.02 | 53.00 |
+ | megakaryocyte | 0.70 | 3.62 | 0.62 | 0.61 | 0.30 | 0.02 | 53.00 |
+ | T-helper 17 cell | 0.48 | 2.54 | 0.53 | 0.69 | 0.21 | 0.02 | 13.00 |
+
+ Modality: protein
+
+ | Index | gene_f1 | lfc_mae | lfc_pearson | lfc_spearman | roc_auc | pr_auc | n_cells |
+ | --- | --- | --- | --- | --- | --- | --- | --- |
+ | CD14-positive monocyte | 0.95 | 0.07 | 1.00 | 0.99 | 0.26 | 0.12 | 120843.00 |
+ | CD16-positive, CD56-dim natural killer cell, human | 0.95 | 0.06 | 0.99 | 0.98 | 0.22 | 0.12 | 92848.00 |
+ | naive thymus-derived CD4-positive, alpha-beta T cell | 0.84 | 0.06 | 0.99 | 0.98 | 0.37 | 0.13 | 63096.00 |
+ | effector CD8-positive, alpha-beta T cell | 0.95 | 0.05 | 0.99 | 0.97 | 0.17 | 0.09 | 53534.00 |
+ | central memory CD4-positive, alpha-beta T cell | 1.00 | 0.06 | 1.00 | 0.99 | 0.33 | 0.12 | 49904.00 |
+ | naive B cell | 1.00 | 0.08 | 1.00 | 0.96 | 0.20 | 0.12 | 44136.00 |
+ | naive thymus-derived CD8-positive, alpha-beta T cell | 0.95 | 0.07 | 0.99 | 0.95 | 0.13 | 0.08 | 31175.00 |
+ | mature NK T cell | 0.89 | 0.06 | 0.99 | 0.98 | 0.21 | 0.10 | 21673.00 |
+ | effector memory CD8-positive, alpha-beta T cell | 0.84 | 0.05 | 0.99 | 0.98 | 0.06 | 0.11 | 18917.00 |
+ | T-helper 22 cell | 0.95 | 0.06 | 0.99 | 0.98 | 0.11 | 0.08 | 18379.00 |
+ | gamma-delta T cell | 0.89 | 0.07 | 0.97 | 0.94 | 0.26 | 0.18 | 15942.00 |
+ | platelet | 0.79 | 0.10 | 0.97 | 0.94 | 0.21 | 0.11 | 15847.00 |
+ | T follicular helper cell | 1.00 | 0.07 | 0.99 | 0.98 | 0.24 | 0.12 | 13608.00 |
+ | mucosal invariant T cell | 0.89 | 0.08 | 0.97 | 0.95 | 0.15 | 0.09 | 10992.00 |
+ | CD16-negative, CD56-bright natural killer cell, human | 0.95 | 0.08 | 0.98 | 0.95 | 0.45 | 0.47 | 10442.00 |
+ | class switched memory B cell | 0.89 | 0.09 | 0.99 | 0.96 | 0.11 | 0.12 | 7244.00 |
+ | immature B cell | 0.89 | 0.12 | 0.98 | 0.94 | 0.26 | 0.14 | 5238.00 |
+ | natural killer cell | 0.74 | 0.06 | 0.99 | 0.98 | 0.68 | 0.68 | 4963.00 |
+ | plasmacytoid dendritic cell | 0.84 | 0.09 | 0.98 | 0.96 | 0.54 | 0.56 | 4612.00 |
+ | CD14-low, CD16-positive monocyte | 0.84 | 0.08 | 0.99 | 0.98 | 0.58 | 0.23 | 4140.00 |
+ | plasmablast | 0.79 | 0.08 | 0.99 | 0.97 | 0.47 | 0.49 | 4121.00 |
+ | IgG plasma cell | 0.89 | 0.09 | 0.98 | 0.94 | 0.47 | 0.51 | 3527.00 |
+ | dendritic cell, human | 0.79 | 0.10 | 0.98 | 0.90 | 0.94 | 0.90 | 3357.00 |
+ | unswitched memory B cell | 0.95 | 0.10 | 0.99 | 0.96 | 0.63 | 0.61 | 3285.00 |
+ | myeloid dendritic cell | 0.89 | 0.11 | 0.97 | 0.94 | 0.74 | 0.74 | 3243.00 |
+ | B cell | 0.89 | 0.10 | 0.98 | 0.93 | 0.58 | 0.59 | 3024.00 |
+ | IgA plasma cell | 0.89 | 0.10 | 0.97 | 0.91 | 0.47 | 0.48 | 2699.00 |
+ | effector memory CD4-positive, alpha-beta T cell | 0.95 | 0.08 | 0.99 | 0.95 | 0.79 | 0.79 | 2634.00 |
+ | malignant cell | 0.89 | 0.09 | 0.99 | 0.98 | 0.17 | 0.08 | 2291.00 |
+ | CD34-positive, CD38-negative hematopoietic stem cell | 0.74 | 0.10 | 0.96 | 0.93 | 0.37 | 0.36 | 2238.00 |
+ | erythrocyte | 0.84 | 0.08 | 0.99 | 0.98 | 0.21 | 0.25 | 2232.00 |
+ | CD8-positive, alpha-beta T cell | 0.68 | 0.09 | 0.87 | 0.87 | 0.69 | 0.54 | 1355.00 |
+ | IgM plasma cell | 0.95 | 0.09 | 0.98 | 0.92 | 0.42 | 0.46 | 1163.00 |
+ | ILC1, human | 0.84 | 0.10 | 0.96 | 0.88 | 0.53 | 0.55 | 776.00 |
+ | erythroid progenitor cell, mammalian | 0.58 | 0.15 | 0.94 | 0.92 | 0.22 | 0.22 | 773.00 |
+ | monocyte | 0.79 | 0.09 | 0.98 | 0.96 | 0.69 | 0.70 | 649.00 |
+ | CD4-positive, alpha-beta T cell | 0.89 | 0.11 | 0.96 | 0.83 | 0.58 | 0.59 | 624.00 |
+ | dendritic cell | 0.74 | 0.20 | 0.83 | 0.80 | 0.52 | 0.41 | 585.00 |
+ | T-helper 1 cell | 0.95 | 0.10 | 0.98 | 0.95 | 0.73 | 0.73 | 481.00 |
+ | regulatory T cell | 0.89 | 0.14 | 0.97 | 0.94 | 0.89 | 0.85 | 329.00 |
+ | hematopoietic precursor cell | 0.63 | 0.28 | 0.29 | 0.88 | 0.36 | 0.28 | 180.00 |
+ | group 2 innate lymphoid cell, human | 0.63 | 0.25 | 0.98 | 0.53 | 0.33 | 0.36 | 93.00 |
+ | T-helper 2 cell | 0.68 | 0.28 | 0.88 | 0.73 | 0.74 | 0.65 | 55.00 |
+ | myeloid lineage restricted progenitor cell | 0.32 | 0.27 | 0.98 | 0.80 | 0.54 | 0.26 | 53.00 |
+ | megakaryocyte | 0.63 | 0.23 | 0.98 | 0.81 | 0.57 | 0.52 | 53.00 |
+ | T-helper 17 cell | 0.68 | 0.53 | 0.70 | 0.74 | 0.48 | 0.36 | 13.00 |
+
+
+
+ </details>
+
+ # Model Properties
+
+ We provide here key parameters used to set up and train the model.
+
+ <details>
+ <summary><strong>Model Parameters</strong></summary>
+
+ These are the settings used to set up the original model:
+ ```json
+ {
+     "n_latent": 20,
+     "gene_dispersion": "gene",
+     "protein_dispersion": "protein",
+     "gene_likelihood": "nb",
+     "latent_distribution": "normal",
+     "empirical_protein_background_prior": null,
+     "override_missing_proteins": false
+ }
+ ```
+
+ </details>
+
+ <details>
+ <summary><strong>Setup Data Arguments</strong></summary>
+
+ Arguments passed to setup_anndata of the original model:
+ ```json
+ {
+     "rna_layer": "counts",
+     "protein_layer": null,
+     "batch_key": "donor_id",
+     "size_factor_key": null,
+     "categorical_covariate_keys": null,
+     "continuous_covariate_keys": null,
+     "modalities": {
+         "rna_layer": "rna",
+         "protein_layer": "protein",
+         "batch_key": "rna"
+     }
+ }
+ ```
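
The two JSON blocks can be combined into a sketch of how a model with the same configuration might be constructed. This assumes scvi-tools 1.2.x and a MuData object with `rna` and `protein` modalities: the `modalities` entry suggests `TOTALVI.setup_mudata` rather than `setup_anndata`, but that is an inference, and the helper name `build_totalvi` is hypothetical.

```python
# Values copied from the "Model Parameters" and "Setup Data Arguments" JSON blocks.
MODEL_PARAMS = {
    "n_latent": 20,
    "gene_dispersion": "gene",
    "protein_dispersion": "protein",
    "gene_likelihood": "nb",
    "latent_distribution": "normal",
    "empirical_protein_background_prior": None,
    "override_missing_proteins": False,
}
SETUP_ARGS = {
    "rna_layer": "counts",
    "protein_layer": None,
    "batch_key": "donor_id",
    "size_factor_key": None,
    "categorical_covariate_keys": None,
    "continuous_covariate_keys": None,
    "modalities": {"rna_layer": "rna", "protein_layer": "protein", "batch_key": "rna"},
}


def build_totalvi(mdata):
    """Register `mdata` and return an untrained TOTALVI model (assumes scvi-tools 1.2.x)."""
    # Imported lazily so the dictionaries above can be inspected without scvi-tools.
    import scvi

    # Assumption: the data is a MuData object, so setup_mudata is used here.
    scvi.model.TOTALVI.setup_mudata(mdata, **SETUP_ARGS)
    return scvi.model.TOTALVI(mdata, **MODEL_PARAMS)
```

Calling `build_totalvi(mdata)` yields an untrained model; the pre-trained weights described in this card are not loaded by this sketch.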
+
+ </details>
+
+ <details>
+ <summary><strong>Data Registry</strong></summary>
+
+ Registry elements for AnnData manager:
+ | Registry Key | scvi-tools Location |
+ |--------------------------|--------------------------------------|
+ | X | adata.mod['rna'].layers['counts'] |
+ | batch | adata.mod['rna'].obs['_scvi_batch'] |
+ | labels | adata.obs['_scvi_labels'] |
+ | latent_qzm | adata.obsm['totalvi_latent_qzm'] |
+ | latent_qzv | adata.obsm['totalvi_latent_qzv'] |
+ | minify_type | adata.uns['_scvi_adata_minify_type'] |
+ | observed_lib_size | adata.obs['observed_lib_size'] |
+ | proteins | adata.mod['protein'].X |
+
+ - **Data is Minified**: False
+
+ </details>
+
+ <details>
+ <summary><strong>Summary Statistics</strong></summary>
+
+ | Summary Stat Key | Value |
+ |--------------------------|-------|
+ | n_batch | 120 |
+ | n_cells | 647366 |
+ | n_extra_categorical_covs | 0 |
+ | n_extra_continuous_covs | 0 |
+ | n_labels | 1 |
+ | n_latent_qzm | 20 |
+ | n_latent_qzv | 20 |
+ | n_proteins | 192 |
+ | n_vars | 4000 |
+
+ </details>
+
+
+ <details>
+ <summary><strong>Training</strong></summary>
+
+ <!-- If your model is not uploaded with any data (e.g., minified data) on the Model Hub, then make
+ sure to provide this field if you want users to be able to access your training data. See the
+ scvi-tools documentation for details. -->
+ **Training data URL**: Not provided by uploader
+
+ For those interested in understanding or replicating the training process, the training code
+ is available at the link below, if provided by the original uploader.
+
+ **Training Code URL**: https://github.com/YosefLab/Thymus_CITE-seq/blob/main/totalVI_AllData/totalVI_thymus111.ipynb
+
+ </details>
+
+
+ # References
+
+ Steier, Z., Aylard, D.A., McIntyre, L.L. et al. Single-cell multiomic analysis of thymocyte development reveals drivers of CD4+ T cell and CD8+ T cell lineage commitment. Nat Immunol 24, 1579–1590 (2023). https://doi.org/10.1038/s41590-023-01584-0.