Question about reproducing in silico reprogramming

#101
by Oskm - opened

Hello, congratulations on your amazing work! I'm trying to reproduce the in silico fibroblast-to-iPSC conversion reported in your paper. I was able to tokenize the data from Xing et al. (the parameters I'm using are listed below) and to perturb individual genes, and I have a few questions about combinatorial perturbations:

  1. How did you perturb 4 genes (OSKM)? Does the combos option allow 4 perturbations at once, or did you need to use one of the factors as an anchor gene?
  2. For the anchor_gene option, I tried using Ensembl IDs for various genes (in the format shown below), but I'm getting an error. Could you please check whether anything is wrong with my parameters?
  3. How many cells did you need to model to reach statistical significance for the OSKM reprogramming reported in the paper?
  4. For some genes (like SOX2), I'm getting extremely low N_detections in the dataset (3 cells out of 1000). Could you please explain why that would happen when I'm using the "overexpress" perturbation type, and whether there is any way to increase it?

Thank you very much for your help. This is a great tool and will help many researchers!

isp = InSilicoPerturber(perturb_type="overexpress",
                        perturb_rank_shift=None,
                        genes_to_perturb=["ENSG00000204531", "ENSG00000181449", "ENSG00000136826"],
                        combos=2,
                        anchor_gene="ENSG00000136997",
                        model_type="Pretrained",
                        num_classes=0,
                        emb_mode="cell",
                        cell_emb_style="mean_pool",
                        filter_data={},
                        cell_states_to_model={"cell_type":(["D0"],["D16_positive"],[])},
                        max_ncells=1000,
                        emb_layer=-1,
                        forward_batch_size=50,
                        nproc=1,
                        save_raw_data=True)

ispstats = InSilicoPerturberStats(mode="goal_state_shift",
                                  combos=2,
                                  anchor_gene="ENSG00000136997",
                                  cell_states_to_model={"cell_type":(["D0"],["D16_positive"],[])})

Thank you for your interest in Geneformer! I updated the in silico perturber to allow for efficient modeling of a single perturbation in multiple cells as a batch. Please pull the updated version. You can set up the isp and ispstats as follows. The other issues should also be resolved, but please let me know if any remain.

isp = InSilicoPerturber(perturb_type="overexpress",
                        perturb_rank_shift=None,
                        genes_to_perturb=[list_of_reprogramming_factors],
                        combos=0,
                        anchor_gene=None,
                        model_type="Pretrained",
                        num_classes=0,
                        emb_mode="cell",
                        cell_emb_style="mean_pool",
                        filter_data={"cell_type":["fibroblasts","iPSCs"]},
                        cell_states_to_model={"cell_type":(["fibroblasts"],["iPSCs"])},
                        max_ncells=None,
                        emb_layer=-1,
                        forward_batch_size=400,
                        nproc=16)

ispstats = InSilicoPerturberStats(mode="goal_state_shift",
                                  genes_perturbed=[list_of_reprogramming_factors],
                                  combos=0,
                                  anchor_gene=None,
                                  cell_states_to_model={"cell_type":(["fibroblasts"],["iPSCs"])})
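The two calls above have to agree on the gene list, combos, anchor_gene, and cell_states_to_model. One way to keep them in sync is to build both argument sets from a single place, sketched below. This is only an illustration under assumptions: build_kwargs is a hypothetical helper (not part of Geneformer), the placeholder list_of_reprogramming_factors is assumed to stand for a flat list of Ensembl IDs, and the four IDs are the ones from the question (the gene symbols in the comments are annotations). The "fibroblasts" and "iPSCs" labels come from the reply and would need to match the cell_type values in your tokenized dataset (the question uses "D0" and "D16_positive").

```python
# Ensembl IDs taken from the question above; symbols are annotations only.
REPROGRAMMING_FACTORS = [
    "ENSG00000204531",  # POU5F1 (OCT4)
    "ENSG00000181449",  # SOX2
    "ENSG00000136826",  # KLF4
    "ENSG00000136997",  # MYC
]

# Labels copied from the reply; adjust to the labels in your own dataset.
CELL_STATES = {"cell_type": (["fibroblasts"], ["iPSCs"])}


def build_kwargs(genes, cell_states):
    """Hypothetical helper: return (isp_kwargs, stats_kwargs) with shared settings."""
    # Settings that must be identical in the perturber and the stats object.
    shared = {
        "combos": 0,
        "anchor_gene": None,
        "cell_states_to_model": cell_states,
    }
    isp_kwargs = {
        "perturb_type": "overexpress",
        "perturb_rank_shift": None,
        "genes_to_perturb": list(genes),
        "model_type": "Pretrained",
        "num_classes": 0,
        "emb_mode": "cell",
        "cell_emb_style": "mean_pool",
        # Keep only the cell types named in cell_states (assumes one state key).
        "filter_data": {
            key: [label for group in groups for label in group]
            for key, groups in cell_states.items()
        },
        "max_ncells": None,
        "emb_layer": -1,
        "forward_batch_size": 400,
        "nproc": 16,
        **shared,
    }
    stats_kwargs = {
        "mode": "goal_state_shift",
        "genes_perturbed": list(genes),
        **shared,
    }
    return isp_kwargs, stats_kwargs


isp_kwargs, stats_kwargs = build_kwargs(REPROGRAMMING_FACTORS, CELL_STATES)

# With Geneformer installed, the dictionaries would be unpacked like this:
# from geneformer import InSilicoPerturber, InSilicoPerturberStats
# isp = InSilicoPerturber(**isp_kwargs)
# ispstats = InSilicoPerturberStats(**stats_kwargs)
```

The keyword names mirror the snippet in the reply; if your installed version of Geneformer has changed its constructor arguments, the dictionaries would need to be adjusted accordingly.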
ctheodoris changed discussion status to closed
