How to output gene embeddings after single gene perturbation, similar to Fig5a/b?

#256
by junguyen - opened

Hello,

I've had success outputting cosine shifts in cell embeddings when running with emb_mode="cell" and the parameters listed below. However, when I change emb_mode to "cell_and_gene" and keep all other parameters unchanged, I get exactly the same output (cosine shifts in cell embeddings only).

My goal is to observe how all other gene embeddings are affected after a single gene perturbation, to identify important proteins in a network (similar to Fig5a/b in the paper). I'm currently using a small subset from the Genecorpus-30M dataset as my input data.
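To make the goal concrete, here is a minimal sketch of the per-gene comparison I have in mind. It uses NumPy on toy arrays and is not Geneformer's implementation: the function name `gene_cosine_similarities`, the array shapes, and the random data are all assumptions for illustration. The idea is to drop the deleted gene from the original embeddings, then compare each remaining gene's embedding before and after the perturbation.

```python
import numpy as np


def gene_cosine_similarities(original, perturbed, deleted_idx):
    """Cosine similarity, per remaining gene, between original and perturbed embeddings.

    original:    (n_genes, dim) gene embeddings before the perturbation
    perturbed:   (n_genes - 1, dim) gene embeddings after deleting one gene
    deleted_idx: row index of the deleted gene in `original`
    (Hypothetical helper for illustration; not part of Geneformer.)
    """
    original = np.asarray(original, dtype=float)
    perturbed = np.asarray(perturbed, dtype=float)

    # Drop the deleted gene's row so both arrays align gene-for-gene.
    remaining = np.delete(original, deleted_idx, axis=0)
    if remaining.shape != perturbed.shape:
        raise ValueError("embeddings do not align after removing the deleted gene")

    # Row-wise cosine similarity.
    dots = np.sum(remaining * perturbed, axis=1)
    norms = np.linalg.norm(remaining, axis=1) * np.linalg.norm(perturbed, axis=1)
    return dots / norms


# Toy data standing in for real model outputs (assumption: 5 genes, 8 dimensions).
rng = np.random.default_rng(0)
original = rng.normal(size=(5, 8))
perturbed = np.delete(original, 2, axis=0).copy()
perturbed[0] *= -1.0  # flip one gene's embedding so it is maximally shifted

sims = gene_cosine_similarities(original, perturbed, deleted_idx=2)
most_affected = int(np.argmin(sims))  # lowest similarity = largest shift
```

Genes with the lowest similarity would be the ones most affected by the deletion, which is the ranking I would like to obtain from the real gene embeddings.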

How should I change my parameters to get gene embedding outputs?

Thank you!

from geneformer import InSilicoPerturber, InSilicoPerturberStats

# Set perturbation parameters
isp = InSilicoPerturber(perturb_type="delete",
                        perturb_rank_shift=None,
                        genes_to_perturb=["ENSG00000196262"],
                        combos=0,
                        anchor_gene=None,
                        model_type="Pretrained",
                        num_classes=0,
                        emb_mode="cell",  # changed to "cell_and_gene" for the second run
                        cell_emb_style="mean_pool",
                        filter_data=None,
                        cell_states_to_model=None,
                        max_ncells=None,
                        cell_inds_to_perturb={"start":0, "end":50},
                        emb_layer=-1,
                        forward_batch_size=50,
                        nproc=16,
                        token_dictionary_file="/home/ubuntu/Geneformer/geneformer/token_dictionary.pkl")

# Perturb data
isp.perturb_data("/home/ubuntu/Geneformer/",
                 "/data/genecorpus_filtered_nonhep/",
                 "/data/genecorpus_filtered_nonhep/delete_cell/",
                 "cell_and_gene_test_PPIA")

# Set perturbation stats
ispstats = InSilicoPerturberStats(mode="aggregate_data",
                                  genes_perturbed=["ENSG00000196262"],
                                  combos=0,
                                  anchor_gene=None,
                                  cell_states_to_model=None,
                                  token_dictionary_file="/home/ubuntu/Geneformer/geneformer/token_dictionary.pkl")

# Get perturbation stats
ispstats.get_stats("/data/genecorpus_filtered_nonhep/delete_cell/",
                   None,
                   "/data/genecorpus_filtered_nonhep/delete_cell/",
                   "delete_cell_and_gene_test_PPIA")

Thank you for your interest in Geneformer and for your patience! We pushed an update that should resolve this issue. If you still encounter errors after pulling the updated code, please reopen this discussion if it is the same error, or open a new discussion if it is a different one. Thank you!

ctheodoris changed discussion status to closed
