in_silico_perturber ERROR

#326
by zxh1210 - opened

Hello, I upgraded the package to the current version and got an error when running the isp_perturb_all function. I would appreciate your suggestions on the KeyError raised when indexing gene_cos_sims.

start_goal_alt in [{"state_key": "disease","start_state": "d","goal_state": "Uninjured","alt_states":[]}]:

filter_data = {"cluster":cell_type_detail,disease:[k for k in start_goal_alt.values()]}
embex = EmbExtractor(model_type="CellClassifier",
num_classes=num_classes,
filter_data=filter_data,
max_ncells=None,
emb_layer=0,
summary_stat="exact_mean",
forward_batch_size=64,
nproc=16)
state_embs_dict = embex.get_state_embs(start_goal_alt,
output_dir,input_data_path,output_dir,
f"embs}*")

isp = InSilicoPerturber(perturb_type=perturb_type,
perturb_rank_shift=None,
genes_to_perturb="all",
combos=0,
anchor_gene=None,
model_type="CellClassifier",
num_classes=num_classes,
emb_mode="cell_and_gene",
cell_emb_style="mean_pool",
filter_data=filter_data,
cell_states_to_model=start_goal_alt,
state_embs_dict=state_embs_dict,
max_ncells=None,
emb_layer=0,
forward_batch_size=64,
nproc=16,
)

isp.perturb_data(output_dir,input_data_path,intermed_path, "celltype")
ispstats = InSilicoPerturberStats(mode="goal_state_shift",
genes_perturbed="all",
combos=0,
anchor_gene=None,
cell_states_to_model=start_goal_alt)

Error:
KeyError Traceback (most recent call last)
File ~/.conda/envs/geneformer/lib/python3.10/site-packages/geneformer/in_silico_perturber.py:817, in InSilicoPerturber.isp_perturb_all(self, model, filtered_input_data, layer_to_quant, output_path_prefix)
814 try:
815 stored_gene_embs_dict[
816 (perturbed_gene, affected_gene)
--> 817 ].append(gene_cos_sims[perturbation_i, gene_j].item())
818 except KeyError:

KeyError: (0, 0)

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
Cell In[18], line 122
119 if not os.path.exists(intermed_path):
120 os.makedirs(intermed_path)
--> 122 isp.perturb_data(output_dir,input_data_path,intermed_path,
123 f"celltype")
125 ispstats = InSilicoPerturberStats(mode="goal_state_shift",
126 genes_perturbed="all",
127 combos=0,
128 anchor_gene=None,
129 cell_states_to_model=start_goal_alt)

File ~/.conda/envs/geneformer/lib/python3.10/site-packages/geneformer/in_silico_perturber.py:437, in InSilicoPerturber.perturb_data(self, model_directory, input_data_file, output_directory, output_prefix)
433 self.isp_perturb_set(
434 model, filtered_input_data, layer_to_quant, output_path_prefix
435 )
436 else:
--> 437 self.isp_perturb_all(
438 model, filtered_input_data, layer_to_quant, output_path_prefix
439 )

File ~/.conda/envs/geneformer/lib/python3.10/site-packages/geneformer/in_silico_perturber.py:821, in InSilicoPerturber.isp_perturb_all(self, model, filtered_input_data, layer_to_quant, output_path_prefix)
815 stored_gene_embs_dict[
816 (perturbed_gene, affected_gene)
817 ].append(gene_cos_sims[perturbation_i, gene_j].item())
818 except KeyError:
819 stored_gene_embs_dict[
820 (perturbed_gene, affected_gene)
--> 821 ] = gene_cos_sims[perturbation_i, gene_j].item()
823 if self.cell_states_to_model is None:
824 cos_sims_data = torch.mean(gene_cos_sims, dim=1)

KeyError: (0, 0)

========================================================
Edited:
It seems I need to use a matching emb_mode in EmbExtractor and InSilicoPerturber. If I would like to use emb_mode="cell_and_gene" in InSilicoPerturber, do I then have to use emb_mode="gene" in EmbExtractor? Will I still get the cell state shift prediction when using emb_mode="gene" in EmbExtractor? The "cell_and_gene" emb_mode in InSilicoPerturber is a little confusing to me. Thank you!

Thank you for your question! If you are using the EmbExtractor to generate the state_embs_dict, then it sounds like you'd like to compare to a goal state, in which case you should use emb_mode = "cell" for the in silico perturber. Please see the in silico perturber documentation: "Gene embedding shifts only available as compared to original cell, not comparing to goal state." With emb_mode = "gene", the in silico perturber calculates cosine shifts from the original cell only, because that mode is primarily focused on the effect of perturbing a gene on the other genes in the cell, not on the shift of the cell state as a whole.
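To make the advice above concrete, here is a minimal sketch of a pre-flight check that could be run before constructing InSilicoPerturber. The helper check_isp_config is hypothetical and not part of Geneformer; it only encodes the rule stated in this reply (a goal-state comparison requires emb_mode = "cell") and may not match every library version.

```python
# Hypothetical helper, NOT part of Geneformer: it only encodes the rule
# quoted in the reply above (goal-state comparison requires emb_mode="cell").
def check_isp_config(emb_mode, cell_states_to_model):
    """Return True if the settings are consistent with the reply's advice."""
    if cell_states_to_model is not None and emb_mode != "cell":
        raise ValueError(
            f'emb_mode="{emb_mode}" cannot be combined with cell_states_to_model; '
            'use emb_mode="cell" to compare against a goal state.'
        )
    return True


# Same state definition as in the original post.
start_goal_alt = {
    "state_key": "disease",
    "start_state": "d",
    "goal_state": "Uninjured",
    "alt_states": [],
}

# Settings from the original post: rejected by the check.
try:
    check_isp_config("cell_and_gene", start_goal_alt)
except ValueError as err:
    print(f"Rejected: {err}")

# Settings suggested in the reply: accepted.
check_isp_config("cell", start_goal_alt)

# Gene embedding shifts relative to the original cell only: no goal state.
check_isp_config("cell_and_gene", None)
```

If the check passes, the same emb_mode and cell_states_to_model values can then be passed to InSilicoPerturber together with the state_embs_dict produced by EmbExtractor.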

ctheodoris changed discussion status to closed
