Some questions about gene type prediction task

#6
by iLOVE2D - opened

Hi, congrats on your great work! I took a look at the supplementary table for the gene type prediction task and found the second dataset a little ambiguous. I cannot find that dataset (15K embryonic stem cells (ESCs), ref. 29) in PanglaoDB. Could you please offer more information? Thanks a lot.

Thank you for your interest in Geneformer. The dataset used for fine-tuning the model to distinguish bivalent promoters was from PanglaoDB, SRA553822-SRS2119548. In the example_input_files directory, we added the labels for the genes in the 56 highly conserved regions reported in Bernstein et al. 2006.

ctheodoris changed discussion status to closed

Hi, so does "15K" refer to a code for this dataset rather than the number of cells? Is that correct? Thanks a lot.

iLOVE2D changed discussion status to open

15K refers to the number of cells. [Update: we have stored the embryonic stem cell .dataset in the dataset repository: https://huggingface.co/datasets/ctheodoris/Genecorpus-30M/tree/main/example_input_files]
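For anyone who wants to fetch those files programmatically, here is a minimal sketch using `snapshot_download` from the `huggingface_hub` package. The repository ID and the `example_input_files` folder come from the link above; the function name `download_example_inputs` and the local directory name are hypothetical, chosen only for illustration. It is not necessarily how the authors retrieve the data.

```python
# Sketch: download only the example_input_files folder from the dataset repo.
# Assumes `huggingface_hub` is installed (pip install huggingface_hub).

REPO_ID = "ctheodoris/Genecorpus-30M"        # dataset repo from the link above
ALLOW_PATTERNS = ["example_input_files/*"]   # restrict download to this folder


def download_example_inputs(local_dir="genecorpus_examples"):
    """Download the example input files and return the local snapshot path.

    `local_dir` is a hypothetical destination chosen for this example.
    """
    # Imported here so the module can be loaded without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",            # the repo is a dataset, not a model
        allow_patterns=ALLOW_PATTERNS,  # skip the rest of the corpus
        local_dir=local_dir,
    )


# Example usage (requires network access):
# path = download_example_inputs()
# print(path)
```

Restricting the download with `allow_patterns` avoids pulling the full Genecorpus-30M corpus when only the example inputs are needed.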

ctheodoris changed discussion status to closed
