⚠️ This is a derivative fork of Geneformer — not the official repository.
It accompanies the RNAquarium manuscript (Aniseia, Walteri et al.) and provides the Geneformer package with a small set of modifications used in that work. The figure-generation code is maintained separately and references this fork.
- Forked from: `ctheodoris/Geneformer` at the `gc95M` generation (default model `GF-12L-95M-i4096`), upstream commit `31bf641` (2025-06-16), prior to the upstream `gc104M` migration.
- What changed: see `CHANGES_vs_upstream.md`. Edits are confined to 6 files in `geneformer/`: cross-model weight transfer, class subsampling/weighting, multi-GPU multitask training, and enhanced confusion-matrix plotting.
- Model weights are not included here. Download pretrained Geneformer weights from upstream: https://huggingface.co/ctheodoris/Geneformer. RNAquarium-specific trained weights, if shared, are hosted separately (see the manuscript).
- License & attribution: Apache-2.0, inherited from upstream. See `LICENSE` and `NOTICE`.
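Since the weights are not bundled with this fork, one way to fetch them from the upstream repository is the `huggingface_hub` client. The following is a minimal sketch, not part of this fork: the helper names `build_download_args` and `download_upstream_weights` are hypothetical, and the optional `model_subdir` argument assumes the upstream repository stores each model in its own directory, which should be checked against the upstream file listing.

```python
from typing import Optional

# Upstream repository named in this README.
UPSTREAM_REPO = "ctheodoris/Geneformer"


def build_download_args(model_subdir: Optional[str] = None,
                        revision: Optional[str] = None) -> dict:
    """Assemble keyword arguments for huggingface_hub.snapshot_download.

    Hypothetical helper for illustration; it performs no network access.
    """
    args = {"repo_id": UPSTREAM_REPO}
    if model_subdir is not None:
        # Assumption: each model lives in its own top-level directory upstream.
        args["allow_patterns"] = [f"{model_subdir}/*"]
    if revision is not None:
        # Optional branch, tag, or commit to pin the download to.
        args["revision"] = revision
    return args


def download_upstream_weights(model_subdir: Optional[str] = None,
                              revision: Optional[str] = None) -> str:
    """Download the upstream snapshot and return its local path."""
    # Imported lazily so the module loads without the dependency installed
    # (pip install huggingface_hub).
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_args(model_subdir, revision))
```

Calling `download_upstream_weights()` with no arguments downloads the whole upstream repository into the local Hugging Face cache and returns the snapshot path. Passing a directory name restricts the download to that model.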
Base model
ctheodoris/Geneformer