Upload folder using huggingface_hub

Files changed (3)

README.md +20 -44
metadata.json +15 -19
model.joblib +2 -2

README.md CHANGED

@@ -1,63 +1,39 @@
 ---
-title: Classical Methods (Sample-centric, 2D)
-emoji: 📊
-colorFrom: purple
-colorTo: blue
-sdk: python
 tags:
 - transcriptomics
 - dimensionality-reduction
-- pca
-- umap
 license: mit
 ---
-# Classical Dimensionality Reduction (Sample-centric, 2D)
-Pre-trained PCA and UMAP models for transcriptomics data compression, part of the TRACERx Datathon 2025 project.
-## Model Details
-- **Methods**: PCA and UMAP
-- **Compression Mode**: Sample-centric
-- **Output Dimensions**: 2
-- **Training Data**: TRACERx open dataset (VST-normalized counts)
-## Contents
-The model file contains:
-- **PCA**: Principal Component Analysis model
-- **UMAP**: Uniform Manifold Approximation and Projection model (2-4D only)
-- **Scaler**: StandardScaler fitted on TRACERx data
-- **Feature Order**: Gene/sample order for alignment
 ## Usage
-These models are designed to be used with the TRACERx Datathon 2025 analysis pipeline.
-They will be automatically downloaded and cached when needed.
 ```python
 import joblib
-# Load the model bundle
-model_data = joblib.load("model.joblib")
-# Access components
-pca = model_data['pca']
-scaler = model_data['scaler']
-gene_order = model_data.get('gene_order')  # For sample-centric
-# Transform new data
-scaled_data = scaler.transform(aligned_data)
-embeddings = pca.transform(scaled_data)
 ```
-## Training Details
-- **Input Features**: 20,136 genes
-- **Training Samples**: 1,051 samples
-- **Preprocessing**: StandardScaler normalization
-## Files
-- `model.joblib`: Model bundle containing PCA, UMAP, scaler, and feature order

 ---
 tags:
 - transcriptomics
 - dimensionality-reduction
+- classical
+- TRACERx
+- UMAP
+- PCA
 license: mit
 ---
+# Classical Models (PCA + UMAP) - samples mode - 2D
+Pre-trained PCA and UMAP models for transcriptomic data compression.
+**UMAP models support transform()** - new data can be projected into the same embedding space.
+## Details
+- **Mode**: samples-centric compression
+- **Dimensions**: 2
+- **Training data**: TRACERx lung cancer transcriptomics
+- **Created**: 2026-01-13T12:05:54.092383
+- **UMAP transform**: Enabled (low_memory=False)
 ## Usage
 ```python
 import joblib
+from huggingface_hub import snapshot_download
+# Download model
+local_dir = snapshot_download("jruffle/classical_samples_2d")
+model = joblib.load(f"{local_dir}/model.joblib")
+# Model contains: 'pca', 'umap', 'robust_scaler', 'gene_order'
+# Use UMAP transform on new data:
+new_embeddings = model['umap'].transform(preprocessed_new_data)
 ```
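
The updated usage snippet passes `preprocessed_new_data` to `model['umap'].transform(...)` without showing how that array is produced. A minimal sketch of one plausible preparation step follows, using only the `gene_order`, `robust_scaler`, and `umap` keys named in the README. The helper names `align_to_gene_order` and `project_new_samples` are hypothetical, and the assumption that UMAP was fitted directly on robust-scaled gene values is not confirmed by this commit. The bundle also lists `norm_params`, which may imply an extra normalization step.

```python
import numpy as np
import pandas as pd


def align_to_gene_order(expr: pd.DataFrame, gene_order) -> pd.DataFrame:
    """Return `expr` (samples x genes) with columns in the order of `gene_order`."""
    gene_order = list(gene_order)
    missing = [g for g in gene_order if g not in expr.columns]
    if missing:
        raise ValueError(
            f"{len(missing)} genes missing from input, e.g. {missing[:5]}"
        )
    # Genes that were not seen at training time are dropped by this selection.
    return expr.loc[:, gene_order]


def project_new_samples(model: dict, expr: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: align, scale, then call the bundled UMAP transform().

    ASSUMPTION: the UMAP was fitted on robust-scaled gene values. If the real
    pipeline also applies `norm_params` or feeds PCA output into UMAP, those
    steps would have to be added here.
    """
    aligned = align_to_gene_order(expr, model["gene_order"])
    scaled = model["robust_scaler"].transform(aligned.to_numpy())
    return np.asarray(model["umap"].transform(scaled))
```

With the bundle loaded as in the README, a call would look like `project_new_samples(model, new_expr)`, where `new_expr` is a samples-by-genes DataFrame whose column names match the entries of `gene_order`.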

metadata.json CHANGED

@@ -1,21 +1,17 @@
 {
-  "compression_mode": "samples",
-  "latent_dims": 2,
-  "preprocessing_method": "robust",
-  "preprocessing_quantile_range": [
-    5.0,
-    95.0
-  ],
-  "norm_range": [
-    -1,
-    1
-  ],
-  "n_samples": 1050,
-  "n_features": 20136,
-  "pca_explained_variance": [
-    0.15058058614897457,
-    0.07173255882979852
-  ],
-  "has_umap": true,
-  "n_genes": 20136
 }

 {
+  "model_type": "classical",
+  "mode": "samples",
+  "dimensions": 2,
+  "created": "2026-01-13T12:05:54.092536",
+  "umap_transform_enabled": true,
+  "keys": [
+    "robust_scaler",
+    "norm_params",
+    "pca",
+    "preprocessing_method",
+    "preprocessing_quantile_range",
+    "gene_order",
+    "sample_ids",
+    "umap"
+  ]
 }
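
The new `metadata.json` lists the keys expected in the joblib bundle. A small sanity check, sketched below, can confirm that a downloaded bundle actually contains them. The function names `missing_bundle_keys` and `check_download` are hypothetical, and the sketch assumes `model.joblib` loads as a plain `dict`, as the README usage suggests.

```python
import json


def missing_bundle_keys(metadata: dict, bundle: dict) -> list:
    """Return the keys listed under metadata["keys"] that the bundle lacks."""
    return [key for key in metadata.get("keys", []) if key not in bundle]


def check_download(local_dir: str) -> list:
    """Load metadata.json and model.joblib from `local_dir` and compare keys.

    ASSUMPTION: both files sit at the top level of the downloaded snapshot.
    """
    import joblib  # imported lazily so the pure helper above has no dependency

    with open(f"{local_dir}/metadata.json", encoding="utf-8") as fh:
        metadata = json.load(fh)
    bundle = joblib.load(f"{local_dir}/model.joblib")
    return missing_bundle_keys(metadata, bundle)
```

An empty list from `check_download(local_dir)` means every key named in the metadata is present in the bundle.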

model.joblib CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f91639e7ccd622491bdfbcc8bbf3e873ca523369046c26a45418f3f162373d51
-size 85797006

 version https://git-lfs.github.com/spec/v1
+oid sha256:9b37f818d9e9dac2fe8743a9a4fe421dc81395c33af145135f01da3703239659
+size 85799458
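
A Git LFS pointer records the SHA-256 digest and byte size of the real file, so a local copy of `model.joblib` can be checked against the updated pointer above. The sketch below shows one way to do that. The function names are hypothetical, and the expected values are copied from the `+` lines of this diff.

```python
import hashlib
import os

# Values copied from the updated LFS pointer in this commit.
EXPECTED_OID = "9b37f818d9e9dac2fe8743a9a4fe421dc81395c33af145135f01da3703239659"
EXPECTED_SIZE = 85799458


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_pointer(path: str, expected_oid: str, expected_size: int) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    return (
        os.path.getsize(path) == expected_size
        and file_sha256(path) == expected_oid
    )
```

For a downloaded snapshot, the check would be `matches_lfs_pointer(f"{local_dir}/model.joblib", EXPECTED_OID, EXPECTED_SIZE)`.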