jruffle committed on
Commit 5f7b33f · verified · 1 parent: d6e8424

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +20 -44
  2. metadata.json +15 -19
  3. model.joblib +2 -2
README.md CHANGED
@@ -1,63 +1,39 @@
  ---
- title: Classical Methods (Sample-centric, 2D)
- emoji: 📊
- colorFrom: purple
- colorTo: blue
- sdk: python
  tags:
  - transcriptomics
  - dimensionality-reduction
- - pca
- - umap
  license: mit
  ---
 
- # Classical Dimensionality Reduction (Sample-centric, 2D)
 
- Pre-trained PCA and UMAP models for transcriptomics data compression, part of the TRACERx Datathon 2025 project.
 
- ## Model Details
 
- - **Methods**: PCA and UMAP
- - **Compression Mode**: Sample-centric
- - **Output Dimensions**: 2
- - **Training Data**: TRACERx open dataset (VST-normalized counts)
-
- ## Contents
-
- The model file contains:
- - **PCA**: Principal Component Analysis model
- - **UMAP**: Uniform Manifold Approximation and Projection model (2-4D only)
- - **Scaler**: StandardScaler fitted on TRACERx data
- - **Feature Order**: Gene/sample order for alignment
 
  ## Usage
 
- These models are designed to be used with the TRACERx Datathon 2025 analysis pipeline.
- They will be automatically downloaded and cached when needed.
-
  ```python
  import joblib
 
- # Load the model bundle
- model_data = joblib.load("model.joblib")
 
- # Access components
- pca = model_data['pca']
- scaler = model_data['scaler']
- gene_order = model_data.get('gene_order') # For sample-centric
 
- # Transform new data
- scaled_data = scaler.transform(aligned_data)
- embeddings = pca.transform(scaled_data)
  ```
-
- ## Training Details
-
- - **Input Features**: 20,136 genes
- - **Training Samples**: 1,051 samples
- - **Preprocessing**: StandardScaler normalization
-
- ## Files
-
- - `model.joblib`: Model bundle containing PCA, UMAP, scaler, and feature order
 
  ---
  tags:
  - transcriptomics
  - dimensionality-reduction
+ - classical
+ - TRACERx
+ - UMAP
+ - PCA
  license: mit
  ---
 
+ # Classical Models (PCA + UMAP) - samples mode - 2D
 
+ Pre-trained PCA and UMAP models for transcriptomic data compression.
 
+ **UMAP models support transform()** - new data can be projected into the same embedding space.
 
+ ## Details
+ - **Mode**: samples-centric compression
+ - **Dimensions**: 2
+ - **Training data**: TRACERx lung cancer transcriptomics
+ - **Created**: 2026-01-13T12:05:54.092383
+ - **UMAP transform**: Enabled (low_memory=False)
 
  ## Usage
 
  ```python
  import joblib
+ from huggingface_hub import snapshot_download
 
+ # Download model
+ local_dir = snapshot_download("jruffle/classical_samples_2d")
+ model = joblib.load(f"{local_dir}/model.joblib")
 
+ # Model contains: 'pca', 'umap', 'robust_scaler', 'gene_order'
 
+ # Use UMAP transform on new data:
+ new_embeddings = model['umap'].transform(preprocessed_new_data)
  ```
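Neither README snippet defines its input: the old one transforms an undefined `aligned_data`, and the new one an undefined `preprocessed_new_data`. The following is a minimal sketch of that preparation. It makes four assumptions: the loaded bundle is a dict with the keys the new README lists, `gene_order` is a sequence of gene identifiers, new data arrive as a samples × genes matrix with a matching list of gene names, and genes missing from the new data may be zero-filled. `align_to_gene_order` and `project` are hypothetical helpers. The bundle also stores `norm_params`, whose use this commit does not document, so that step is left to an optional caller-supplied `normalize` callable instead of being reconstructed.

```python
import numpy as np


def align_to_gene_order(X, genes, gene_order, fill_value=0.0):
    """Reorder the columns of X (samples x genes) to follow gene_order.

    Genes in gene_order but absent from `genes` are set to fill_value
    (an assumption: the model card does not say how missing genes are
    handled). Genes not listed in gene_order are dropped.
    """
    X = np.asarray(X, dtype=float)
    col_of = {g: i for i, g in enumerate(genes)}
    aligned = np.full((X.shape[0], len(gene_order)), fill_value, dtype=float)
    missing = []
    for j, gene in enumerate(gene_order):
        i = col_of.get(gene)
        if i is None:
            missing.append(gene)
        else:
            aligned[:, j] = X[:, i]
    return aligned, missing


def project(bundle, X, genes, method="umap", normalize=None):
    """Hypothetical helper: align, robust-scale and embed new samples.

    Assumes bundle['robust_scaler'] and bundle[method] ('pca' or 'umap')
    expose a scikit-learn style .transform(). `normalize` is an optional
    callable for any further step (e.g. one driven by 'norm_params'),
    which this sketch does not reconstruct.
    """
    aligned, missing = align_to_gene_order(X, genes, bundle["gene_order"])
    scaled = bundle["robust_scaler"].transform(aligned)
    if normalize is not None:
        scaled = normalize(scaled)
    return bundle[method].transform(scaled), missing


# With the real bundle (not run here; X_new and gene_names are your own data):
# bundle = joblib.load(f"{local_dir}/model.joblib")
# embeddings, missing = project(bundle, X_new, gene_names, method="umap")
```

Returning the list of missing genes lets the caller decide whether too many genes were zero-filled for the embedding to be trusted.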
 
metadata.json CHANGED
@@ -1,21 +1,17 @@
  {
- "compression_mode": "samples",
- "latent_dims": 2,
- "preprocessing_method": "robust",
- "preprocessing_quantile_range": [
- 5.0,
- 95.0
- ],
- "norm_range": [
- -1,
- 1
- ],
- "n_samples": 1050,
- "n_features": 20136,
- "pca_explained_variance": [
- 0.15058058614897457,
- 0.07173255882979852
- ],
- "has_umap": true,
- "n_genes": 20136
  }
 
  {
+ "model_type": "classical",
+ "mode": "samples",
+ "dimensions": 2,
+ "created": "2026-01-13T12:05:54.092536",
+ "umap_transform_enabled": true,
+ "keys": [
+ "robust_scaler",
+ "norm_params",
+ "pca",
+ "preprocessing_method",
+ "preprocessing_quantile_range",
+ "gene_order",
+ "sample_ids",
+ "umap"
+ ]
  }
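The older metadata records a `robust` preprocessing method with `preprocessing_quantile_range` [5.0, 95.0] and a `norm_range` of [-1, 1]. The new bundle keys include `robust_scaler`, `norm_params`, and `preprocessing_quantile_range`. The NumPy sketch below shows one plausible reading of those fields. Each gene is centered on its median and divided by its 5th-95th percentile range, the same statistics scikit-learn's `RobustScaler(quantile_range=(5.0, 95.0))` computes. It then applies a per-gene min-max rescale into `norm_range`. That second step, and its per-gene granularity, are assumptions, because the commit does not document what `norm_params` contains. `fit_preprocess` and `apply_preprocess` are hypothetical names.

```python
import numpy as np


def fit_preprocess(X, quantile_range=(5.0, 95.0), norm_range=(-1.0, 1.0)):
    """Fit robust scaling, then an (assumed) per-gene min-max rescale to norm_range."""
    X = np.asarray(X, dtype=float)
    center = np.median(X, axis=0)
    q_lo, q_hi = np.percentile(X, quantile_range, axis=0)
    scale = q_hi - q_lo
    scale[scale == 0] = 1.0  # constant genes: leave unscaled, as RobustScaler does
    Z = (X - center) / scale
    z_min, z_max = Z.min(axis=0), Z.max(axis=0)
    span = np.where(z_max > z_min, z_max - z_min, 1.0)  # avoid division by zero
    return {"center": center, "scale": scale, "z_min": z_min,
            "span": span, "norm_range": norm_range}


def apply_preprocess(X, params):
    """Apply fitted parameters to samples (new samples may fall outside norm_range)."""
    Z = (np.asarray(X, dtype=float) - params["center"]) / params["scale"]
    lo, hi = params["norm_range"]
    return lo + (Z - params["z_min"]) / params["span"] * (hi - lo)
```

By construction, training rows land inside [-1, 1]. New samples can fall outside it, because the fitted parameters are reused unchanged. Separately, the two `pca_explained_variance` values in the old metadata, 0.1506 and 0.0717, sum to about 0.222. If they are explained-variance ratios, as their magnitude suggests, the 2D PCA retained roughly 22% of the variance of its preprocessed training data.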
model.joblib CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f91639e7ccd622491bdfbcc8bbf3e873ca523369046c26a45418f3f162373d51
- size 85797006
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:9b37f818d9e9dac2fe8743a9a4fe421dc81395c33af145135f01da3703239659
+ size 85799458
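The `model.joblib` change appears as a Git LFS pointer. Under the Git LFS spec, `oid` is the SHA-256 of the file's contents and `size` is its length in bytes. The following is a minimal standard-library sketch for checking a downloaded copy against the pointer from this commit, assuming the file is already on local disk. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of `huggingface_hub`.

```python
import hashlib
import os

# Pointer text for model.joblib after this commit (copied from the diff above).
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9b37f818d9e9dac2fe8743a9a4fe421dc81395c33af145135f01da3703239659
size 85799458
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's byte size and SHA-256."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algo']}")
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first: sizes must match exactly
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so the ~86 MB file is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# Usage (local_dir as in the README's snapshot_download example):
# ok = matches_pointer(f"{local_dir}/model.joblib", parse_lfs_pointer(POINTER))
```

Comparing sizes first catches truncated downloads without hashing. The two pointers in this diff differ by 2,452 bytes (85,799,458 vs 85,797,006), so even the size check alone distinguishes the old bundle from the new one.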