gbyuvd/miniChembed-prototype

gbyuvd committed on Oct 27

Commit 6a2f303 (verified) · 1 parent: 3a31377

Update README.md

Files changed (1)

README.md +2 -2

@@ -36,12 +36,12 @@ The Barlow Twins objective explicitly minimizes redundancy between embedding dim
 | Attribute | Value |
 |----------|-------|
-| **Base architecture** | Custom RoBERTa-style transformer (4 layers, 320 hidden dim, 4 attention heads, ~4M params) |
+| **Base architecture** | Custom RoBERTa-style transformer (6 layers, 320 hidden dim, 4 attention heads, ~8M params) |
 | **Initialization** | Random (not pretrained on text or chemistry) |
 | **Training objective** | **Barlow Twins**, redundancy-reduction via cross-correlation matrix |
 | **Augmentation** | Stochastic SMILES enumeration (`MolToSmiles(..., doRandom=True)`) |
 | **Training data** | ~24K unique molecules → augmented into positive pairs |
-| **Sequence length** | 512 tokens |
+| **Sequence length** | 514 tokens |
 | **Embedding dimension** | 320 |
 | **Projection head** | 3-layer MLP with BatchNorm (2048 → 2048 → 2048) |
 | **Pooling** | Mean pooling over token embeddings |
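
The two changed rows can be sanity-checked with a little arithmetic. The sketch below estimates the parameter count of a RoBERTa-style encoder from the layer count and hidden size given in the table. Everything else is an assumption that the README does not state: a feed-forward width of 4 × hidden, a hypothetical vocabulary of 1,000 tokens, a single token-type embedding row, and no pooler layer. The function name `encoder_params` is illustrative only.

```python
# Rough parameter estimate for a RoBERTa-style encoder.
# Assumptions (NOT from the README): feed-forward width = 4 * hidden,
# a hypothetical 1,000-token vocabulary, one token-type row, no pooler.

def encoder_params(layers, hidden, vocab_size=1000, max_positions=514, ffn_mult=4):
    ffn = ffn_mult * hidden
    # Q, K, V and output projections: four hidden x hidden matrices plus biases.
    attention = 4 * (hidden * hidden + hidden)
    # Two dense layers: hidden -> ffn and ffn -> hidden, each with a bias.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two LayerNorms per layer, each with a scale and a shift vector.
    layer_norms = 2 * (2 * hidden)
    per_layer = attention + feed_forward + layer_norms
    # Token, position and token-type embeddings, plus the embedding LayerNorm.
    embeddings = (vocab_size + max_positions + 1) * hidden + 2 * hidden
    return layers * per_layer + embeddings

old = encoder_params(layers=4, hidden=320, max_positions=512)
new = encoder_params(layers=6, hidden=320, max_positions=514)
print(f"4 layers, 512 positions: {old:,} parameters")
print(f"6 layers, 514 positions: {new:,} parameters")
```

Under these assumptions the six-layer configuration comes to roughly 7.9M parameters, in line with the `~8M` figure in the updated row. The exact number depends on the real vocabulary size and feed-forward width. As for the sequence length, Hugging Face RoBERTa models start position ids at `padding_idx + 1`, so a position table of 514 rows corresponds to 512 usable token positions. Whether this model follows that convention is an assumption here.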