---
license: apache-2.0
library_name: transformers
pipeline_tag: feature-extraction
tags:
- chemistry
---
# selfies-ted
selfies-ted is a project for encoding SMILES (Simplified Molecular Input Line Entry System) into SELFIES (SELF-referencing Embedded Strings) and generating embeddings for molecular representations.
![selfies-ted](selfies-ted.png)
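A SELFIES string is a sequence of bracketed symbols, so a common way to tokenize it is to split it into those symbols. The sketch below illustrates that splitting step with the standard library only. It assumes the SMILES-to-SELFIES conversion has already been done (for example with the separate `selfies` package's `selfies.encoder`). The helper name `split_selfies_tokens` is hypothetical, and whether the selfies-ted tokenizer splits strings exactly this way is an assumption.

```python
import re

# Each SELFIES symbol is enclosed in square brackets, e.g. "[C]", "[=O]", "[Branch1]";
# a bare "." separates disconnected fragments.
_SYMBOL = re.compile(r"\[[^\]]*\]|\.")


def split_selfies_tokens(selfies: str) -> list[str]:
    """Split a SELFIES string into its symbols (hypothetical helper, not the project's tokenizer)."""
    tokens = _SYMBOL.findall(selfies)
    # Characters outside brackets (other than ".") mean the input is not well-formed SELFIES.
    if "".join(tokens) != selfies:
        raise ValueError(f"not a well-formed SELFIES string: {selfies!r}")
    return tokens


# "[C][O][C]" is the SELFIES form of the SMILES "COC" (dimethyl ether).
print(split_selfies_tokens("[C][O][C]"))  # ['[C]', '[O]', '[C]']
```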
## Model Architecture
Configuration details:

- Encoder and decoder FFN dimension: 256
- Number of attention heads: 4
- Number of encoder and decoder layers: 2
- Total number of hidden layers: 6
- Maximum position embeddings: 128
- Model dimension (`d_model`): 256
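To make the listed values concrete, the sketch below collects them into a plain Python dataclass and derives the per-head attention dimension (256 / 4 = 64). The class name `SelfiesTedConfig` and its field names are hypothetical; they loosely follow the naming used by Hugging Face `BartConfig`, which is an assumption about the underlying architecture rather than something this README states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelfiesTedConfig:
    """Hypothetical container for the hyperparameters listed above."""

    d_model: int = 256                  # model (hidden) dimension
    encoder_ffn_dim: int = 256          # encoder feed-forward width
    decoder_ffn_dim: int = 256          # decoder feed-forward width
    attention_heads: int = 4
    encoder_layers: int = 2
    decoder_layers: int = 2
    max_position_embeddings: int = 128  # maximum number of token positions per input

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits d_model evenly across the heads.
        if self.d_model % self.attention_heads:
            raise ValueError("d_model must be divisible by attention_heads")
        return self.d_model // self.attention_heads


cfg = SelfiesTedConfig()
print(cfg.head_dim)  # 64
```

With these values each of the 4 heads attends over a 64-dimensional slice of the 256-dimensional hidden state, so any change to `attention_heads` needs a `d_model` divisible by it.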
## Pretrained Models and Training Logs
We provide checkpoints of the selfies-ted model pre-trained on a dataset of molecules curated from PubChem. The pre-trained model shows competitive performance on molecular representation tasks. Model weights are available at: "HuggingFace link".

To install and use the pre-trained model:

1. Download the `selfies_ted_model.pkl` file from the "HuggingFace link".
2. Place `selfies_ted_model.pkl` in the `models/` directory. The directory structure should look like this:
```
models/
└── selfies_ted_model.pkl
```
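Before loading the model, it can help to check that the checkpoint sits where the layout above expects it. The sketch below does this with `pathlib`; the helper `find_checkpoint` is hypothetical and assumes paths are resolved relative to the project root.

```python
from pathlib import Path
from typing import Union

# File name taken from the directory layout above.
CHECKPOINT_NAME = "selfies_ted_model.pkl"


def find_checkpoint(project_root: Union[str, Path] = ".") -> Path:
    """Return models/selfies_ted_model.pkl under project_root, or raise if it is missing."""
    path = Path(project_root) / "models" / CHECKPOINT_NAME
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download {CHECKPOINT_NAME} and place it in models/"
        )
    return path
```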
## Installation
To use this project, you'll need to install the required dependencies. We recommend using a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows, use venv\Scripts\activate
```
Install the required dependencies:
```bash
pip install -r requirements.txt
```
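To confirm that the packages went into the virtual environment rather than the system interpreter, you can run a generic check (not part of selfies-ted) from the activated environment. Inside an environment created by `venv`, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base Python installation.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created virtual environment."""
    # venv changes sys.prefix; sys.base_prefix keeps the original installation path.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```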
## Usage
### Import
```python
import load
```
### Training the Model
To train the model, run the `train.py` script:
```bash
python train.py -f <path_to_your_data_file>
```
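This README does not document the data file format that `train.py` expects. Purely as an illustration, the sketch below shows how a `-f` option like the one above could be parsed with `argparse`, and how a plain-text file with one SMILES string per line could be read. The one-per-line format and the helper names `parse_args` and `read_smiles` are assumptions, not the actual `train.py` implementation.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    # Hypothetical parser mirroring the `-f <path_to_your_data_file>` flag shown above.
    parser = argparse.ArgumentParser(description="Train selfies-ted (illustrative sketch)")
    parser.add_argument("-f", dest="file", required=True, help="path to the training data file")
    return parser.parse_args(argv)


def read_smiles(path) -> list[str]:
    # ASSUMPTION: one SMILES string per line; blank lines are skipped.
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


args = parse_args(["-f", "molecules.txt"])  # hypothetical file name
print(args.file)  # molecules.txt
```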
Note: The exact usage may depend on the implementation in `load.py`; refer to the source code for details.
### Load the model and tokenizer
```python
load.load("path/to/checkpoint.pkl")
```
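Because the checkpoint is distributed as a `.pkl` file, `load.load` presumably deserializes it with Python's `pickle` module or a library built on it (such as `torch.load`); this README does not say which. The sketch below shows only the generic standard-library pattern, using a hypothetical `load_checkpoint` helper and a throwaway demo file. Unpickling can execute arbitrary code, so only load checkpoint files from sources you trust.

```python
import pickle
import tempfile
from pathlib import Path


def load_checkpoint(path):
    """Deserialize a pickled checkpoint (hypothetical helper, not load.load)."""
    # WARNING: pickle can run arbitrary code while loading; only open trusted files.
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Round-trip demo with a placeholder object, not a real selfies-ted checkpoint.
demo = Path(tempfile.gettempdir()) / "demo_checkpoint.pkl"
demo.write_bytes(pickle.dumps({"example": "value"}))
print(load_checkpoint(demo))  # {'example': 'value'}
```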
### Encode SMILES strings
```python
smiles_list = ["COC", "CCO"]
embeddings = load.encode(smiles_list)
```
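Molecular embeddings are often compared with cosine similarity, for instance to find structurally similar molecules. The sketch below assumes, without confirmation from this README, that `load.encode` returns one fixed-length numeric vector per input SMILES. The vectors used here are made-up placeholders, not real selfies-ted output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for two rows of `embeddings` (not real model output).
emb_coc = [0.1, 0.3, 0.5]
emb_cco = [0.2, 0.1, 0.4]
print(round(cosine_similarity(emb_coc, emb_cco), 4))
```

If the embeddings turn out to be NumPy arrays, `numpy.dot` and `numpy.linalg.norm` compute the same quantity more efficiently.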
## Example Notebook
An example notebook for this project is `selfies-ted-example.ipynb`.