# Model documentation & parameters

**Algorithm Version**: Which model version to use.

**Maximal sequence length**: The maximal number of SMILES tokens in the generated molecule.

**Number of samples**: How many samples should be generated (between 1 and 50).
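The three parameters above can be pictured as a small request object that is checked before any generation is attempted. The following is a minimal sketch, not the GT4SD API: the class `GenerationRequest`, the helper `tokenize`, and the version string `"v0"` are hypothetical names chosen for illustration. The regular expression is a commonly used SMILES tokenization pattern and is only assumed to approximate how the model counts tokens.

```python
import re
from dataclasses import dataclass

# Commonly used SMILES tokenization regex (an assumption for illustration;
# the tokenizer actually used by PolymerBlocks may differ).
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, rejecting unrecognized characters."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical container for the three parameters described above."""

    algorithm_version: str
    max_sequence_length: int
    number_of_samples: int

    def __post_init__(self) -> None:
        if not self.algorithm_version:
            raise ValueError("algorithm_version must be a non-empty string")
        if self.max_sequence_length < 1:
            raise ValueError("max_sequence_length must be at least 1")
        # The documented range for the number of samples is 1 to 50.
        if not 1 <= self.number_of_samples <= 50:
            raise ValueError("number_of_samples must be between 1 and 50")

    def fits(self, smiles: str) -> bool:
        """Check whether a molecule stays within the token budget."""
        return len(tokenize(smiles)) <= self.max_sequence_length


if __name__ == "__main__":
    # "v0" is a placeholder version name, not a confirmed identifier.
    request = GenerationRequest("v0", max_sequence_length=100, number_of_samples=10)
    print(tokenize("CC(=O)O"))
    print(request.fits("c1ccccc1Br"))
```

Counting tokens rather than characters matters because multi-character units such as `Br` or a bracketed atom like `[NH4+]` each count once toward the maximal sequence length.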



# Model card -- PolymerBlocks

**Model Details**: *PolymerBlocks* is a sequence-based molecular generator tuned to generate blocks of polymers (e.g., catalysts and monomers). The model relies on a Variational Autoencoder architecture as described in [Born et al. (2021; *iScience*)](https://www.sciencedirect.com/science/article/pii/S2589004221002376).

**Developers**: Matteo Manica and colleagues from IBM Research.

**Distributors**: Original authors' code integrated into GT4SD.

**Model date**: Not yet published.

**Model version**: Only initial model version. The model has been pre-trained on 500K compounds from PubChem and further fine-tuned on the SMILES representing monomers and catalysts collected in the database presented in [Park et al. (2022)](https://doi.org/10.26434/chemrxiv-2022-811rl).

**Model type**: A sequence-based molecular generator tuned to generate blocks of polymers (e.g., catalysts and monomers).

**Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**: The sequence-based model is a standard GRU-based VAE trained to reconstruct the SMILES representation of molecules. Given the nature of the pre-training and fine-tuning data, the model is biased toward generating molecules that resemble the catalysts and monomers employed in ring-opening polymerization.
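As general background on what "trained to reconstruct" means for a VAE, the textbook objective for a SMILES token sequence $x = (x_1, \dots, x_T)$ with latent code $z$ is shown below. This is the standard formulation, given here as an assumption about the general setup; the exact loss weighting and any annealing schedule used for PolymerBlocks are not specified in this card.

$$
\mathcal{L}(\theta, \phi; x) = -\,\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] + D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big),
\qquad
\log p_\theta(x \mid z) = \sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t}, z)
$$

Here $q_\phi$ denotes the encoder, $p_\theta$ the autoregressive decoder, and $p(z)$ the prior over latent codes, typically a standard normal distribution. The first term rewards accurate token-by-token reconstruction, and the second keeps the latent space close to the prior so that sampling from it yields valid-looking molecules.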

**Paper or other resource for more information**: Details on the model used and code can be found in [Born et al. (2021; *iScience*)](https://www.sciencedirect.com/science/article/pii/S2589004221002376).

**License**: MIT

**Where to send questions or comments about the model**: Open an issue on [GT4SD repository](https://github.com/GT4SD/gt4sd-core).

**Intended Use. Use cases that were envisioned during development**: Chemical research, in particular the discovery of monomers and catalysts for polymerization.

**Primary intended uses/users**: Researchers and computational chemists using the model for model comparison or research exploration purposes.

**Out-of-scope use cases**: Production-level inference, producing molecules with harmful properties.

**Metrics**: N.A.

**Datasets**: See the description under **Model version**.

**Ethical Considerations**: Unclear, please consult with original authors in case of questions.

**Caveats and Recommendations**: Unclear, please consult with original authors in case of questions.

Model card prototype inspired by [Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596).

## Citation

```bib
@article{manica2022gt4sd,
  title={GT4SD: Generative Toolkit for Scientific Discovery},
  author={Manica, Matteo and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Born, Jannis and Clarke, Dean and Teukam, Yves Gaetan Nana and Hoffman, Samuel C and Buchan, Matthew and Chenthamarakshan, Vijil and others},
  journal={arXiv preprint arXiv:2207.03928},
  year={2022}
}
```