christofid committed on
Commit
800e1b6
1 Parent(s): 07353a2

Update model_cards/article.md

Files changed (1)
  1. model_cards/article.md +42 -35
model_cards/article.md CHANGED
@@ -4,15 +4,15 @@
4
 
5
  ### Property
6
  The supported properties are:
7
- - `Metal NonMetal Classifier`: Predicted by a RF model (WHICH? )
8
- - `Metal Semiconductor Classifier`: Classifying whether a metal could be a semiconductor. Predicted with CGCNN (ToDo: Add Ref!)
9
- - `Poisson Ratio`: ToDo: Description + Reference
10
- - `Shear Moduli` ...
11
- - `Bulk Moduli`
12
- - `Fermi Energy`
13
- - `Band Gap`
14
- - `Absolute Energy`
15
- - `Formation Energy`
16
 
17
 
18
  ### Input file for crystal model
@@ -24,46 +24,45 @@ The file with information about the metal. Dependent on the property you want to
24
 
25
  # Model card - CGCNN
26
 
27
- **Model Details**: The [Regression Transformer](https://arxiv.org/abs/2202.01338) is a multitask Transformer that reformulates regression as a conditional sequence modeling task. This yields a dichotomous language model that seamlessly integrates property prediction with property-driven conditional generation.
28
 
29
- **Developers**: Jannis Born and Matteo Manica from IBM Research.
30
 
31
  **Distributors**: Original authors' code wrapped and distributed by GT4SD Team (2023) from IBM Research.
32
 
33
- **Model date**: Preprint released in 2022, currently under review at *Nature Machine Intelligence*.
34
 
35
  **Algorithm version**: Models trained and distributed by the original authors.
36
- - **Molecules: QED**: Model trained on 1.6M molecules (SELFIES) from ChEMBL and their QED scores.
37
- - **Molecules: Solubility**: QED model finetuned on the ESOL dataset from [Delaney et al (2004), *J. Chem. Inf. Comput. Sci.*](https://pubs.acs.org/doi/10.1021/ci034243x) to predict water solubility. Model trained on augmented SELFIES.
38
- - **Molecules: USPTO**: Model trained on 2.8M [chemical reactions](https://figshare.com/articles/dataset/Chemical_reactions_from_US_patents_1976-Sep2016_/5104873) from the US patent office. The model used SELFIES and a synthetic property (total molecular weight of all precursors).
39
- - **Molecules: Polymer**: Model finetuned on 600 ROPs (ring-opening polymerizations) with monomer-catalyst pairs. Model used three properties: conversion (`<conv>`), PDI (`<pdi>`) and Molecular Weight (`<molwt>`). Model trained with augmented SELFIES, optimized only to generate catalysts, given a monomer and the property constraints. See the example for details.
40
- - **Molecules: Cosmo_acdl**: Model finetuned on 56k molecules with two properties (*pKa_ACDL* and *pKa_COSMO*). Model used augmented SELFIES.
41
- - **Molecules: Pfas**: Model finetuned on ~1k PFAS (Perfluoroalkyl and Polyfluoroalkyl Substances) molecules with 9 properties including some experimentally measured ones (biodegradability, LD50 etc) and some synthetic ones (SCScore, molecular weight). Model trained on augmented SELFIES.
42
- - **Molecules: Logp_and_synthesizability**: Model trained on 2.9M molecules (SELFIES) from PubChem with **two** synthetic properties, the logP (partition coefficient) and the [SCScore by Coley et al. (2018); *J. Chem. Inf. Model.*](https://pubs.acs.org/doi/full/10.1021/acs.jcim.7b00622?casa_token=JZzOrdWlQ_QAAAAA%3A3_ynCfBJRJN7wmP2gyAR0EWXY-pNW_l-SGwSSU2SGfl5v5SxcvqhoaPNDhxq4THberPoyyYqTZELD4Ck)
43
- - **Molecules: Crippen_logp**: Model trained on 2.9M molecules (SMILES) from PubChem, but *only* on logP (partition coefficient).
44
- - **Proteins: Stability**: Model pretrained on 2.6M peptides from UniProt with the Boman index as property. Finetuned on the [**Stability**](https://www.science.org/doi/full/10.1126/science.aan0693) dataset from the [TAPE benchmark](https://proceedings.neurips.cc/paper/2019/hash/37f65c068b7723cd7809ee2d31d7861c-Abstract.html) which has ~65k samples.
45
-
46
- **Model type**: A Transformer-based language model that is trained on alphanumeric sequence to simultaneously perform sequence regression or conditional sequence generation.
47
 
48
  **Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**:
49
- All models are trained with an alternated training scheme that alternated between optimizing the cross-entropy loss on the property tokens ("regression") or the self-consistency objective on the molecular tokens. See the [Regression Transformer](https://arxiv.org/abs/2202.01338) paper for details.
50
 
51
  **Paper or other resource for more information**:
52
- The [Regression Transformer](https://arxiv.org/abs/2202.01338) paper. See the [source code](https://github.com/IBM/regression-transformer) for details.
53
 
54
  **License**: MIT
55
 
56
- **Where to send questions or comments about the model**: Open an issue on [GT4SD repository](https://github.com/GT4SD/gt4sd-core).
57
 
58
- **Intended Use. Use cases that were envisioned during development**: Chemical research, in particular drug discovery.
59
 
60
- **Primary intended uses/users**: Researchers and computational chemists using the model for model comparison or research exploration purposes.
61
 
62
  **Out-of-scope use cases**: Production-level inference, producing molecules with harmful properties.
63
 
64
- **Factors**: Not applicable.
65
 
66
- **Metrics**: High predictive power for the properties of that specific algorithm version.
67
 
68
  **Datasets**: Different ones, as described under **Algorithm version**.
69
 
@@ -82,10 +81,18 @@ ToDo...
82
  # Citation
83
 
84
  ```bib
85
- @article{manica2022gt4sd,
86
- title={GT4SD: Generative Toolkit for Scientific Discovery},
87
- author={Manica, Matteo and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Born, Jannis and Clarke, Dean and Teukam, Yves Gaetan Nana and Hoffman, Samuel C and Buchan, Matthew and Chenthamarakshan, Vijil and others},
88
- journal={arXiv preprint arXiv:2207.03928},
89
- year={2022}
90
  }
91
  ```
 
4
 
5
  ### Property
6
  The supported properties are:
7
+ - `Metal NonMetal Classifier`: Classifies whether a crystal is a metal or a nonmetal using a [random forest classifier](https://www.nature.com/articles/s41524-022-00850-3).
8
+ - `Metal Semiconductor Classifier`: Classifies whether a crystal is a metal or a semiconductor using the [CGCNN framework](https://link.aps.org/doi/10.1103/PhysRevLett.120.145301).
9
+ - `Poisson Ratio`: Predicted using the [CGCNN framework](https://link.aps.org/doi/10.1103/PhysRevLett.120.145301).
10
+ - `Shear Moduli`: Predicted using the [CGCNN framework](https://link.aps.org/doi/10.1103/PhysRevLett.120.145301).
11
+ - `Bulk Moduli`: Predicted using the [CGCNN framework](https://link.aps.org/doi/10.1103/PhysRevLett.120.145301).
12
+ - `Fermi Energy`: Predicted using the [CGCNN framework](https://link.aps.org/doi/10.1103/PhysRevLett.120.145301).
13
+ - `Band Gap`: Predicted using the [CGCNN framework](https://link.aps.org/doi/10.1103/PhysRevLett.120.145301).
14
+ - `Absolute Energy`: Predicted using the [CGCNN framework](https://link.aps.org/doi/10.1103/PhysRevLett.120.145301).
15
+ - `Formation Energy`: Predicted using the [CGCNN framework](https://link.aps.org/doi/10.1103/PhysRevLett.120.145301). (An illustrative usage sketch for these predictors follows this list.)
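
Illustrative only, not part of the original card: the snippet below sketches how one of these crystal predictors might be queried through GT4SD. It assumes GT4SD exposes them via its `PropertyPredictorRegistry` and that the crystal models accept a path to a folder of CIF files; the property key `formation_energy` is an assumed identifier that should be checked against the gt4sd-core documentation.

```python
# Hypothetical usage sketch: the "formation_energy" key and the CIF-folder
# input format are assumptions, not confirmed by this card; check gt4sd-core docs.
from gt4sd.properties import PropertyPredictorRegistry

# Retrieve a predictor from the registry by name (assumed property key).
predictor = PropertyPredictorRegistry.get_property_predictor("formation_energy")

# Crystal predictors are assumed here to consume a directory of CIF files.
predictions = predictor("path/to/folder_with_cif_files")
print(predictions)
```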
16
 
17
 
18
  ### Input file for crystal model
 
24
 
25
  # Model card - CGCNN
26
 
27
+ **Model Details**: Eight CGCNN models trained to predict various crystal properties.
28
 
29
+ **Developers**: The [CGCNN](https://github.com/txie-93/cgcnn) developers, Tian Xie and Jeffrey C. Grossman.
30
 
31
  **Distributors**: Original authors' code wrapped and distributed by GT4SD Team (2023) from IBM Research.
32
 
33
+ **Model date**: 2018.
34
 
35
  **Algorithm version**: Models trained and distributed by the original authors.
36
+ - **Metal Semiconductor Classifier**: Model trained to classify whether a crystal is a metal or a semiconductor, using instances from the [Materials Project database](https://aip.scitation.org/doi/10.1063/1.4812323), which includes a diverse set of inorganic crystals ranging from simple metals to complex minerals.
37
+ - **Poisson Ratio**: Model to predict the Poisson ratio, trained on 2041 instances from the [Materials Project database](https://aip.scitation.org/doi/10.1063/1.4812323), which includes a diverse set of inorganic crystals ranging from simple metals to complex minerals.
38
+ - **Shear Moduli**: Model to predict the shear moduli, trained on 2041 instances from the [Materials Project database](https://aip.scitation.org/doi/10.1063/1.4812323), which includes a diverse set of inorganic crystals ranging from simple metals to complex minerals. Unit: log(GPa).
39
+ - **Bulk Moduli**: Model to predict the bulk moduli, trained on 2041 instances from the [Materials Project database](https://aip.scitation.org/doi/10.1063/1.4812323), which includes a diverse set of inorganic crystals ranging from simple metals to complex minerals. Unit: log(GPa).
40
+ - **Fermi Energy**: Model to predict the Fermi energy, trained on 28046 instances from the [Materials Project database](https://aip.scitation.org/doi/10.1063/1.4812323), which includes a diverse set of inorganic crystals ranging from simple metals to complex minerals. Unit: eV.
41
+ - **Band Gap**: Model to predict the band gap, trained on 16458 instances from the [Materials Project database](https://aip.scitation.org/doi/10.1063/1.4812323), which includes a diverse set of inorganic crystals ranging from simple metals to complex minerals. Unit: eV.
42
+ - **Absolute Energy**: Model to predict the absolute energy, trained on 28046 instances from the [Materials Project database](https://aip.scitation.org/doi/10.1063/1.4812323), which includes a diverse set of inorganic crystals ranging from simple metals to complex minerals. Unit: eV/atom.
43
+ - **Formation Energy**: Model to predict the formation energy, trained on 28046 instances from the [Materials Project database](https://aip.scitation.org/doi/10.1063/1.4812323), which includes a diverse set of inorganic crystals ranging from simple metals to complex minerals. Unit: eV/atom.
44
+
45
+ **Model type**: Crystal Graph Convolutional Neural Networks (CGCNN) that take an arbitrary crystal structure as input and predict material properties.
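
For orientation only, and not from the original card: a minimal PyTorch sketch of the gated crystal-graph convolution described in the CGCNN paper. This is an illustrative re-implementation under stated assumptions, not the authors' code; layer names, shapes, and the toy example are made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrystalGraphConv(nn.Module):
    """Illustrative sketch of one CGCNN convolution step: each atom feature
    v_i is updated from its neighbours v_j and the connecting bond features
    u_(i,j) through a gated (sigmoid * softplus) message summed over neighbours."""

    def __init__(self, atom_fea_len: int, bond_fea_len: int):
        super().__init__()
        in_len = 2 * atom_fea_len + bond_fea_len
        self.fc_gate = nn.Linear(in_len, atom_fea_len)  # "filter" branch
        self.fc_core = nn.Linear(in_len, atom_fea_len)  # "core" branch
        self.bn = nn.BatchNorm1d(atom_fea_len)

    def forward(self, atom_fea, bond_fea, nbr_idx):
        # atom_fea: (N, F) atom features, bond_fea: (N, M, B) bond features,
        # nbr_idx: (N, M) indices of the M neighbours of each of the N atoms.
        N, M = nbr_idx.shape
        nbr_fea = atom_fea[nbr_idx]  # gather neighbour features, (N, M, F)
        z = torch.cat(
            [atom_fea.unsqueeze(1).expand(N, M, -1), nbr_fea, bond_fea], dim=-1
        )
        gate = torch.sigmoid(self.fc_gate(z))        # weight of each message
        core = F.softplus(self.fc_core(z))           # message content
        message = self.bn((gate * core).sum(dim=1))  # aggregate over neighbours
        return F.softplus(atom_fea + message)        # residual atom update


# Toy usage: 4 atoms, 3 neighbours each, 8-dim atom and 6-dim bond features.
conv = CrystalGraphConv(atom_fea_len=8, bond_fea_len=6)
atoms = torch.randn(4, 8)
bonds = torch.randn(4, 3, 6)
neighbours = torch.randint(0, 4, (4, 3))
print(conv(atoms, bonds, neighbours).shape)  # torch.Size([4, 8])
```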
 
46
 
47
  **Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**:
48
+ See the [CGCNN](https://link.aps.org/doi/10.1103/PhysRevLett.120.145301) paper for details.
49
 
50
  **Paper or other resource for more information**:
51
+ The [CGCNN](https://link.aps.org/doi/10.1103/PhysRevLett.120.145301) paper. See the [source code](https://github.com/txie-93/cgcnn) for details.
52
 
53
  **License**: MIT
54
 
55
+ **Where to send questions or comments about the model**: Open an issue on the [CGCNN repository](https://github.com/txie-93/cgcnn).
56
 
57
+ **Intended Use. Use cases that were envisioned during development**: Materials research.
58
 
59
+ **Primary intended uses/users**: Researchers using the model for model comparison or research exploration purposes.
60
 
61
  **Out-of-scope use cases**: Production-level inference, producing molecules with harmful properties.
62
 
63
+ **Factors**: N.A.
64
 
65
+ **Metrics**: N.A.
66
 
67
  **Datasets**: Different ones, as described under **Algorithm version**.
68
 
 
81
  # Citation
82
 
83
  ```bib
84
+ @article{PhysRevLett.120.145301,
85
+ title = {Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties},
86
+ author = {Xie, Tian and Grossman, Jeffrey C.},
87
+ journal = {Phys. Rev. Lett.},
88
+ volume = {120},
89
+ issue = {14},
90
+ pages = {145301},
91
+ numpages = {6},
92
+ year = {2018},
93
+ month = {Apr},
94
+ publisher = {American Physical Society},
95
+ doi = {10.1103/PhysRevLett.120.145301},
96
+ url = {https://link.aps.org/doi/10.1103/PhysRevLett.120.145301}
97
  }
98
  ```