n0w0f committed on
Commit
5e10185
1 Parent(s): ee1e246

update model card

Files changed (1)
  1. README.md +29 -81
README.md CHANGED
@@ -45,7 +45,7 @@ Model Pretrained using Masked Language Modelling on 2 million crystal structures

  ### Direct Use

- The base model can be used for generating meaningful embeddings of bulk structures without further training.
  This model is ideal when fine-tuned for narrow downstream tasks.

  ### Downstream Use
@@ -55,84 +55,49 @@ This model can be used with fine-tuning for property prediction, classification o

  ## Bias, Risks, and Limitations

- > Model was trained only on bulk structures (**n0w0f/MatText - pretrain2m** - dataset) computationally investigated using DFT with GGA functional using VASP.

- ### Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

  ## How to Get Started with the Model

- Use the code below to get started with the model.
-
- [More Information Needed]

  ## Training Details

  ### Training Data

- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- #### Preprocessing [optional]

- [More Information Needed]


  #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->

  ### Testing Data, Factors & Metrics

  #### Testing Data

- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->

- [More Information Needed]
-
- ### Results

- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]

  ## Environmental Impact

@@ -140,31 +105,22 @@ Use the code below to get started with the model.

  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure

- [More Information Needed]
-
- #### Hardware

- [More Information Needed]

  #### Software

- [More Information Needed]

- ## Citation [optional]

  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

@@ -176,20 +132,12 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

  [More Information Needed]

- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]

- [More Information Needed]
-
- ## Model Card Authors [optional]

- [More Information Needed]

  ## Model Card Contact

- [More Information Needed]


  ### Direct Use

+ The base model can be used for generating meaningful features/embeddings of bulk structures without further training.
  This model is ideal when fine-tuned for narrow downstream tasks.
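The card does not say how a single structure-level embedding is obtained from the base model. One common choice, assumed here and not taken from MatText, is masked mean pooling: average the per-token hidden states (for most encoder models in 🤗 Transformers, `outputs.last_hidden_state`) while skipping padding tokens. The sketch below uses plain Python lists so it runs without a model download; `mean_pool` is a hypothetical helper name.

```python
from typing import List, Sequence


def mean_pool(
    hidden_states: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask entry is 1 (padding is skipped).

    hidden_states: one vector per token, shape (seq_len, hidden_dim).
    attention_mask: 1 for real tokens, 0 for padding, length seq_len.
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask must have the same length")
    kept = [vec for vec, m in zip(hidden_states, attention_mask) if m]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    # Sum each hidden dimension over the kept tokens, then divide by their count.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: three tokens, the last one is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Whether mean pooling or a single special-token vector gives better features for a given property is an empirical question; the masking matters because padded positions would otherwise pull the average toward arbitrary values.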

  ### Downstream Use


  ## Bias, Risks, and Limitations

+ > The model was trained only on bulk structures (**n0w0f/MatText - pretrain2m** - dataset).

+ The pretraining dataset is a subset of the materials deposited in the NOMAD archive. We queried only 3D-connected structures (i.e., excluding 2D materials, which often require special treatment) and, for consistency, limited our query to materials for which the bandgap has been computed using the PBE functional and the VASP code.

+ ### Recommendations

  ## How to Get Started with the Model

+ ```python
+ from transformers import AutoModel
+ model = AutoModel.from_pretrained("n0w0f/MatText-cifp1-2m")
+ ```

  ## Training Details

  ### Training Data

+ **n0w0f/MatText - pretrain2m**
+ The dataset contains crystal structures in various text representations and labels for some subsets.

+ https://huggingface.co/datasets/n0w0f/MatText

+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->


+ ### Training Procedure


  #### Training Hyperparameters

+ - **Training regime:** fp32 <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->


  ### Testing Data, Factors & Metrics

  #### Testing Data

+ https://huggingface.co/datasets/n0w0f/MatText/viewer/pretrain2m/test


  ## Environmental Impact


  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

+ - **Hardware Type:** 8 × A100 GPUs (40 GB)
+ - **Hours used:** 72 h
+ - **Cloud Provider:** Private infrastructure
+ - **Compute Region:** US/EU
+ - **Carbon Emitted:** 250 W × 72 h = 18 kWh; 18 kWh × 0.432 kg CO2eq/kWh ≈ 7.78 kg CO2eq
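The **Carbon Emitted** entry converts a power draw and run time into energy, then multiplies by a grid carbon intensity. The sketch below reproduces that arithmetic with the card's numbers (250 W, 72 h, 0.432 kg CO2eq/kWh). `training_emissions_kg` and its `num_gpus` parameter are hypothetical: the default `num_gpus=1` matches the card's calculation, and the second call only shows how the estimate would scale under the assumption that 250 W is drawn by each of the eight listed GPUs.

```python
def training_emissions_kg(
    power_watts: float,
    hours: float,
    kg_co2eq_per_kwh: float,
    num_gpus: int = 1,
) -> float:
    """Estimate emissions as energy (kWh) times grid carbon intensity."""
    energy_kwh = power_watts * num_gpus * hours / 1000.0  # Wh -> kWh
    return energy_kwh * kg_co2eq_per_kwh


# Values from the model card: 250 W, 72 h, 0.432 kg CO2eq/kWh.
card_estimate = training_emissions_kg(250, 72, 0.432)
print(f"{card_estimate:.2f} kg CO2eq")  # 7.78 kg CO2eq

# Assumption only: the same 250 W drawn by each of 8 GPUs.
per_gpu_estimate = training_emissions_kg(250, 72, 0.432, num_gpus=8)
print(f"{per_gpu_estimate:.2f} kg CO2eq")  # 62.21 kg CO2eq
```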
 
 
 
 
 
 
 
 

+ ## Technical Specifications

  #### Software

+ Pretrained using the MatText codebase: https://github.com/lamalab-org/MatText
+
+ ## Citation

+ To be published

  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->


  [More Information Needed]


+ ## Model Card Authors

+ The model was trained by Nawaf Alampara ([n0w0f](https://github.com/n0w0f)), Santiago Miret ([LinkedIn]()), and Kevin Maik Jablonka ([kjappelbaum](https://github.com/kjappelbaum)).

  ## Model Card Contact

+ [Nawaf](https://github.com/n0w0f),
+ [Kevin](https://github.com/kjappelbaum)