pearsonkyle committed
Commit 3ddee7e
1 Parent(s): 9eeba41
Files changed (5)
  1. README.md +1 -1
  2. optimizer.pt +1 -1
  3. pytorch_model.bin +1 -1
  4. scheduler.pt +1 -1
  5. trainer_state.json +423 -3
README.md CHANGED
@@ -1,7 +1,7 @@
 # Exo-Machina
 A deep language model, GPT-2, is trained on scientific manuscripts from NASA's Astrophysics Data System pertaining to extrasolar planets and the references therein. This pilot study uses the abstracts of each article as training data to explore correlations in scientific literature from a language perspective. A language model is a mathematical representation of an algorithm that generates sequences of text in the same way a human would form sentences. Each word or letter in a sentence is encoded to a numerical value (e.g. using word2vec) and appended to a list, forming sequences that represent up to a paragraph's worth of text. The sequences are fed into the [GPT-2](https://openai.com/blog/better-language-models/) 117M model and fine-tuned for 500,000 steps. After training, the language model is used to generate new text from scratch and from user input.
 
-- ### [Browse some samples](https://pearsonkyle.github.io/Exo-Machina/)
+- ### [Browse samples](https://pearsonkyle.github.io/Exo-Machina/)
 
 - ### [Train a model on Google Colab](https://colab.research.google.com/drive/1Pur0rFi5YVdn7axYRacXWFMic4NxRexV?usp=sharing)
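For reference, a minimal sketch of sampling from the fine-tuned model with the `transformers` pipeline API, as the README describes (generation from user input). The hub ID `pearsonkyle/Exo-Machina`, the prompt, and the generation parameters are assumptions for illustration:

```python
# Minimal sketch: sample abstract-style text from the fine-tuned GPT-2.
# The model ID below is an assumption; point it at this repository's
# checkpoint (or a local path) as appropriate.
from transformers import pipeline

generator = pipeline("text-generation", model="pearsonkyle/Exo-Machina")

prompt = "Directly imaged exoplanets are"  # hypothetical seed text
for sample in generator(prompt, max_length=80, num_return_sequences=3, do_sample=True):
    print(sample["generated_text"], end="\n\n")
```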
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a221bc4b1968eda415bba8c105db6a2b8b579edc18d1e3592730d101e8d80126
+oid sha256:3c30d157af2f57cabc3a5a2c810287a09033fbf373612568cef620f66f82e0f8
 size 995610991
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:221f869bcca2d877a6bdf0f9b7bf14aac02003800ac264f8b39100f9db2d1f1c
+oid sha256:c305cbf33d44d65fd5ea0fb1a4d8c4aaeae0fd2dc88eb7ce4ffb2c5e954ecb93
 size 510407951
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4538d9d56412255e685cbe4b709f49c4c2ce0cf165b70b84bab6f912af923e07
+oid sha256:c6149ede9fc96f56bbd7ff6a474936250fd17b0a96c98ce43da847600b309d96
 size 623
trainer_state.json CHANGED
@@ -1,8 +1,8 @@
 {
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 63.36296576508417,
-  "global_step": 335000,
+  "epoch": 76.60298846226594,
+  "global_step": 405000,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -2016,11 +2016,431 @@
       "learning_rate": 1.8318517117457917e-05,
       "loss": 1.382125,
       "step": 335000
+    },
+    {
+      "epoch": 63.55210894647248,
+      "learning_rate": 1.8223945526763762e-05,
+      "loss": 1.3925,
+      "step": 336000
+    },
+    {
+      "epoch": 63.74125212786079,
+      "learning_rate": 1.8129373936069604e-05,
+      "loss": 1.3989375,
+      "step": 337000
+    },
+    {
+      "epoch": 63.9303953092491,
+      "learning_rate": 1.8034802345375452e-05,
+      "loss": 1.4040625,
+      "step": 338000
+    },
+    {
+      "epoch": 64.11953849063741,
+      "learning_rate": 1.7940230754681294e-05,
+      "loss": 1.3838125,
+      "step": 339000
+    },
+    {
+      "epoch": 64.30868167202573,
+      "learning_rate": 1.784565916398714e-05,
+      "loss": 1.3745625,
+      "step": 340000
+    },
+    {
+      "epoch": 64.49782485341403,
+      "learning_rate": 1.7751087573292984e-05,
+      "loss": 1.381875,
+      "step": 341000
+    },
+    {
+      "epoch": 64.68696803480235,
+      "learning_rate": 1.765651598259883e-05,
+      "loss": 1.3859375,
+      "step": 342000
+    },
+    {
+      "epoch": 64.87611121619065,
+      "learning_rate": 1.756194439190467e-05,
+      "loss": 1.4001875,
+      "step": 343000
+    },
+    {
+      "epoch": 65.06525439757897,
+      "learning_rate": 1.7467372801210516e-05,
+      "loss": 1.387,
+      "step": 344000
+    },
+    {
+      "epoch": 65.25439757896727,
+      "learning_rate": 1.737280121051636e-05,
+      "loss": 1.3645625,
+      "step": 345000
+    },
+    {
+      "epoch": 65.44354076035559,
+      "learning_rate": 1.7278229619822206e-05,
+      "loss": 1.373375,
+      "step": 346000
+    },
+    {
+      "epoch": 65.6326839417439,
+      "learning_rate": 1.7183658029128048e-05,
+      "loss": 1.384,
+      "step": 347000
+    },
+    {
+      "epoch": 65.82182712313221,
+      "learning_rate": 1.7089086438433897e-05,
+      "loss": 1.386875,
+      "step": 348000
+    },
+    {
+      "epoch": 66.01097030452053,
+      "learning_rate": 1.6994514847739738e-05,
+      "loss": 1.3924375,
+      "step": 349000
+    },
+    {
+      "epoch": 66.20011348590883,
+      "learning_rate": 1.6899943257045583e-05,
+      "loss": 1.357375,
+      "step": 350000
+    },
+    {
+      "epoch": 66.38925666729715,
+      "learning_rate": 1.680537166635143e-05,
+      "loss": 1.3655625,
+      "step": 351000
+    },
+    {
+      "epoch": 66.57839984868545,
+      "learning_rate": 1.6710800075657274e-05,
+      "loss": 1.3670625,
+      "step": 352000
+    },
+    {
+      "epoch": 66.76754303007377,
+      "learning_rate": 1.661622848496312e-05,
+      "loss": 1.3799375,
+      "step": 353000
+    },
+    {
+      "epoch": 66.95668621146207,
+      "learning_rate": 1.6521656894268964e-05,
+      "loss": 1.3880625,
+      "step": 354000
+    },
+    {
+      "epoch": 67.14582939285039,
+      "learning_rate": 1.642708530357481e-05,
+      "loss": 1.3585,
+      "step": 355000
+    },
+    {
+      "epoch": 67.3349725742387,
+      "learning_rate": 1.633251371288065e-05,
+      "loss": 1.3579375,
+      "step": 356000
+    },
+    {
+      "epoch": 67.52411575562701,
+      "learning_rate": 1.6237942122186496e-05,
+      "loss": 1.3641875,
+      "step": 357000
+    },
+    {
+      "epoch": 67.71325893701533,
+      "learning_rate": 1.614337053149234e-05,
+      "loss": 1.3701875,
+      "step": 358000
+    },
+    {
+      "epoch": 67.90240211840363,
+      "learning_rate": 1.6048798940798186e-05,
+      "loss": 1.375875,
+      "step": 359000
+    },
+    {
+      "epoch": 68.09154529979195,
+      "learning_rate": 1.5954227350104027e-05,
+      "loss": 1.359,
+      "step": 360000
+    },
+    {
+      "epoch": 68.28068848118025,
+      "learning_rate": 1.5859655759409876e-05,
+      "loss": 1.3483125,
+      "step": 361000
+    },
+    {
+      "epoch": 68.46983166256857,
+      "learning_rate": 1.5765084168715718e-05,
+      "loss": 1.3565,
+      "step": 362000
+    },
+    {
+      "epoch": 68.65897484395687,
+      "learning_rate": 1.5670512578021563e-05,
+      "loss": 1.362125,
+      "step": 363000
+    },
+    {
+      "epoch": 68.84811802534519,
+      "learning_rate": 1.5575940987327408e-05,
+      "loss": 1.36725,
+      "step": 364000
+    },
+    {
+      "epoch": 69.0372612067335,
+      "learning_rate": 1.5481369396633253e-05,
+      "loss": 1.3636875,
+      "step": 365000
+    },
+    {
+      "epoch": 69.22640438812181,
+      "learning_rate": 1.5386797805939095e-05,
+      "loss": 1.33725,
+      "step": 366000
+    },
+    {
+      "epoch": 69.41554756951012,
+      "learning_rate": 1.5292226215244943e-05,
+      "loss": 1.3505,
+      "step": 367000
+    },
+    {
+      "epoch": 69.60469075089843,
+      "learning_rate": 1.5197654624550786e-05,
+      "loss": 1.3563125,
+      "step": 368000
+    },
+    {
+      "epoch": 69.79383393228674,
+      "learning_rate": 1.510308303385663e-05,
+      "loss": 1.3588125,
+      "step": 369000
+    },
+    {
+      "epoch": 69.98297711367505,
+      "learning_rate": 1.5008511443162473e-05,
+      "loss": 1.369125,
+      "step": 370000
+    },
+    {
+      "epoch": 70.17212029506337,
+      "learning_rate": 1.491393985246832e-05,
+      "loss": 1.3346875,
+      "step": 371000
+    },
+    {
+      "epoch": 70.36126347645167,
+      "learning_rate": 1.4819368261774163e-05,
+      "loss": 1.338,
+      "step": 372000
+    },
+    {
+      "epoch": 70.55040665783999,
+      "learning_rate": 1.4724796671080007e-05,
+      "loss": 1.3443125,
+      "step": 373000
+    },
+    {
+      "epoch": 70.73954983922829,
+      "learning_rate": 1.4630225080385854e-05,
+      "loss": 1.3544375,
+      "step": 374000
+    },
+    {
+      "epoch": 70.9286930206166,
+      "learning_rate": 1.4535653489691697e-05,
+      "loss": 1.3595625,
+      "step": 375000
+    },
+    {
+      "epoch": 71.11783620200492,
+      "learning_rate": 1.444108189899754e-05,
+      "loss": 1.343875,
+      "step": 376000
+    },
+    {
+      "epoch": 71.30697938339323,
+      "learning_rate": 1.4346510308303387e-05,
+      "loss": 1.332625,
+      "step": 377000
+    },
+    {
+      "epoch": 71.49612256478154,
+      "learning_rate": 1.425193871760923e-05,
+      "loss": 1.3395625,
+      "step": 378000
+    },
+    {
+      "epoch": 71.68526574616985,
+      "learning_rate": 1.4157367126915074e-05,
+      "loss": 1.344125,
+      "step": 379000
+    },
+    {
+      "epoch": 71.87440892755816,
+      "learning_rate": 1.406279553622092e-05,
+      "loss": 1.3505625,
+      "step": 380000
+    },
+    {
+      "epoch": 72.06355210894647,
+      "learning_rate": 1.3968223945526764e-05,
+      "loss": 1.3428125,
+      "step": 381000
+    },
+    {
+      "epoch": 72.25269529033478,
+      "learning_rate": 1.387365235483261e-05,
+      "loss": 1.324875,
+      "step": 382000
+    },
+    {
+      "epoch": 72.44183847172309,
+      "learning_rate": 1.3779080764138453e-05,
+      "loss": 1.332125,
+      "step": 383000
+    },
+    {
+      "epoch": 72.6309816531114,
+      "learning_rate": 1.36845091734443e-05,
+      "loss": 1.339,
+      "step": 384000
+    },
+    {
+      "epoch": 72.82012483449972,
+      "learning_rate": 1.3589937582750143e-05,
+      "loss": 1.3420625,
+      "step": 385000
+    },
+    {
+      "epoch": 73.00926801588803,
+      "learning_rate": 1.3495365992055986e-05,
+      "loss": 1.342875,
+      "step": 386000
+    },
+    {
+      "epoch": 73.19841119727634,
+      "learning_rate": 1.3400794401361833e-05,
+      "loss": 1.3198125,
+      "step": 387000
+    },
+    {
+      "epoch": 73.38755437866465,
+      "learning_rate": 1.3306222810667676e-05,
+      "loss": 1.3253125,
+      "step": 388000
+    },
+    {
+      "epoch": 73.57669756005296,
+      "learning_rate": 1.321165121997352e-05,
+      "loss": 1.328875,
+      "step": 389000
+    },
+    {
+      "epoch": 73.76584074144127,
+      "learning_rate": 1.3117079629279367e-05,
+      "loss": 1.335875,
+      "step": 390000
+    },
+    {
+      "epoch": 73.95498392282958,
+      "learning_rate": 1.302250803858521e-05,
+      "loss": 1.3428125,
+      "step": 391000
+    },
+    {
+      "epoch": 74.14412710421789,
+      "learning_rate": 1.2927936447891053e-05,
+      "loss": 1.3216875,
+      "step": 392000
+    },
+    {
+      "epoch": 74.3332702856062,
+      "learning_rate": 1.2833364857196897e-05,
+      "loss": 1.3190625,
+      "step": 393000
+    },
+    {
+      "epoch": 74.52241346699452,
+      "learning_rate": 1.2738793266502744e-05,
+      "loss": 1.3250625,
+      "step": 394000
+    },
+    {
+      "epoch": 74.71155664838282,
+      "learning_rate": 1.2644221675808587e-05,
+      "loss": 1.3285625,
+      "step": 395000
+    },
+    {
+      "epoch": 74.90069982977114,
+      "learning_rate": 1.254965008511443e-05,
+      "loss": 1.3365625,
+      "step": 396000
+    },
+    {
+      "epoch": 75.08984301115945,
+      "learning_rate": 1.2455078494420275e-05,
+      "loss": 1.3183125,
+      "step": 397000
+    },
+    {
+      "epoch": 75.27898619254776,
+      "learning_rate": 1.236050690372612e-05,
+      "loss": 1.309375,
+      "step": 398000
+    },
+    {
+      "epoch": 75.46812937393607,
+      "learning_rate": 1.2265935313031966e-05,
+      "loss": 1.31425,
+      "step": 399000
+    },
+    {
+      "epoch": 75.65727255532438,
+      "learning_rate": 1.217136372233781e-05,
+      "loss": 1.32575,
+      "step": 400000
+    },
+    {
+      "epoch": 75.84641573671269,
+      "learning_rate": 1.2076792131643656e-05,
+      "loss": 1.330375,
+      "step": 401000
+    },
+    {
+      "epoch": 76.035558918101,
+      "learning_rate": 1.1982220540949501e-05,
+      "loss": 1.3319375,
+      "step": 402000
+    },
+    {
+      "epoch": 76.22470209948932,
+      "learning_rate": 1.1887648950255344e-05,
+      "loss": 1.2995625,
+      "step": 403000
+    },
+    {
+      "epoch": 76.41384528087762,
+      "learning_rate": 1.179307735956119e-05,
+      "loss": 1.309,
+      "step": 404000
+    },
+    {
+      "epoch": 76.60298846226594,
+      "learning_rate": 1.1698505768867033e-05,
+      "loss": 1.3225,
+      "step": 405000
     }
   ],
   "max_steps": 528700,
   "num_train_epochs": 100,
-  "total_flos": 512235918148829184,
+  "total_flos": 619270336176979968,
   "trial_name": null,
   "trial_params": null
 }
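The entries above follow the `log_history` schema that the Hugging Face `Trainer` writes to `trainer_state.json` (one record of `epoch`, `learning_rate`, `loss`, and `step` per logging interval). A small sketch, assuming that schema, for inspecting the loss curve from a checkpoint's state file; the file path and plotting choices are illustrative:

```python
# Sketch: read a checkpoint's trainer_state.json and plot training loss
# versus global step, using the log_history schema seen in the diff above.
import json
import matplotlib.pyplot as plt

with open("trainer_state.json") as f:  # path is illustrative
    state = json.load(f)

# Keep only entries that carry a loss value (eval records may differ).
logs = [e for e in state["log_history"] if "loss" in e]
steps = [e["step"] for e in logs]
losses = [e["loss"] for e in logs]

plt.plot(steps, losses)
plt.xlabel("global step")
plt.ylabel("training loss")
plt.savefig("loss_curve.png")
```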