appvoid/arco-mini-run-1
This represents an effort to discover optimal improvements on SLMs using curriculum learning, with n+10k samples per run, on a 250M-parameter Danube model.
Note: final loss: 2.353
Note: final loss: 2.1243
Note: final loss: 1.7117
Note: final loss: 1.5411
Note: final loss: 1.558 (a high learning rate hurt performance; working on it)
Note: arco-mini-run-5 was trained on 25k additional samples
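
The n+10k schedule mentioned in the description can be read as a running sample budget that grows with each run. The sketch below is only an illustration of that arithmetic, not the training code behind these runs: the function name `cumulative_samples`, the starting size, and the increment list are all hypothetical, and it assumes each run's samples are added on top of the previous total.

```python
from itertools import accumulate


def cumulative_samples(n, increments):
    """Return running sample totals: n, n + inc1, n + inc1 + inc2, ...

    n          -- hypothetical sample count for the first run
    increments -- additional samples for each later run (assumed cumulative)
    """
    return list(accumulate(increments, initial=n))


if __name__ == "__main__":
    # Hypothetical values: the starting size is not stated in the source.
    start = 10_000
    steps = [10_000, 10_000, 25_000]
    for run, total in enumerate(cumulative_samples(start, steps), start=1):
        print(f"stage {run}: {total} samples seen so far")
```

Keeping the increments in a list makes it easy to express an uneven step, such as a single run that adds 25k samples instead of 10k.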