hotsuyuki/gpt_1.3B_global_step2000_zero-1_dp-4_pp-2_tp-2_flashattn2-on • Text Generation • Updated Mar 3 • 1
hotsuyuki/gpt_0.125B_global_step90000_zero-1_dp-16_pp-1_tp-1_flashattn2-on • Text Generation • Updated Feb 29
hotsuyuki/gpt_0.125B_global_step300_zero-1_dp-1_pp-4_tp-4_flashattn2-on • Text Generation • Updated Feb 29
hotsuyuki/gpt_0.125B_global_step300_zero-1_dp-8_pp-1_tp-2_flashattn2-on • Text Generation • Updated Feb 29 • 1
hotsuyuki/gpt_0.125B_global_step300_zero-1_dp-8_pp-2_tp-1_flashattn2-on • Text Generation • Updated Feb 29 • 1
hotsuyuki/gpt_0.125B_global_step30000_zero-1_dp-4_pp-2_tp-2_flashattn2-on • Text Generation • Updated Feb 29