Performance oddities of the 3M model.
#3 opened 23 days ago by MartialTerran
GPT-2 model having 16 4-float attention heads
#2 opened 23 days ago by MartialTerran
Adding `safetensors` variant of this model
#1 opened 6 months ago by SFconvertbot