TowerInstruct takes twice as much space as TowerBase

#3
by bpop - opened

Hello all,

I've been experimenting with both TowerBase and TowerInstruct. When I load them in Python, both have the expected number of parameters (6.7 billion, give or take). However, they take up very different amounts of disk space in my .cache: TowerInstruct is 26G, while TowerBase is only 13G. I suspect this is because TowerBase's weights are stored in bf16, whereas TowerInstruct's are stored in fp32. Was this intentional?
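A quick back-of-the-envelope check supports the fp32 vs. bf16 explanation: fp32 uses 4 bytes per parameter and bf16 uses 2, so the same parameter count should take exactly twice the space in fp32. The sketch below assumes the approximate 6.7 billion parameter count reported above. It only estimates the raw weight data, ignoring file metadata and tokenizer/config files. It reports decimal gigabytes (10^9 bytes), which may differ slightly from the "G" shown by tools that use GiB. The helper name `checkpoint_size_gb` is hypothetical.

```python
# Bytes needed to store one parameter in common floating-point dtypes.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def checkpoint_size_gb(num_params: float, dtype: str) -> float:
    """Hypothetical helper: approximate raw weight size in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


n = 6.7e9  # approximate parameter count (assumption, "give or take")
for dtype in ("bf16", "fp32"):
    print(f"{dtype}: ~{checkpoint_size_gb(n, dtype):.1f} GB")
```

This prints roughly 13.4 GB for bf16 and 26.8 GB for fp32, which matches the 2x gap between the two cached checkpoints.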
