Shouldn't the total parameter count be 7000293376?

#16
by J22 - opened

64000x4096 + 32 (4096x12288 + 4096x4096 + 4096x11008 + 11008x4096 + 4096x11008) + 4096x64000 = 7000293376
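One quick way to double-check this arithmetic is to compute it directly. This is a minimal sketch that counts only the weight matrices in the formula above. The constant names are illustrative. Because the formula counts 64000x4096 twice, it assumes the embedding and LM head are separate, untied matrices. The three 4096x11008-sized MLP terms are counted as-is, and the thread does not say which role each one plays.

```python
# Sizes read off the formula above (assumed to match the model config)
VOCAB = 64000
HIDDEN = 4096
INTERMEDIATE = 11008
LAYERS = 32


def linear_weight_params(vocab=VOCAB, hidden=HIDDEN, inter=INTERMEDIATE, layers=LAYERS):
    """Count weight-matrix parameters only (no norms, no biases)."""
    embed = vocab * hidden                          # token embedding, 64000x4096
    lm_head = hidden * vocab                        # output projection, assumed untied
    attn = hidden * (3 * hidden) + hidden * hidden  # packed QKV (4096x12288) + 4096x4096
    mlp = 3 * hidden * inter                        # three 4096x11008-sized matrices
    return embed + layers * (attn + mlp) + lm_head


print(linear_weight_params())  # 7000293376
```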

The decoder also has input_layernorm and post_attention_layernorm, plus the norm before LM_head, so:
7000293376 + 32 * (1 * 4096 * 2) + 1 * 4096 = 7000559616
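The norm correction can be checked the same way. This sketch assumes each norm holds a single 4096-element scale vector with no bias, which is what the `1 * 4096` factor implies. The thread does not name the norm type. The helper name `norm_params` is hypothetical.

```python
HIDDEN = 4096
LAYERS = 32
WEIGHT_MATRIX_PARAMS = 7000293376  # matrices-only total from the formula above


def norm_params(hidden=HIDDEN, layers=LAYERS):
    # Assumption: every norm layer stores exactly one weight vector of length `hidden`
    per_layer = 2 * hidden   # input_layernorm + post_attention_layernorm
    final_norm = hidden      # the norm before LM_head
    return layers * per_layer + final_norm


total = WEIGHT_MATRIX_PARAMS + norm_params()
print(norm_params(), total)  # 266240 7000559616
```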

Got it, thanks.


@J22 What does 12288 refer to?

12288 comes from W_pack, which packs $W_q$, $W_k$, and $W_v$ together. Each of them is 4096 * 4096, so the packed matrix is 4096 * (3 * 4096) = 4096 * 12288.
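A small sketch of what packing means for the shapes. The helpers `packed_qkv_shape` and `split_packed_output` are hypothetical illustrations, not the model's code. The sketch also assumes the packed output dimension is laid out as contiguous Q, K, V blocks in that order, which the thread does not state.

```python
HIDDEN = 4096


def packed_qkv_shape(hidden=HIDDEN):
    # W_pack maps hidden -> 3 * hidden: one hidden-wide block each for Q, K, V
    return (hidden, 3 * hidden)


def split_packed_output(packed, hidden):
    """Split a packed projection output of length 3*hidden into (q, k, v).

    Assumes a contiguous [Q | K | V] layout.
    """
    assert len(packed) == 3 * hidden
    return packed[:hidden], packed[hidden:2 * hidden], packed[2 * hidden:]


rows, cols = packed_qkv_shape()
print(cols)                                  # 12288
print(rows * cols == 3 * (HIDDEN * HIDDEN))  # True

# Toy example with hidden=2: packed output [q0, q1, k0, k1, v0, v1]
q, k, v = split_packed_output([1, 2, 3, 4, 5, 6], hidden=2)
print(q, k, v)  # [1, 2] [3, 4] [5, 6]
```

Since packing only concatenates the three projections, it does not change the parameter count. It just lets one matrix multiply produce Q, K and V together.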

J22 changed discussion status to closed
