Why do k/v proj exist in layers where "is_kv_shared_layer = True"?

#23
by jiayuanm - opened

Given the model's unique kv state sharing, k/v proj and k/v norm are not needed in layers where is_kv_shared_layer = True.

However, in the released checkpoint, k/v proj/norm weights exist for all layers. Just out of curiosity, what is in these unused k/v proj/norm weights?
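One way to look into this would be to summarize the tensors in question directly. Below is a minimal sketch of that idea, not the model's actual loading code. The parameter naming pattern (`layers.N.self_attn.k_proj.weight` and so on), the helper `summarize_unused_kv`, and the toy values are all assumptions for illustration; a real checkpoint would need its own key names and real tensors instead of flat lists.

```python
import re
from statistics import mean, pstdev

# Hypothetical parameter naming; a real checkpoint may use a different layout.
KV_PATTERN = re.compile(
    r"layers\.(\d+)\.self_attn\.(k_proj|v_proj|k_norm|v_norm)\.weight$"
)


def summarize_unused_kv(state, shared_layers):
    """Return {name: (mean, std, all_zero)} for k/v tensors in KV-shared layers.

    `state` maps parameter names to flat lists of floats (a stand-in for
    real tensors). `shared_layers` is the set of layer indices where
    is_kv_shared_layer is True.
    """
    report = {}
    for name, values in state.items():
        match = KV_PATTERN.search(name)
        if match and int(match.group(1)) in shared_layers:
            all_zero = all(v == 0.0 for v in values)
            report[name] = (mean(values), pstdev(values), all_zero)
    return report


# Toy data: layer 1 is treated as KV-shared, layer 0 is not.
toy_state = {
    "layers.0.self_attn.k_proj.weight": [0.5, -0.5, 1.0, -1.0],
    "layers.1.self_attn.k_proj.weight": [0.0, 0.0, 0.0, 0.0],
    "layers.1.self_attn.q_proj.weight": [0.1, 0.2, 0.3, 0.4],
}
report = summarize_unused_kv(toy_state, shared_layers={1})
```

Comparing the mean, standard deviation, and all-zero flag of these tensors against the same statistics for non-shared layers would show whether they look like zeros, untouched initialization values, or something else.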
