Differences between OrionForCausalLM and LlamaForCausalLM

#5
by J22 - opened

As far as I can tell, the only differences are that input_layernorm, post_attention_layernorm, and the final norm use nn.LayerNorm instead of LlamaRMSNorm.
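A minimal sketch of that difference, assuming the standard LlamaRMSNorm definition from transformers (reimplemented here so the snippet is self-contained): RMSNorm normalizes by the root-mean-square only, with a weight-only affine, while nn.LayerNorm also subtracts the per-feature mean and carries a bias.

```python
import torch
import torch.nn as nn


class LlamaRMSNorm(nn.Module):
    """RMSNorm as used in LlamaForCausalLM: scales by the RMS only,
    with no mean subtraction and no bias (weight-only affine)."""

    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states


hidden_size = 8
rms_norm = LlamaRMSNorm(hidden_size)
# nn.LayerNorm, as OrionForCausalLM reportedly uses; eps matched for comparison
layer_norm = nn.LayerNorm(hidden_size, eps=1e-6)

# LayerNorm has a bias in addition to the weight, so it carries extra parameters.
print(sum(p.numel() for p in rms_norm.parameters()))    # weight only: 8
print(sum(p.numel() for p in layer_norm.parameters()))  # weight + bias: 16

# The two agree only when the input is already zero-mean along the last dim.
x = torch.randn(2, hidden_size)
x_centered = x - x.mean(-1, keepdim=True)
print(torch.allclose(rms_norm(x_centered), layer_norm(x_centered), atol=1e-5))
```

Because of the extra bias terms and the mean subtraction, Orion checkpoints are not weight-compatible with the Llama norm layers even though the rest of the block structure matches.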

The attention and embedding implementations also differ in the remote code loaded via trust_remote_code.
