I can't figure out how this was done. Is it because Qwen's architecture is very similar to Llama's, so some parameters could be transferred directly and then retrained?
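If that guess is right, a minimal sketch of what such a transfer might look like is below. This is only an illustration of the idea being asked about, not the actual method used here; the model IDs are placeholders, and the copy rule (match by parameter name and shape) is an assumption.

```python
# Hypothetical sketch: copy parameters from a Qwen checkpoint into a
# Llama-shaped model wherever tensor names and shapes line up, then retrain.
# Model IDs are placeholders, not the checkpoints used for this model.
import torch
from transformers import AutoModelForCausalLM

src_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B", torch_dtype=torch.bfloat16)
dst_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16)

src = src_model.state_dict()
dst = dst_model.state_dict()

# Keep only tensors whose name exists in the target with an identical shape.
# Many names do coincide (e.g. model.layers.0.self_attn.q_proj.weight),
# while embeddings (different vocab sizes) and Qwen's attention biases will not.
transferable = {k: v for k, v in src.items() if k in dst and dst[k].shape == v.shape}

missing, unexpected = dst_model.load_state_dict(transferable, strict=False)
print(f"copied {len(transferable)} tensors; {len(missing)} left at their original init")
# The non-matching pieces would then need to be re-initialized and the whole
# model fine-tuned, which is what the question speculates about.
```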