Most similar existing model architecture?

#3
by cckm - opened

I would love to try this model on inference platforms like llama.cpp and MLC. However, these platforms require some custom code for model conversion, so it would be easiest if I could start from the conversion code of an existing model, and then adapt it for MobiLlama. Which model's conversion code would you recommend I start from, and what are the key changes I need to pay attention to?

These are the architectures currently converted by llama.cpp:
https://github.com/ggerganov/llama.cpp/blob/052051d8ae4639a1c3c61e7da3237bcc572469d4/convert-hf-to-gguf.py#L178

and by MLC:
https://github.com/mlc-ai/mlc-llm/tree/main/python/mlc_chat/model

Ah, I see it now. The biggest change is that the MLP is shared across all layers, which is where the parameter savings come from.
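For anyone adapting a LLaMA-style converter, here is a minimal sketch of one way to handle a shared MLP: repeat the single set of MLP tensors under per-layer names so a converter that expects one MLP per layer can read them. The tensor names (`model.shared_mlp.`, `model.layers.{i}.mlp.`) and the helper `expand_shared_mlp` are assumptions for illustration only; check the actual MobiLlama checkpoint keys before relying on them.

```python
# Hypothetical tensor names; the real MobiLlama checkpoint keys may differ.
SHARED_MLP_PREFIX = "model.shared_mlp."            # assumed name of the single shared MLP
PER_LAYER_MLP_TEMPLATE = "model.layers.{i}.mlp."   # LLaMA-style per-layer naming


def expand_shared_mlp(state_dict, num_layers):
    """Return a new dict in which each shared-MLP tensor appears once per layer."""
    expanded = {}
    for name, tensor in state_dict.items():
        if name.startswith(SHARED_MLP_PREFIX):
            suffix = name[len(SHARED_MLP_PREFIX):]
            for i in range(num_layers):
                # The same tensor object is referenced for every layer; nothing is copied here.
                expanded[PER_LAYER_MLP_TEMPLATE.format(i=i) + suffix] = tensor
        else:
            # Non-MLP tensors (attention, norms, embeddings) pass through unchanged.
            expanded[name] = tensor
    return expanded
```

Note that writing the repeated tensors out to a converted file would give up the size savings on disk. A native implementation in the target runtime would instead load the MLP once and reference it from every layer.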

cckm changed discussion status to closed