This is the Chameleon-7b checkpoint, converted using the script convert_chameleon_weights_to_hf.py from the Lumina-mGPT repository.

This release is intended to ease the initialization of Lumina-mGPT training. Before using this model, please ensure you have obtained permission to access the official Chameleon checkpoints on Hugging Face. Usage of this model is at the user's own risk.

Differences from the official Chameleon-7B release

This model is almost identical to the official Chameleon-7B release, with one important difference in the qk-norm implementation.

For reasons that remain unclear, in the 34B Chameleon model, which was trained with 8-way model parallelism, the qk-norm weights that should be identical across model-parallel ranks are in fact different (see here for details). Intuitively, this means the attention heads fall into 1 group for the 7B model and 8 groups for the 34B model: qk-norm parameters are shared within each group but differ between groups. To handle this, transformers copies the qk-norm parameters out to shape num_heads * head_dim. However, if the model is further finetuned, as in Lumina-mGPT, the qk-norm parameters then diverge even further, until every attention head has its own distinct parameters, which is not ideal. We therefore slightly change the implementation so that the qk-norm parameters instead have shape model_parallel_size x head_dim, where model_parallel_size is 1 for the 7B model and 8 for the 34B model, and are expanded to num_heads * head_dim at forward time via repeat_interleave. This ensures that the qk-norm parameters always stay consistent within the existing groups.
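To make the shape convention concrete, below is a minimal PyTorch sketch of a grouped qk-norm module: the learnable weight is stored as (model_parallel_size, head_dim) and expanded to num_heads * head_dim with repeat_interleave at forward time. The class name, argument names, and the RMS-style normalization formula are illustrative assumptions, not the exact Lumina-mGPT or transformers code.

```python
import torch
import torch.nn as nn


class GroupedQKNorm(nn.Module):
    """Per-group qk-norm weight, expanded per attention head at forward time.

    A minimal sketch of the scheme described above; names and the
    normalization formula are assumptions made for illustration.
    """

    def __init__(self, head_dim: int, num_heads: int, model_parallel_size: int, eps: float = 1e-5):
        super().__init__()
        assert num_heads % model_parallel_size == 0
        self.head_dim = head_dim
        self.num_heads = num_heads
        self.model_parallel_size = model_parallel_size
        self.eps = eps
        # One weight row per model-parallel group:
        # shape (1, head_dim) for the 7B model, (8, head_dim) for the 34B model.
        self.weight = nn.Parameter(torch.ones(model_parallel_size, head_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., num_heads * head_dim)
        heads_per_group = self.num_heads // self.model_parallel_size
        # Expand the grouped weight so every head in a group shares the same copy.
        weight = self.weight.repeat_interleave(heads_per_group, dim=0).reshape(-1)

        input_dtype = x.dtype
        shape = x.shape
        x = x.view(*shape[:-1], self.num_heads, self.head_dim).float()
        # RMS-style normalization over each head's channels, then the shared scale.
        x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return (x.view(*shape) * weight).to(input_dtype)
```

Because only the model_parallel_size x head_dim weight is a trainable parameter, further finetuning keeps every head within a group tied to the same scale, which is the consistency property described above.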
