JunxiongWang/MambaByte_PG19_972M · Issues with config file

In config.json, rms_norm is set to true, but the pretrained checkpoint contains bias parameters such as backbone.layers.1.norm.bias, so rms_norm needs to be set to false. Also, MambaConfig in config_mamba.py has no pad_id field, so the "pad_id": 0 entry needs to be removed.

Note: I made and tested these changes with mamba-ssm==1.2.0. With both applied, the pretrained model loads and runs without any problem.
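
In case it helps anyone else, here is a minimal sketch of the two config edits above as a small script. The helper names patch_config and patch_config_file and the path in the usage comment are my own placeholders, not anything from the repo; it assumes you have a local copy of config.json.

```python
import json
from pathlib import Path


def patch_config(config: dict) -> dict:
    """Return a copy of the config with the two fixes described above."""
    patched = dict(config)
    # The checkpoint contains norm.bias parameters, so rms_norm must be off.
    patched["rms_norm"] = False
    # MambaConfig (mamba-ssm==1.2.0) has no pad_id field, so drop the key.
    patched.pop("pad_id", None)
    return patched


def patch_config_file(path) -> None:
    """Rewrite a local config.json in place (hypothetical helper)."""
    path = Path(path)
    config = json.loads(path.read_text())
    path.write_text(json.dumps(patch_config(config), indent=2))


# Usage, assuming a hypothetical local download directory:
# patch_config_file("MambaByte_PG19_972M/config.json")
```

All other keys are left unchanged, and the script is safe to run twice because a missing pad_id is simply ignored.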

Also, the input_ids line in the example code would be better written as:

input_ids = torch.from_numpy(text_byte[None, :].copy()).long().cuda()

It is a great paper! Thanks to all the authors for this wonderful research.