segfault when trying to run the model using chat_full.py
I have no issues running the models from the anemll HF page, but this model causes a segfault and crashes Python... Both the chat.py and chat_full.py scripts crash at the same line:
`state = create_unified_state(ffn_models, metadata['context_length'])`
Hmm, strange, these files work for me. Have you tried passing the context length in as a parameter? I often have issues with the meta file not being read properly.
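Roughly what I mean (just a sketch, since I don't remember chat.py's exact flags; `ffn_models`, `metadata`, and `create_unified_state` are the names from the chat scripts, and the 2048 default is my own placeholder):

```python
# Sketch: override the context length instead of trusting the meta file.
# Assumes ffn_models, metadata, and create_unified_state are already set up
# the way chat.py / chat_full.py do it.

EXPLICIT_CONTEXT_LENGTH = 2048  # placeholder: set to your model's real ctx

# Fall back to the explicit value if the meta file wasn't parsed properly
context_length = metadata.get('context_length') or EXPLICIT_CONTEXT_LENGTH
print(f"Using context_length={context_length}")

state = create_unified_state(ffn_models, context_length)
```

If the crash goes away with a hard-coded value, it's probably the meta file; if it still segfaults, it's more likely the model files or the ANE itself.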
Strange... the 2048 ctx model works. I'll try redownloading the model...
I followed the same process for both ctx models that I've uploaded, and both work locally. But it's still possible that the files are the issue, who knows.
The same issue still persists after the redownload...
I'm thinking maybe it's a hardware limitation of the M1 ANE? What are you running this on?
I've got a base M4 MBP. Not sure, but I guess that seems to be the case x)
Personally, I can't seem to convert models with more than 3k context, but the Anemll creators can.
If anyone else has the same issues with any of the above, maybe you could open an issue on the anemll GitHub repo, because right now I don't know whether this is just local or happens for everyone.
I'm curious what kind of TPS you're getting on the M4? I'm getting 17 tps on an M1 Air with Llama 3.2 1B at 2048 ctx.
512 context - 46.9 tps
2048 context - 22.0 tps
3072 context - 14.5 tps
These numbers are for this Llama 3.2 1B Instruct model.
Btw, when you downloaded these two models, how far down the instructions did you have to go? I was thinking they might be excessive, and that it's enough to just LFS clone the repo and run it with chat.py straight away?
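Something like this is what I had in mind (rough sketch; the URL is a placeholder, and whether chat.py runs straight out of the cloned folder is exactly the part I'm unsure about):

```bash
# rough sketch, assuming the model repo tracks its weights with Git LFS
git lfs install
git clone https://huggingface.co/<user>/<model-repo>   # placeholder URL
cd <model-repo>
python chat.py    # or chat_full.py, pointed at the files in this folder
```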