as-cle-bert posted an update 4 days ago
Hi HF community! 🤗
Hope y'all are as excited as me for the release of Llama 3.1! 🦙
Following the release, I built a Space using the HF Inference API, thanks to a recipe you can find in this awesome GitHub repo (https://github.com/huggingface/huggingface-llama-recipes/): you can now run Llama-3.1-405B, customizing its system instructions and other parameters, for free! 😇
Follow this link: as-cle-bert/Llama-3.1-405B-FP8 and let the fun begin! 🍕
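If you prefer to call the model from code instead of the Space, here is a minimal sketch using the `huggingface_hub` `InferenceClient`; the model id and the generation parameters are illustrative assumptions, not necessarily the exact values the Space uses.

```python
# Minimal sketch: querying Llama 3.1 405B through the HF Inference API.
# Assumptions: huggingface_hub is installed, an HF token with access is configured,
# and the model id below is served on the serverless Inference API.
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3.1-405B-Instruct-FP8")

messages = [
    # custom system instructions, as in the Space
    {"role": "system", "content": "You are a concise, friendly assistant."},
    {"role": "user", "content": "Give me three fun facts about llamas."},
]

# OpenAI-style chat completion exposed by the Inference API
response = client.chat_completion(messages, max_tokens=256, temperature=0.7)
print(response.choices[0].message.content)
```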

Very interesting, as this time they DID update the codebase, so it is a new model!
Forget the training!!!
Most important are the codebase changes: context extensions and sliding-window implementations, as well as rotary and scaled embeddings; they have not added ring embeddings yet! (See the config sketch below.)
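A minimal sketch of what those changes look like from the `transformers` side; the model id is only an illustration, and reading the config of a gated repo assumes you have accepted the model license and are logged in.

```python
# Minimal sketch: inspect the context-length and rotary/scaled-embedding settings
# that Llama 3.1 ships in its config (illustrative, gated model id).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

print(config.max_position_embeddings)  # extended context window
print(config.rope_theta)               # rotary embedding base frequency
print(config.rope_scaling)             # scaled-RoPE settings (e.g. rope_type / factor)
```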

Interesting again is that ALL these models are generally clones of the Llama codebase!!
So they all enjoy the increased capabilities:
Mistral actually copied the Llama codebase 100% with no changes!!!

Obviously, check out the codebases in the transformers library! (A quick config comparison is sketched below.)
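A minimal sketch of one way to compare the two architectures side by side through their configs; the checkpoint ids are illustrative and may require accepting the corresponding licenses.

```python
# Minimal sketch: compare Llama and Mistral config fields in transformers.
from transformers import AutoConfig

llama = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B")
mistral = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")

fields = ("hidden_size", "num_attention_heads", "num_key_value_heads",
          "num_hidden_layers", "intermediate_size", "max_position_embeddings")
for name in fields:
    print(f"{name}: llama={getattr(llama, name)}  mistral={getattr(mistral, name)}")

# Mistral's config adds a sliding_window field that the Llama config does not carry.
print("sliding_window:", getattr(mistral, "sliding_window", None))
```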

But in general the Mistral 7B will still outperform them, as its NUMBERS are correct!
Llama 3 and all these models are released with BAD numbers, with pure mismatches! (This is the trick when you want to release open-source models and NOT share the capabilities with the public: they know the right numbers, generate a model for themselves, and pretrain it themselves, or they would be releasing a commercially READY model!)
The commercially ready (guarded) models are kept on the company hosts!!!

So go and generate a model with the correct values and you will have a good model! (Mistral also realized this and released NeMo with a 5120 hidden size, and this is a bomb to the model: 5120 does not follow any power-of-two convention; halving it repeatedly leaves an odd factor of 5 rather than reducing to a standard bit or byte size. There is a quick arithmetic check below.)
Hence all mathematical operations (training and tensor calcs) will be intensive and unnatural, breeding unnatural numbers for the model (hence bad performance).
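A tiny check of the arithmetic behind that claim: repeatedly halving 5120 leaves an odd factor of 5, whereas a power-of-two hidden size such as 4096 or 8192 halves all the way down to 1.

```python
# Tiny check: does a hidden size halve cleanly down to 1 (i.e. is it a power of two)?
def odd_factor(n: int) -> int:
    while n % 2 == 0:
        n //= 2
    return n

for size in (4096, 5120, 8192):
    print(size, "-> odd factor", odd_factor(size), "| power of two:", odd_factor(size) == 1)
```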

So pretraining is a waste!