Amazing model

by CatUkraine

Hello,
This is mind-blowing! It is only 2.43 GB in size!
Are the token embedding matrix and the attention layer weights also in 2-bit?
Also, Java and C are Turing-complete, so in theory, if the forward pass of this model, the weight loading, and the tokenizer were reimplemented in one of these languages, it would probably be possible to run this model on a smartphone with a heavily optimized Android system and no other apps open.

Thanks @CatUkraine! No, as far as I know, only the linear layers are replaced using HQQ.
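
For context, here is a rough sketch of what that replacement looks like with the mobiusml/hqq package: only `nn.Linear` modules get wrapped in an HQQ layer, while the embedding matrix and the norms stay in their original precision (which is also part of why the file is larger than a naive all-2-bit count of every weight would suggest). The argument names follow the hqq README and may differ across versions, so treat them as assumptions to verify.

```python
import torch
import torch.nn as nn
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

# 2-bit config; group_size is how many weights share one scale/zero-point.
# Small groups (e.g. 16) help preserve quality at such low bit-widths.
quant_config = BaseQuantizeConfig(nbits=2, group_size=16)

# A stand-in for one of the model's linear projections (q/k/v/o, MLP, ...).
linear = nn.Linear(4096, 4096)

# Replace the full-precision layer with its HQQ-quantized counterpart.
qlinear = HQQLinear(linear, quant_config=quant_config,
                    compute_dtype=torch.float16, device='cuda')
```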

Thank you for the response, @mlabonne! I am starting to understand rotary positional encoding and attention, and it is not as hard as I expected. I am going to port your model (and probably some other, smaller models) to some platforms and my devices.
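
In case it helps other readers, here is a minimal, self-contained sketch of rotary positional encoding in PyTorch: consecutive feature pairs of a query/key vector are rotated by a position-dependent angle. The function name and shapes are illustrative, not taken from any particular codebase.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional encoding to x of shape (seq_len, dim).

    Each feature pair (2i, 2i+1) is rotated by angle pos * theta_i,
    where theta_i = base^(-2i/dim) (Su et al., 2021).
    """
    seq_len, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) * 2 / dim)
    pos = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(pos, inv_freq)    # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]        # even / odd features
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin     # standard 2D rotation,
    out[:, 1::2] = x1 * sin + x2 * cos     # applied pairwise
    return out

q = torch.randn(10, 64)    # 10 positions, one 64-dim attention head
print(rope(q).shape)       # torch.Size([10, 64])
```

One caveat for porting: implementations differ in how they pair features (interleaved as above vs. split halves, as in some Llama codebases), so match the layout of the checkpoint you are porting.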

Any chance of seeing an HQQ+ version of this?

Owner

@Kalemnor No plans at the moment but it would be fun!
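
For anyone curious what that would involve: as I understand it, HQQ+ keeps the HQQ-quantized weights frozen and trains small low-rank adapters on top to recover quality. Below is a conceptual sketch of that idea, not the actual HQQ+ implementation; the class name, rank, and the plain-Linear stand-in are all made up for illustration.

```python
import torch
import torch.nn as nn

class LowRankPatchedLinear(nn.Module):
    """Conceptual HQQ+-style layer: a frozen (quantized) linear
    projection plus a small trainable low-rank correction, LoRA-style."""
    def __init__(self, base_linear: nn.Module, in_f: int, out_f: int,
                 rank: int = 8):
        super().__init__()
        self.base = base_linear                    # e.g. a frozen HQQLinear
        self.down = nn.Linear(in_f, rank, bias=False)
        self.up = nn.Linear(rank, out_f, bias=False)
        nn.init.zeros_(self.up.weight)             # start as a no-op patch

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

# Stand-in for a frozen 2-bit layer (a real setup would use HQQLinear).
frozen = nn.Linear(512, 512).requires_grad_(False)
layer = LowRankPatchedLinear(frozen, 512, 512, rank=8)
out = layer(torch.randn(4, 512))    # only down/up would be trained
```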
