Good one...

by usermma - opened 1 day ago

It's very slow on CPU (slower than a 350M model).

I tried to convert it to GGUF, but it always failed.

CLI chat — type /help for commands. No context memory (each prompt is standalone).
you: Hi
assistant: Hello! I'm Open Assistant, a chat-based assistant that can answer questions and answers to various fields. It's well known that you're interested in the development of a language model, but I don't have access to it. However, I can provide you with some general ideas for building a language model using a variety of languages.

Here are some general ideas:

A language model model that is designed to be easy to understand and learn.
A language model that is designed to be easy to understand and learn.
A language model that is designed to be easy to understand and learn.
A language model that is designed to be easy to understand and learn.
A language model that is designed to be easy to understand and learn.
A language model that is designed to be easy to understand and learn.
A language model that is designed to be easy to understand and learn.
you: Try Again.
assistant: I'm sorry, but I cannot solve this problem without you. It is important to take care of your situation and avoid any potential harm or harm that may cause the problem. Please let me know what you need to do next!
you:

ThingsAI

ThingAI org 1 day ago

Hi, thanks again for testing. You're right — the CPU speed issue is real, and I think I know exactly why.

The core problem is the tokenizer.
This model was originally designed to be ~35M parameters, but it inherited the 65,536-token vocabulary from the larger Quark family. With a hidden size of 512, the embedding matrix alone is 65,536 × 512 = 33.5 million parameters — that's almost half the model's total size just for looking up tokens.

For a tiny model (<100M), this is a bad trade-off: the embedding eats up parameters that should have gone into the transformer layers, and the large vocabulary makes every forward pass more expensive, because the final projection has to produce 65,536 logits for each token. That hurts most on CPU, where large matrix operations aren't as well optimized.
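To make the arithmetic concrete, here is a minimal sketch of the numbers quoted in this post. It assumes "72M" means exactly 72,000,000 parameters and that the logits come from a dense hidden-to-vocabulary projection; neither detail is confirmed by the repo, so treat the output as a rough estimate.

```python
# Figures quoted in the post; TOTAL_PARAMS is an assumption ("72M" read as 72,000,000).
VOCAB_SIZE = 65_536
HIDDEN_SIZE = 512
TOTAL_PARAMS = 72_000_000


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Number of weights in a vocab_size x hidden_size embedding table."""
    return vocab_size * hidden_size


def logit_macs_per_token(vocab_size: int, hidden_size: int) -> int:
    """Multiply-adds needed to turn one hidden vector into vocab_size logits.

    Assumes a plain dense projection; the real model may differ.
    """
    return vocab_size * hidden_size


emb = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
share = emb / TOTAL_PARAMS
macs = logit_macs_per_token(VOCAB_SIZE, HIDDEN_SIZE)

print(f"embedding parameters: {emb:,}")        # embedding parameters: 33,554,432
print(f"share of total:       {share:.1%}")    # share of total:       46.6%
print(f"logit multiply-adds per token: {macs:,}")
```

The last figure is paid again for every generated token, which is why the vocabulary size shows up so clearly in CPU latency.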

What I'm planning to do:

Train a smaller tokenizer (8K–16K tokens) specifically for tiny models

Re-train a proper 35M–40M model with that tokenizer — much faster, lighter, and more coherent

Release proper GGUF/ONNX versions once the architecture is stable
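As a rough illustration of what the smaller tokenizer would buy, the sketch below compares embedding sizes at the same hidden size of 512. It reads "8K" and "16K" as 8,192 and 16,384, which is an assumption; the final vocabulary sizes have not been decided.

```python
HIDDEN_SIZE = 512
CURRENT_VOCAB = 65_536
# Assumption: "8K-16K" interpreted as these two power-of-two sizes.
CANDIDATE_VOCABS = [8_192, 16_384]


def embedding_params(vocab_size: int, hidden_size: int = HIDDEN_SIZE) -> int:
    """Weights in a vocab_size x hidden_size embedding table."""
    return vocab_size * hidden_size


def freed_params(candidate_vocab: int, current_vocab: int = CURRENT_VOCAB) -> int:
    """Parameters no longer spent on the embedding table after shrinking the vocabulary."""
    return embedding_params(current_vocab) - embedding_params(candidate_vocab)


for vocab in CANDIDATE_VOCABS:
    print(
        f"vocab {vocab:>6,}: embedding {embedding_params(vocab):>10,} params, "
        f"frees {freed_params(vocab):>10,} params"
    )
```

Those freed parameters could either be dropped to reach the 35M–40M target or moved into the transformer layers.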

For now, the 72M model is more of a lesson learned than a usable release. I appreciate you trying it out — feedback like this is exactly what helps me fix things.

I'll update the repo once I have a better version. Thanks for the patience!

usermma

1 day ago

Oh wait... I see the problem.
It's this:
No context memory (each prompt is standalone).
So when I send the next message, the model doesn't seem to see the previous one ("Hi").
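For reference, a chat script can keep context by resending earlier turns with each new prompt. The sketch below is only an illustration: `build_prompt`, `chat_turn`, and `fake_generate` are hypothetical names, the `you:` / `assistant:` layout is copied from the log above rather than from the model's real chat template, and `fake_generate` stands in for the actual model call.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (role, text) pairs, oldest first


def build_prompt(history: History, user_message: str) -> str:
    """Concatenate earlier turns with the new message so the model can see them."""
    lines = [f"{role}: {text}" for role, text in history]
    lines.append(f"you: {user_message}")
    lines.append("assistant:")
    return "\n".join(lines)


def chat_turn(history: History, user_message: str,
              generate: Callable[[str], str]) -> str:
    """Run one turn and record both sides of it in the history."""
    reply = generate(build_prompt(history, user_message))
    history.append(("you", user_message))
    history.append(("assistant", reply))
    return reply


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for the model: reports how many user turns it received."""
    return f"saw {prompt.count('you:')} user message(s)"


history: History = []
first = chat_turn(history, "Hi", fake_generate)
second = chat_turn(history, "Try Again.", fake_generate)
print(first)
print(second)
```

With a real model the history would also need trimming once it approaches the context window, which this sketch does not handle.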

usermma

about 23 hours ago

Apache 2.0 is better than MIT because it offers legal safeguards against submarine patents. It does so through an explicit patent grant from contributors and a defensive termination clause: the grant is irrevocable, except that it terminates automatically for any user who files a patent infringement lawsuit alleging that the licensed work infringes.

So more orgs may look into this model, instead of letting it be forgotten.

usermma

about 23 hours ago

Here it is in your preferred language:

translation by DSeek (Italian original, rendered in English)
"The main reason Apache 2.0 is superior to the MIT license is that it includes explicit clauses to protect users from so-called 'submarine patents': patents that are deliberately kept hidden until a technology has become widespread, and are enforced only later in order to demand royalty payments."

ThingsAI

ThingAI org about 16 hours ago

I changed the license.
