What's this for?
Hi! I just wanted to know what this is for, and what it's best at, in your opinion. Thanks!
Thanks for your interest, @stormchaser!
This one isn't good at anything yet, unfortunately. It's just a prototype for checking whether the base model could be trained to follow instructions. With the small dataset it was trained on, there isn't much you can do with it beyond confirming that it can be tuned to follow instructions. Fortunately, I have another version of it in training that should be more useful. (It may finish training tomorrow.)
The reason behind these super-small models is https://xenova.github.io/transformers.js - when we run models directly in the browser, we're limited by their format and size. For this library in particular, we convert the model to ONNX format with 8-bit quantization, and even then the files are large. To run in a desktop browser, the file has to be smaller than 1GB, and to run in a mobile browser, we need to keep it under 150MB.
It's hard to find small text-generation models with good outputs, but they exist; check out MBZUAI/LaMini-Flan-T5-248M, a successful small instruction-tuned model. I've been using it in MiniSearch for browsers that don't support WebGPU.
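For context, here's a minimal sketch of how one of these converted models can be run in the browser with transformers.js. The model ID is an assumption (a Xenova ONNX conversion of LaMini-Flan-T5-248M), so swap in whichever converted model you actually use:

```js
// Minimal sketch: running a small instruction-tuned model in the browser
// with transformers.js (runs in an ES module / modern browser).
import { pipeline } from '@xenova/transformers';

// LaMini-Flan-T5 is an encoder-decoder model, so it uses the
// text2text-generation task. The 8-bit quantized ONNX weights are
// loaded by default, which keeps the download browser-friendly.
// NOTE: the model ID below is an assumption; adjust it as needed.
const generator = await pipeline(
  'text2text-generation',
  'Xenova/LaMini-Flan-T5-248M'
);

// Generate a short instruction-following response.
const output = await generator('How can I become more healthy?', {
  max_new_tokens: 100,
});
console.log(output);
```

Since everything runs client-side, the only cost is the initial weight download, which is why staying under those size limits matters so much.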
Felladrin/llama2_xs_460M_experimental_evol_instruct was published a few hours ago.
It's roughly twice the size, but it was trained on a larger instruction dataset.
It's better than TinyMistral-248M-Alpaca, so give it a try if you're curious.
Hello there!
I just wanted to share that my intention with the base model (I am the creator) was to demonstrate that we don't absolutely need trillion-scale datasets and that language models can be pretrained on a single GPU. Even though the model's performance isn't quite there yet, I'm optimistic that it will get better as more training is done. I think it'll take about another 5 days to wrap up this phase. After that, I'm looking forward to fully fine-tuning it on an instruction dataset (possibly a large one).
Great work, keep it up!
Hello everyone!
Since the content of this discussion is now outdated, I will be closing it. However, I suggest that everyone keep an eye on the list of models that are derived from TinyMistral-248M, which can be found at https://huggingface.co/models?other=base_model:Locutusque/TinyMistral-248M.
I hope to see more and more derivatives, as this model is exceptional considering its size!