What's this for?
Hi! I just wanted to know what this is for, and what it's best at, in your opinion. Thanks!
Thanks for your interest, @stormchaser!
This one isn't good at anything yet, unfortunately. It's just a prototype for checking whether the base model could be trained to follow instructions. With the small dataset it was trained on, there isn't much you can do with it beyond confirming that it can be tuned to follow instructions. Fortunately, I have another version of it in training that should be more useful. (It may finish training tomorrow.)
The reason behind these super-small models is https://xenova.github.io/transformers.js - when we run models directly in the browser, we're limited by their format and size. For this library in particular, we convert the model to ONNX format with 8-bit quantization, and even then the files are large. To run in a desktop browser, the file has to be smaller than 1GB, and to run in a mobile browser, we need to keep it under 150MB.
It's hard to find small text-generation models with good outputs, but they exist; check out MBZUAI/LaMini-Flan-T5-248M, a successful small instruction-tuned model. I've been using it in MiniSearch for browsers that don't support WebGPU.
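For context, here's a minimal sketch of how one of these converted models can be run in the browser with transformers.js. The model ID is an assumption (a Xenova ONNX conversion of LaMini-Flan-T5-248M), so swap in whichever converted model you actually use:

```js
// Minimal sketch: running a small instruction-tuned model in the browser
// with transformers.js (runs in an ES module / modern browser).
import { pipeline } from '@xenova/transformers';

// LaMini-Flan-T5 is an encoder-decoder model, so it uses the
// text2text-generation task. The 8-bit quantized ONNX weights are
// loaded by default, which keeps the download browser-friendly.
// NOTE: the model ID below is an assumption; adjust it as needed.
const generator = await pipeline(
  'text2text-generation',
  'Xenova/LaMini-Flan-T5-248M'
);

// Generate a short instruction-following response.
const output = await generator('How can I become more healthy?', {
  max_new_tokens: 100,
});
console.log(output);
```

Since everything runs client-side, the only cost is the initial weight download, which is why staying under those size limits matters so much.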
Felladrin/llama2_xs_460M_experimental_evol_instruct was published a few hours ago.
It's roughly twice the size, but it was trained on a larger instruction dataset.
It's better than TinyMistral-248M-Alpaca, so give it a try if you're curious.
Hello there!
I just wanted to share that my intention with the base model (I am the creator) was to demonstrate that we don't absolutely need trillion-scale datasets and that language models can be pretrained on a single GPU. Even though the model's performance isn't quite there yet, I'm optimistic that it will get better as more training is done. I think it'll take about another 5 days to wrap up this phase. After that, I'm looking forward to fully fine-tuning it on an instruction dataset (possibly a large one).
Great work, keep it up!
Hello everyone!
Since the content of this discussion is now outdated, I will be closing it. However, I suggest that everyone keep an eye on the list of models that are derived from TinyMistral-248M, which can be found at https://huggingface.co/models?other=base_model:Locutusque/TinyMistral-248M.
I hope to see more and more derivatives, as this model is exceptional considering its size!