s3nh

AI & ML interests

Quantization, LLMs, Deep Learning for good. Follow me if you like my work. Patreon.com/s3nh

Organizations

s3nh's activity

replied to Undi95's post 3 months ago

Awesome work as always. As you said, sadly these models are freaking heavy for GPU-poor/CPU users ^^

replied to JustinLin610's post 3 months ago

For me, Qwen has always been top notch, especially the fact that it was possible to prompt it in different languages. Awesome work, I cannot wait to test the 1.5 stack ^^

replied to BramVanroy's post 3 months ago
replied to their post 3 months ago
posted an update 3 months ago
GPU Poor POV: Burnout

Sometimes we do not have the energy to post about AI and new methods.
And that's totally OK, I guess.
Remember to sleep well and drink a lot of water. Have a great day :D <3
posted an update 3 months ago
GPU Poor POV: Quantization

Today I want to share with you my plug-and-play notebook code,
which helped me a lot through my quantization journey.
Hope you'll find it interesting; it could be a good starting point for
GGUF-ing some of your awesome models :)

Have a great day <3

https://s3nh.bearblog.dev/gpu-poor-pov-gguf-snippet/
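For context, the usual llama.cpp GGUF flow can be sketched like this. This is not the linked snippet: the script and binary names (`convert-hf-to-gguf.py`, `llama-quantize`) and the model path are assumptions that vary between llama.cpp versions.

```python
# Hypothetical sketch of a llama.cpp GGUF workflow: convert a Hugging Face
# checkpoint to GGUF, then quantize it. Names/paths are illustrative
# assumptions, not the exact commands from the linked post.
model_dir = "path/to/your-hf-model"

steps = [
    # 1) convert the HF checkpoint into an f16 GGUF file
    ["python", "convert-hf-to-gguf.py", model_dir, "--outfile", "model-f16.gguf"],
    # 2) quantize it down to a GPU-poor-friendly size
    ["./llama-quantize", "model-f16.gguf", "model-Q4_K_M.gguf", "Q4_K_M"],
]

# printed rather than executed, so the sketch stays runnable anywhere
for step in steps:
    print(" ".join(step))
```

Run the real commands from a llama.cpp checkout; the `Q4_K_M` scheme is a common middle ground between size and quality.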
replied to their post 3 months ago

Uh, the gist was wrongly formatted; now it seems to be good ^^

replied to their post 3 months ago

Glad you like it! :) I tried Qdrant for a while, and also Postgres with a vector DB extension, but I really wanted to build something lightweight. I started combining TinyDB with .pt files (.pt as embedding storage ;D), but then Chroma somehow appeared and it was relatively easy to start with.

posted an update 3 months ago
GPU Poor POV: Willingness of Customization

I love to use libraries in which you can customize a lot of things. ChromaDB is my DB of choice when it comes to storing embeddings. The cool feature is that you can define your own embedding function, which can be called on every ChromaDB collection initialization or creation. It is useful because sometimes we want to use different prompts or different models, and it can easily be written as an inheritance from the EmbeddingFunction class.

Edit:

My CustomEmbeddingFunction can be found here:
https://gist.github.com/s3nh/cfbbf43f5e9e3cfe8c3e4e2f0d550b80

and you can use it by initializing or calling the chroma collection.

import os

import chromadb
from your_custom_fn import CustomEmbeddingFunction


class ChromaStorage:
    def __init__(self, config):
        self.config = config
        self.check_config()
        self.client = self.init_client()
        self.embedding_function = CustomEmbeddingFunction()

    def check_config(self):
        # fail early if the persistence directory does not exist
        assert os.path.exists(self.config.path), 'Provided path does not exist!'

    def init_client(self):
        return chromadb.PersistentClient(path=self.config.path)

    def init_collection(self, name: str):
        return self.client.get_or_create_collection(name=name, embedding_function=self.embedding_function)
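For illustration, Chroma's embedding-function contract is essentially a callable that maps a list of texts to equal-length float vectors. A toy stand-in (not the model-backed CustomEmbeddingFunction from the gist) can look like this:

```python
from typing import List


class ToyEmbeddingFunction:
    """Illustrative stand-in for a model-backed embedding function.

    Chroma only expects __call__ to turn a list of texts into a list of
    equal-length float vectors; a real implementation would run a model here.
    """

    def __call__(self, input: List[str]) -> List[List[float]]:
        # toy 2-d "embedding": text length plus a cheap character hash
        return [[float(len(text)), float(sum(map(ord, text)) % 101)]
                for text in input]


fn = ToyEmbeddingFunction()
vectors = fn(["hello", "world"])
print(len(vectors), len(vectors[0]))  # 2 texts, 2 dims each
```

Swapping the body for a real model call (different prompts, different checkpoints) is exactly the customization point the post describes.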
replied to their post 3 months ago

Could you please try changing this line to

pip install -e .

We do not need DeepSpeed and Flash Attention for this exact run. Also, I forgot to mention that it is tested on a Linux environment. Maybe I should prepare a Colab notebook; it'll be much nicer for eventual error tracking.
Let me know if it works. You can also DM me at x.com/s3nhs3nh, I'll be glad to help ^^

replied to clem's post 3 months ago

There are a lot of assumptions in this article. I like the idea, but in my opinion the AGI question is strictly tied to the willingness to find the places where this automation will be really useful. It was relatively straightforward with autonomous cars; it is blurrier with AGI. I see it like Einstein's approach of describing everything with one equation, and then quantum physics appears. Hopefully we are at the beginning of a really interesting journey, and we have an impact on how it'll look 🤗

posted an update 3 months ago
GPU Poor POV: Don't be Afraid :D

Sometimes we don't want to do something because of low self-esteem.
I often hear 'it's too hard for me', 'I am not an expert', 'I do not know how to do it', etc. These words are never the truth; we should not be afraid, and we should try to build something, because there is no added value without failure.

The same goes for LLMs: there are a lot of fancy words flying around, but what is more important is that there are also people who are constantly building so others can build. Diving into finetuning LLMs is incredibly simple if we use the axolotl library and pretrained models stored on Hugging Face.

All we need is an idea, our GPU-poor desktop or Colab notebooks, and these steps:
git clone https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl

pip3 install packaging
pip3 install -e '.[flash-attn,deepspeed]'

After the installation process we can go to the examples and modify the configs to our own needs.
Let's jump into
axolotl/examples/llama-2/qlora.yml

and change
base_model: NousResearch/Llama-2-7b-hf

to
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

choose a dataset from the huge number of datasets available at hf.co/datasets and tweak additional params like batch_size, the number of epochs, how often we want to save our model, and many more (which I won't focus on right now).
Then,
accelerate launch -m axolotl.cli.train examples/llama-2/qlora.yml

will start the finetuning process on a structure defined strictly by you. After finetuning, the model will be saved in the path provided in the config, and you can check whether it performs better than the base one. You can even put it on the LLM Leaderboard to check if we have a new SOTA :)
Have fun and have a great day <3
replied to chansung's post 3 months ago

That's so cool, really helpful for the non-English-speaking audience <3

replied to their post 3 months ago

Thanks! The interesting thing is that many RP models come from a merging process, and their behaviour differs significantly from the base model's. I am also curious about inference on longer contexts: at what point (what length) does the personality start to disappear? These are really interesting questions.

I'll have to extend my thoughts in the next posts and back them up with some technical details. Your feedback and thinking process are amazing, thank you very much <3

posted an update 3 months ago
GPU Poor POV: My storytelling choices of the week

It's the end of the week, and I decided to summarize my observations of community-built LLMs and mention a few models in a specific area which are very interesting and capable of creating insightful stories despite their relatively lightweight form.

I personally don't use LLMs in my daily routine for tasks like function calling, parsing, or assisting with code writing. What I do use them for is storytelling, because it always amazes me how differently these models take to different preferred tasks, how they are able to generalize stories and, sometimes, how high a level of creativity they carry.

BlueNipples/DaringLotus-v2-10.7b: its main target is generating prose. Quoting the author: 'It shares it's good prose, and relatively decent coherency, being a little bit more on the side of prose, and a little bit less on the side of coherency. I like this model for generating great prose if I feel like regening a bit.'

https://huggingface.co/NeuralNovel/Aeryth-7B-v0.1
Great work by @NeuralNovel. I really like how flexible this model is; there is no strict focus on a certain role, so it is definitely worth a try. I would love to hear more about the dataset it was trained on (AFAIK it is private right now); it is best suited for the Science Fiction, History & Romance genres due to the training data used.

And the last one for today is FPHam/Sydney_Pirate_Mistral_7b. @FPHam's work always amazes me with how well the models stick to the provided role. Awesome work as always; I'll for sure use this model to generate some interesting stories.

I know the hype train is going fast, but from what I observe, people here on Hugging Face are creating really creative models which are for sure worth a try. Have a great day <3
replied to Norod78's post 3 months ago

Great work! What do you think are potential use cases for this type of technology?

replied to their post 3 months ago

The inference speed for this specific task really got a boost and was visibly faster than the base model. I did not try ONNX conversion, but I can check it out.

And I calculate embeddings for the translated knowledge base. Then, for every question, the translation is performed. That's why a relatively lightweight solution was a must-have :)

replied to their post 3 months ago

Let me know if it's interesting for you, I can jump into more details ^^

posted an update 3 months ago
GPU Poor POV: Low Hanging Fruits


Sometimes we have to work with a language other than English (what a surprise!) and it can be problematic, because as you may know, many algorithms are mainly developed for English.
I was involved in building a RAG in Polish. First, we needed proper embeddings for Polish to feed into a lightweight LLM.
Looking through possible solutions, I became aware that the existing models were not accurate enough and worked much worse than their 'English equivalents'.
The first thing that comes to mind is:
Let's become a mad scientist, download all possible data, and train a model for months to get the proper one.

But there are a few cons to this:
- It's computationally heavy
- You are not a full-time researcher
- You have potential clients who want to use your solution, and who are really happy to use it (in the optimistic scenario)

Here comes the low-hanging fruit.
We developed an easier, workable solution. Instead of training a new SOTA, we can use a translation model like this one:

Helsinki-NLP/opus-mt-pl-en

Translate your knowledge base to English, and the proper embedding model can be used accurately.
I converted the existing model using ctranslate2,

ct2-transformers-converter --model Helsinki-NLP/opus-mt-pl-en --output_dir opus-mt-pl-en

so making an inference is not heavy (we observed a 5x speedup compared to the original version).

And by indexing the knowledge base, we can return the answer to the LLM in any language. (The indexes of context found in the English-language knowledge base are equal to the indexes in the native-language knowledge base.)
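The index-alignment trick above can be sketched with toy data (the chunks and the retrieved positions here are made up for illustration):

```python
# Retrieval runs over the English translations, but because the two lists
# are index-aligned, the answer context is taken from the native-language
# knowledge base at the same positions.
native_kb = [
    "Procedura A: opis w jezyku polskim.",
    "Procedura B: kolejny fragment bazy wiedzy.",
]
english_kb = [
    "Procedure A: description in Polish.",          # translation of native_kb[0]
    "Procedure B: another knowledge-base chunk.",   # translation of native_kb[1]
]

hit_indexes = [1]  # pretend the English index returned this chunk for a query
context = [native_kb[i] for i in hit_indexes]
print(context[0])
```

The only invariant you have to maintain is that translation never reorders or drops chunks; then the English index is a free drop-in for the native one.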

Of course some tweaks are required; we have to validate the accuracy of the translation.

It was a nice episode: we got our work done, and there are people who can use it, so added value exists.
Have a great day and I wish you more effective deploys! <3
posted an update 4 months ago
GPU Poor POV: Building a RAG which solves a specific task.

Everyone loves benchmarks.
They are great because we have a standardized approach and a competitive feeling. But if you are working in a specific area, trying to implement some LLM/RAG use case, these benchmarks cannot exactly reflect the data you have to deal with.

I built a RAG system on a bunch of niche procedures/regulations etc., which can finally be deployed as a virtual assistant to minimize the effort of searching through a lot of documentation manually.

I tested a lot of different methods/models/pretrains and finetunes, and what's interesting is that the final solution, which was scored by human feedback, is based on relatively low-param models with multitask ability.
Something like:

BAAI/llm-embedder

The LLM helps summarize the retrieved chunks of the knowledge base and does not require a high number of params, because a tradeoff between inference time and accuracy has to be made. Some lightweight models are able to perform a certain task based on instructions, so e.g. Qwen 7B or Mistral 7B (not the MoE one) handled the task really nicely. And what is more important is that, overall, we are able to deploy RAG systems for smaller tasks in specific areas. They can be used by the people who need them, delivering added value and positive feedback, which IMO is what the whole building process is about.
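As a toy illustration of the retrieve-then-summarize setup: the vectors and scores below are made up; a real system would embed chunks with a model like BAAI/llm-embedder and hand the top hits to a small LLM for summarization.

```python
import math


def cosine(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# pretend these are embeddings of three knowledge-base chunks and a query
chunks = {"chunk_a": [1.0, 0.0], "chunk_b": [0.6, 0.8], "chunk_c": [0.0, 1.0]}
query = [0.7, 0.7]

# rank chunks by similarity; the top ones would go to a lightweight LLM
# to be summarized into the final answer
ranked = sorted(chunks, key=lambda name: cosine(chunks[name], query), reverse=True)
print(ranked[0])
```

The point of the low-param tradeoff is exactly here: retrieval does the heavy lifting, so the summarizing model can stay small.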

Have a great day and think about the problem your models have to solve <3
replied to clem's post 4 months ago

VLMs are not my cup of tea, but maybe it'll motivate me to start, ngl!

replied to Tonic's post 4 months ago

I'll try it right now; it's relatively heavy, so I am a little bit worried about the computation time. I'll let you know about my feelings!