
James Wright PRO

jsfs11

AI & ML interests

ML engineering, SOTA techniques, LLM fine-tuning and merging

Recent Activity

Organizations

None yet

jsfs11's activity

reacted to davidberenstein1957's post with 👀 3 days ago
Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part: a simple step-by-step process makes dataset creation a non-technical breeze, letting anyone create datasets and models in minutes without writing any code.

Blog: https://huggingface.co/blog/synthetic-data-generator
Space: argilla/synthetic-data-generator
reacted to bartowski's post with 👍 4 months ago
So it turns out I've been spreading a bit of misinformation about imatrix in llama.cpp.

It starts out true: imatrix runs the model against a corpus of text and tracks the activations each weight sees to determine which weights are most important.
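To make the "tracks the activations" step concrete, here is a minimal Python sketch. It assumes that a weight's importance is the mean of the squared activation on the input channel that weight multiplies, averaged over a calibration corpus. That is my understanding of what llama.cpp's imatrix tool accumulates, but the function name `accumulate_importance` and the toy numbers are hypothetical.

```python
from typing import Iterable, List, Sequence


def accumulate_importance(activations: Iterable[Sequence[float]]) -> List[float]:
    """Mean squared activation per input channel over a calibration corpus.

    Each element of `activations` is one token's input vector to a linear layer.
    Assumption (not llama.cpp code): importance of every weight in column j is
    the average of x_j ** 2, since that weight multiplies x_j in y = W x.
    """
    sums: List[float] = []
    count = 0
    for x in activations:
        if not sums:
            sums = [0.0] * len(x)  # one accumulator per input channel
        for j, value in enumerate(x):
            sums[j] += value * value
        count += 1
    if count == 0:
        raise ValueError("need at least one activation vector")
    return [s / count for s in sums]


# Hypothetical activations for a layer with 3 input channels, over 2 tokens.
importance = accumulate_importance([[1.0, 0.0, 2.0], [3.0, 0.5, 0.0]])
print(importance)  # [5.0, 0.125, 2.0]
```

Squaring is a natural choice here because an error in weight `W[i][j]` shows up in the output multiplied by `x_j`, so its expected squared contribution scales with the average of `x_j ** 2`.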

However, what the quantization then does with that information is where I was wrong.

I think I accidentally conflated imatrix with ExLlamaV2's measurement pass, in which ExLlamaV2 decides how many bits to assign to which weights depending on the target BPW (bits per weight).

Instead, what llama.cpp does with imatrix is attempt to select, for each quantization block, the scale that most accurately returns the important weights to their original values, i.e. the scale that minimizes the dequantization error weighted by the importance of the activations.
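The objective described above can be written as a weighted least-squares problem. The notation is my own and this is a simplified sketch: $w_i$ are the original weights in one quantization block $B$, $a_i$ their importance values from the imatrix, $d$ a candidate scale, $q_{\max}$ the largest integer level of a hypothetical symmetric grid, and $\mathcal{D}$ the set of scales that get tried. llama.cpp's per-format code may weight or constrain the error differently.

```latex
\begin{aligned}
q_i(d) &= \operatorname{clamp}\!\left(\operatorname{round}\!\left(\frac{w_i}{d}\right),\ -q_{\max},\ q_{\max}\right) \\
E(d) &= \sum_{i \in B} a_i \bigl(w_i - d\, q_i(d)\bigr)^2 \\
d^\star &= \arg\min_{d \in \mathcal{D}} E(d) \\
\frac{\partial E}{\partial d} = 0 \text{ with } q_i \text{ fixed}
\;&\Rightarrow\;
d = \frac{\sum_{i \in B} a_i\, w_i\, q_i}{\sum_{i \in B} a_i\, q_i^2}
\end{aligned}
```

With uniform $a_i$ this reduces to ordinary (unweighted) round-trip error. The last line is the closed-form best scale once the integer levels are fixed, which is one natural way to refine a candidate; treat it as an illustration rather than a description of any specific llama.cpp format.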

The mildly surprising part is that it actually does a relatively brute-force search: it tries a bunch of candidate scales and keeps whichever one gives the minimum error for the weights deemed important in the group.
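The brute-force search can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not llama.cpp's code: it uses a hypothetical symmetric integer grid from -qmax to qmax (real llama.cpp formats use their own grids and offsets), tries candidate scales spread around the naive choice max|w| / qmax, and keeps the one with the lowest importance-weighted error. The helper names `quantize`, `weighted_error`, and `search_scale`, and the example numbers, are made up for this sketch.

```python
from typing import List, Sequence, Tuple


def quantize(weights: Sequence[float], scale: float, qmax: int) -> List[int]:
    """Round each weight to the nearest integer level, clamped to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(w / scale))) for w in weights]


def weighted_error(
    weights: Sequence[float], importance: Sequence[float], scale: float, qmax: int
) -> float:
    """Importance-weighted squared dequantization error for one block."""
    q = quantize(weights, scale, qmax)
    return sum(a * (w - scale * qi) ** 2 for w, a, qi in zip(weights, importance, q))


def search_scale(
    weights: Sequence[float],
    importance: Sequence[float],
    qmax: int = 7,
    steps: int = 20,
    spread: float = 0.3,
) -> Tuple[float, float]:
    """Brute-force search over candidate scales around max|w| / qmax.

    Returns (best_scale, best_weighted_error). The naive scale is always a
    candidate (k == 0), so the result is never worse than it on this metric.
    """
    amax = max(abs(w) for w in weights)
    if amax == 0.0:
        return 0.0, 0.0  # all-zero block: any scale reproduces it exactly
    base = amax / qmax
    best_scale, best_err = base, weighted_error(weights, importance, base, qmax)
    for k in range(-steps, steps + 1):
        # Candidates range from base * (1 - spread) to base * (1 + spread).
        candidate = base * (1.0 + spread * k / steps)
        err = weighted_error(weights, importance, candidate, qmax)
        if err < best_err:
            best_scale, best_err = candidate, err
    return best_scale, best_err


block = [0.9, -0.31, 0.12, 0.05, -0.6, 0.33, 0.02, -0.08]
uniform = [1.0] * len(block)
# Hypothetical imatrix values: channels 1 and 5 matter much more than the rest.
skewed = [0.1, 5.0, 0.1, 0.1, 0.1, 5.0, 0.1, 0.1]
print(search_scale(block, uniform))
print(search_scale(block, skewed))
```

With non-uniform importance the search will generally favor a scale that reproduces the high-importance weights more accurately, at the expense of the others. Either way, the output is the same kind of thing: integer levels plus one scale per block, which fits the point that imatrix changes how the scale is chosen, not the storage format.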

But yeah, it turns out the quantization scheme is always the same; it's just that the scaling has a bit more logic to it when you use imatrix.

Huge shoutout to @compilade for helping me wrap my head around it. Feel free to add to or correct this if I've messed something up.