zero

wonderboy

AI & ML interests

None yet

Recent Activity

liked a model 9 days ago
cross-encoder/nli-deberta-v3-base
updated a collection 10 days ago
Datasets

Organizations

None yet

wonderboy's activity

reacted to bartowski's post with 🚀👍 5 months ago
So turns out I've been spreading a bit of misinformation when it comes to imatrix in llama.cpp

It starts true; imatrix runs the model against a corpus of text and tracks the activation of weights to determine which are most important

However what the quantization then does with that information is where I was wrong.

I think I made the accidental connection between imatrix and exllamav2's measuring, where ExLlamaV2 decides how many bits to assign to which weight depending on the goal BPW

Instead, what llama.cpp with imatrix does is attempt to select a scale for each quantization block that most accurately returns the important weights to their original values, i.e. minimizing the dequantization error weighted by the importance of the activations

The mildly surprising part is that it actually just does a relatively brute-force search: it picks a bunch of scales, tries each, and sees which one results in the minimum error for the weights deemed important in the group

But yeah, it turns out the quantization scheme is always the same; it's just that the scaling has a bit more logic to it when you use imatrix

Huge shoutout to @compilade for helping me wrap my head around it - feel free to add/correct as well if I've messed something up
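
In code, the scale selection described above looks roughly like this: a toy, importance-weighted brute-force scale search for a single quantization block (purely illustrative, not llama.cpp's actual implementation; the block size, candidate range, and names here are made up)

import numpy as np

def pick_block_scale(weights, importance, n_bits=4, n_candidates=32):
    # Brute-force the block scale that minimizes the importance-weighted
    # dequantization error, as described above
    qmax = 2 ** (n_bits - 1) - 1                 # symmetric range, e.g. [-7, 7] for 4 bits
    base_scale = np.max(np.abs(weights)) / qmax  # naive max-abs scale
    best_scale, best_err = base_scale, np.inf
    # Try a bunch of candidate scales around the naive one and keep the best
    for factor in np.linspace(0.7, 1.3, n_candidates):
        scale = base_scale * factor
        q = np.clip(np.round(weights / scale), -qmax, qmax)  # quantize
        dequant = q * scale                                   # dequantize
        # Each weight's error counts in proportion to the importance assigned to it
        err = np.sum(importance * (weights - dequant) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale

# Example: one block of 32 weights with per-weight importance from the activations
rng = np.random.default_rng(0)
block = rng.normal(size=32)
importance = rng.random(32)
scale = pick_block_scale(block, importance)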
replied to bartowski's post 5 months ago

I saw the imatrix dataset, which is a whole text file. I'm trying to recreate your wizardry in ONNX lol, and I wonder how you make sense of the whole text: how do you chunk it, etc.? Help appreciated. I'm glad you started posting, I just found out about this new feature last week, take care. You're doing god's work, and your quants are the best. GGUF quants have come such a long way: I see smaller files and faster outputs, but even ONNX is beating GGUF in my tests, it just takes a more refined approach.

After examining it, the most I could take away was that it's questions + answers + random text.

I coded a Python script:

# Read the whole calibration file and split it on the "Q:" markers
with open("calibration_datav3.txt", "rt") as file:
    data = file.read()

data_blocks = data.split("Q:\n\n")[1:]

for i, block in enumerate(data_blocks, 1):
    parts = block.split("A:\n\n")
    if len(parts) < 2:  # skip chunks that have no "A:" marker at all
        continue
    question = parts[0].strip()
    # only the first paragraph after "A:" is kept, which is why some answers look truncated
    answer = parts[1].strip().split("\n\n")[0].strip()
    print(f"### QUESTION:\n{question}\n")
    print(f"### ANSWER:\n{answer}")
    if i != len(data_blocks):
        print("\n---\n")

and it gives me some structured data, although some parts of the answers are truncated 😅. Example:

### QUESTION:
How to send JSON to the server

Hello, I have 2 JSON objects in JavaScript. How should I store them on the server: as files, or by passing them in a request? Code examples, please.
The backend is ASP.NET 4.5

### ANSWER:
On the client, convert it to a string:
myStringObj = JSON.stringify(myObj);

---

...

---

### QUESTION:
Show that $S_5$ does not have a quotient group isomorphic to $S_4$

Show that $S_5$ does not have a quotient group isomorphic to $S_4$.

If we assume such an $H$ exists, then $H$ must be normal in $S_5$ and $|H|=|S_5|/|S_4|=5$. So $H$ must be isomorphic to $\mathbb{Z}/5\mathbb{Z}$.
That's as far as my logic goes. I couldn't arrive at a contradiction.
Any ideas?

### ANSWER:
The possible candidates for such an $H$ are the subgroups of $S_5$ that are cyclic of order 5.  All elements of $S_5$ of order 5 are given by $5$-cycles.  However, the subgroup generated by a 5-cycle is not normal, so no $H$ can exist, as desired.
reacted to bartowski's post with ❤️ 5 months ago
replied to TuringsSolutions's post 5 months ago

Thank you for the fast reply and improved code.

Dumb questions:

  1. This doesn't improve already fine-tuned models; rather, I have to run this code and then run the training, correct?
  2. Also, when I save the model and load it later in another instance, this improvement didn't get saved with it, so I need to load this code every time, correct?
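
For question 2, here is roughly what I mean, assuming the improvement is just a custom nn.Module with learned parameters (a made-up Improvement layer below, not your actual code):

import torch
from torch import nn

# Hypothetical stand-in for the custom layer being discussed
class Improvement(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return x * self.scale

model = nn.Sequential(nn.Linear(8, 8), Improvement())
torch.save(model.state_dict(), "model.pt")  # the custom layer's parameters ARE saved here

# In another instance the class definition still has to be imported/defined,
# then the saved parameters load straight back into it:
model2 = nn.Sequential(nn.Linear(8, 8), Improvement())
model2.load_state_dict(torch.load("model.pt"))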
reacted to singhsidhukuldeep's post with 🚀 5 months ago
✨ Feeling thankful...

🇮🇳 15th August, 2024; on India's 78th Independence Day

🎉 Crossed 100 followers on Hugging Face

🏆 Got LinkedIn Top Voice

🤖 AI has never been more exciting and I am here for it

👀 @clem Can I be a Hugging Face fellow now?
replied to TuringsSolutions's post 5 months ago

Could you help us with an example? In this case, maybe I'm on the money, maybe I'm not lol. I made this:

import torch
from torch import nn
from transformers import AutoTokenizer, AutoConfig, BertModel

# Define the fractal functions
def f1(x):
    return x**2 + 0.1

def f2(x):
    return 1 - (2 * x - 1)**4

# Custom P-FAF Embedding Layer
class PFAFEmbedding(nn.Module):
    def __init__(self, embed_size, num_fractals):
        super().__init__()
        self.p = nn.Parameter(torch.rand(num_fractals))  # Probabilistic weights
        self.d = nn.Parameter(torch.rand(num_fractals) * 1.5 + 0.5)  # Fractional dimensions
        self.embed_size = embed_size
        self.num_fractals = num_fractals
        self.fractals = [f1, f2]  # List of fractal functions

    def forward(self, x):
        # x: [batch_size, seq_length, embed_size]
        x_expanded = x.unsqueeze(1).expand(-1, self.num_fractals, -1, -1)  # [batch_size, num_fractals, seq_length, embed_size]
        # Apply the fractional power to the magnitude and restore the sign,
        # since fractional powers of negative embedding values would give NaNs
        exponent = 1 / self.d.view(1, -1, 1, 1)
        x_dim = torch.sign(x_expanded) * torch.pow(x_expanded.abs() + 1e-8, exponent)
        t = torch.stack([p * f(xd) for p, f, xd in zip(self.p, self.fractals, torch.unbind(x_dim, dim=1))], dim=1)
        return torch.sum(t, dim=1)  # Sum over fractals

# BERT model with the P-FAF transform applied to the word embeddings
class BertModelWithPFAF(BertModel):
    def __init__(self, config):
        super().__init__(config)
        self.pfaf_embedding = PFAFEmbedding(config.hidden_size, 2)  # Using 2 fractal functions for demonstration

    def forward(self, input_ids, attention_mask=None):
        if attention_mask is None:
            attention_mask = torch.ones_like(input_ids)

        # Apply P-FAF to the raw word embeddings, then let the standard BertEmbeddings
        # module add position/token-type embeddings, LayerNorm and dropout
        inputs_embeds = self.embeddings.word_embeddings(input_ids)
        inputs_embeds = self.pfaf_embedding(inputs_embeds)  # Apply P-FAF transformation
        embedding_output = self.embeddings(inputs_embeds=inputs_embeds)

        # Rest of the BERT model
        extended_attention_mask = self.get_extended_attention_mask(attention_mask, input_ids.shape)
        head_mask = self.get_head_mask(None, self.config.num_hidden_layers)
        encoder_outputs = self.encoder(
            embedding_output,
            attention_mask=extended_attention_mask,
            head_mask=head_mask
        )
        sequence_output = encoder_outputs[0]
        pooled_output = self.pooler(sequence_output) if self.pooler is not None else None
        outputs = (sequence_output, pooled_output) + encoder_outputs[1:]
        return outputs  # Return BERT-style outputs for compatibility

# Load pre-trained BERT weights and add the P-FAF layer (which stays randomly initialized)
config = AutoConfig.from_pretrained("google-bert/bert-base-cased")
model = BertModelWithPFAF.from_pretrained("google-bert/bert-base-cased", config=config)
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
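
To sanity-check the forward pass, this kind of quick smoke test seems to work (not training code, just checking shapes):

# Quick smoke test on a single sentence
inputs = tokenizer("P-FAF embeddings are an experiment.", return_tensors="pt")
with torch.no_grad():
    sequence_output, pooled_output = model(inputs["input_ids"], inputs["attention_mask"])[:2]
print(sequence_output.shape)  # expected: [1, seq_len, 768] for bert-base-cased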
replied to nroggendorff's post 5 months ago

I feel you lol, I hate the super long times, but also the short ones, cause I get to play a movie for like 5-10 mins before I'm interrupted again, so it feels like a game of cat and mouse haha. But patience, hopefully we reap some rewards XD stay strong.