Quantize llama 3.2 1B into onnx INT8

by lakpriya - opened Oct 9

Oct 9

How can I quantize Llama 3.2 1B to ONNX INT8, as was done for this model? Can someone explain how to do the ONNX quantization?

lakpriya changed discussion title from Quantize llama 3.1 1B into onnx INT8? to Quantize llama 3.2 1B into onnx INT8? Oct 9

lakpriya changed discussion title from Quantize llama 3.2 1B into onnx INT8? to Quantize llama 3.2 1B into onnx INT8 Oct 9

Felladrin

Owner Oct 10

Hi, @lakpriya .

@Xenova has already converted Llama 3.2 1B to ONNX INT8, like this model:
https://huggingface.co/onnx-community/Llama-3.2-1B-Instruct

In case you want to convert it yourself, you can follow these instructions: https://github.com/xenova/transformers.js?tab=readme-ov-file#convert-your-models-to-onnx

lakpriya

Oct 10

@Felladrin Thanks so much. I tried to use it in my app, which uses onnx-runtime, but it gives me a regex error from tokenizers.js. It works fine for Llama 160M, though, which is why I thought of converting it myself. Do you know why I get that error from tokenizers.js?

Felladrin

Owner Oct 10

edited Oct 11

Unfortunately, I have no clue what causes this error.
Now that you ask, I noticed there is an open issue about converting (another) Llama 3.2 1B model: https://github.com/xenova/transformers.js/issues/967 (so I'm not sure the conversion will work at the moment).

lakpriya

Oct 10

Thank you @Felladrin , I will check

Xenova

Oct 10

@lakpriya Can you try with Transformers.js v3? We have made some modifications so that it can consume Python regular expressions.

npm i @huggingface/transformers

lakpriya

Oct 11

edited Oct 11

@Xenova I checked, and it didn't work. I looked at createPattern, and it's the same in both libraries: let regex = pattern.Regex.replace(/\\([#&~])/g, '$1');

I tried replacing the tokenizers.js from @xenova/transformers with the one from @huggingface/transformers, but I'm getting the same issue as before.

function createPattern(pattern, invert = true) {

    if (pattern.Regex !== undefined) {
        // In certain cases, the pattern may contain unnecessary escape sequences (e.g., \# or \& or \~).
        // i.e., valid in Python (where the patterns are exported from) but invalid in JavaScript (where the patterns are parsed).
        // This isn't an issue when creating the regex w/o the 'u' flag, but it is when the 'u' flag is used.
        // For this reason, it is necessary to remove these backslashes before creating the regex.
        // See https://stackoverflow.com/a/63007777/13989043 for more information
        let regex = pattern.Regex.replace(/\\([#&~])/g, '$1'); // TODO: add more characters to this list if necessary

        // We also handle special cases where the regex contains invalid (non-JS compatible) syntax.
        for (const [key, value] of PROBLEMATIC_REGEX_MAP) {
            regex = regex.replaceAll(key, value);
        }

        return new RegExp(regex, 'gu');

    } else if (pattern.String !== undefined) {
        const escaped = escapeRegExp(pattern.String);
        // NOTE: if invert is true, we wrap the pattern in a group so that it is kept when performing .split()
        return new RegExp(invert ? escaped : `(${escaped})`, 'gu');

    } else {
        console.warn('Unknown pattern type:', pattern)
        return null;
    }
}
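For readers following along, here is a minimal sketch of the escape problem that the comments in createPattern describe. It is not the fix discussed in this thread; the pattern string and the helper names (compiles, stripUnneededEscapes) are made up for illustration, and only the replace() call is taken from the code above.

```javascript
// Returns true when `source` compiles as a RegExp with the given flags.
// (Hypothetical helper for this sketch; not part of Transformers.js.)
function compiles(source, flags) {
    try {
        new RegExp(source, flags);
        return true;
    } catch (err) {
        return false;
    }
}

// Same clean-up as in createPattern: drop the backslash before #, & or ~.
function stripUnneededEscapes(source) {
    return source.replace(/\\([#&~])/g, '$1');
}

// Made-up pattern for illustration; not taken from the Llama 3.2 tokenizer.
const exported = String.raw`\#\w+|\&|\~`;

// Without the 'u' flag, JavaScript tolerates the unnecessary escapes.
console.log(compiles(exported, 'g'));

// With the 'u' flag, the same escapes are a syntax error.
console.log(compiles(exported, 'gu'));

// After the clean-up, the pattern compiles in Unicode mode as well.
const cleaned = stripUnneededEscapes(exported);
console.log(cleaned);
console.log(compiles(cleaned, 'gu'));
```

If a tokenizer pattern still fails after this step, wrapping the RegExp construction in a try/catch like the one in compiles and logging the pattern source may help narrow down which construct JavaScript rejects.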

lakpriya

Oct 11

edited Oct 11

@Xenova I fixed the issue with tokenizers.js. Llama 3.2 1B now works fine on my iPhone. Can I create a PR for that?

Xenova

Oct 11

@lakpriya Yes please! :)

Felladrin changed discussion status to closed 4 days ago
