Cool, but I have a quick question please.

#1
by ctranslate2-4you - opened

When you say that it's not perfect, what do you mean exactly? I've been looking for this to be made for the longest time, so thanks first of all...

Can it use bitsandbytes and BetterTransformer just like the llava-1.5-hf models can? Here is a portion of a script of mine where you can see that I give users the option to use bitsandbytes depending on what they choose in my GUI, and I'd like to do the same thing with this model... as long as the issues you've mentioned can be fixed with a little code!

import torch
from transformers import LlavaForConditionalGeneration

class loader_llava:
    def initialize_model_and_tokenizer(self, config):
        chosen_model = config['vision']['chosen_model']
        chosen_size = config['vision']['chosen_size']
        chosen_quant = config['vision']['chosen_quant']
        
        model_id = ""
        if chosen_model == 'llava' and chosen_size == '7b':
            model_id = "llava-hf/llava-1.5-7b-hf"
        elif chosen_model == 'bakllava':
            model_id = "llava-hf/bakLlava-v1-hf"
        elif chosen_model == 'llava' and chosen_size == '13b':
            model_id = "llava-hf/llava-1.5-13b-hf"

        print(f"Selected model: {chosen_model}")
        print(f"Selected size: {chosen_size}")
        print(f"Selected quant: {chosen_quant}")

        # get_best_device() is a helper defined elsewhere in my script
        device = get_best_device()
        print(f"Using device: {device}")

        if chosen_model == 'llava' and chosen_quant == 'float16':
            model = LlavaForConditionalGeneration.from_pretrained(
                model_id,
                torch_dtype=torch.float16,
                low_cpu_mem_usage=True,
                resume_download=True
            ).to(device)
        elif chosen_model == 'llava' and chosen_quant == '8-bit':
            model = LlavaForConditionalGeneration.from_pretrained(
                model_id,
                torch_dtype=torch.float16,
                low_cpu_mem_usage=True,
                load_in_8bit=True,
                resume_download=True
            )
        elif chosen_model == 'llava' and chosen_quant == '4-bit':
            model = LlavaForConditionalGeneration.from_pretrained(
                model_id,
                torch_dtype=torch.float32,
                low_cpu_mem_usage=True,
                load_in_4bit=True,
                resume_download=True
            )
        elif chosen_model == 'bakllava' and chosen_quant == 'float16':
            model = LlavaForConditionalGeneration.from_pretrained(
                model_id,
                torch_dtype=torch.float16,
                low_cpu_mem_usage=True,
                resume_download=True
            ).to(device)
        elif chosen_model == 'bakllava' and chosen_quant == '8-bit':
            model = LlavaForConditionalGeneration.from_pretrained(
                model_id,
                torch_dtype=torch.float16,
                low_cpu_mem_usage=True,
                load_in_8bit=True,
                resume_download=True
            )
        elif chosen_model == 'bakllava' and chosen_quant == '4-bit':
            model = LlavaForConditionalGeneration.from_pretrained(
                model_id,
                torch_dtype=torch.float32,
                low_cpu_mem_usage=True,
                load_in_4bit=True,
                resume_download=True
            )
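
For the "better transformer" part, this is roughly what I'd add after loading: a minimal sketch assuming this checkpoint supports to_bettertransformer() the same way the llava-1.5-hf models do (it needs the optimum package installed, and I haven't verified it works for this model):

# hedged sketch: only works if BetterTransformer supports this model type
try:
    model = model.to_bettertransformer()
except Exception as e:
    print(f"BetterTransformer not available for this model: {e}")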

What I mean by 'not perfect' is three things:

  1. As written in the README, this model keeps generating "\n" even after the answer is complete (a rough workaround is sketched after this list).
  2. During the conversion process there were many tokenizer errors. I tried to fix them, but I'm not sure they are fixed completely.
  3. The results are not satisfactory compared to the llava GitHub version (repetitions, hallucinations, etc.).
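
For (1), a possible workaround is a custom stopping criterion that halts generation once the tail of the output is nothing but newlines. This is just a rough sketch on top of the README code, not something I have tested on this exact checkpoint:

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnTrailingNewlines(StoppingCriteria):
    # stops generation once the last few generated tokens decode to only newlines
    def __init__(self, tokenizer, window=3):
        self.tokenizer = tokenizer
        self.window = window

    def __call__(self, input_ids, scores, **kwargs):
        tail = self.tokenizer.decode(input_ids[0, -self.window:])
        return "\n" in tail and tail.strip() == ""

# usage with the README snippet (processor comes from AutoProcessor):
# stopping = StoppingCriteriaList([StopOnTrailingNewlines(processor.tokenizer)])
# output = model.generate(**inputs, max_new_tokens=200, stopping_criteria=stopping)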

And about quantization, "load_in_4bit=True" is working. If you run the README code with "load_in_4bit=True", it takes less than 24GB of VRAM.

model_id = "PerRing/llava-v1.6-34b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    low_cpu_mem_usage=True, 
    load_in_4bit=True
)
processor = AutoProcessor.from_pretrained(model_id)
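
If your transformers version complains about passing load_in_4bit directly to from_pretrained, the same setup can be expressed through BitsAndBytesConfig. This is a sketch of the equivalent call; I have only tested the snippet above:

import torch
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # assumption: fp16 compute to match the fp16 weights above
)

model_id = "PerRing/llava-v1.6-34b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    low_cpu_mem_usage=True,
)
processor = AutoProcessor.from_pretrained(model_id)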

Thanks, I'll try that code snippet. Can you paste some examples of the poor output here? I'd hate to spend two hours writing a script for my larger program if it's too bad, you know? I have no idea why the quality would be bad... the "\n" issue seems like it could be addressed, but the output just being "not as good" is harder to fix. Would you say that it's currently worse than llava 1.5, even?

Also, do you plan on doing other sizes besides the 34b version?

Are you affiliated with these guys by chance? If not, maybe you could look at their codebase, if they're willing to share it, to troubleshoot...

https://huggingface.co/llava-hf

