Can't run the model in tabbyAPI
Hello.
Tabby's log complained about not being able to find the following parameters:
"rms_norm_eps": 1e-06,
"rope_local_base_freq": 10000.0,
"vocab_size": 262208
After adding them to "config.json", I now get these errors:
ERROR: raise ValueError(f" ## Could not find {prefix}.* in model")
ERROR: ValueError: ## Could not find lm_head.* in model
I can't find any useful info on the internet...
Help, please
The model is currently only supported on the dev branch of ExLlamaV2, not the latest release version that Tabby pulls by default. If you can switch ExLlamaV2 over to the dev branch (this requires the build prerequisites: the CUDA Toolkit, plus VS Build Tools if you're on Windows), it should work; otherwise, there will most likely be a new release in a couple of days.
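If it helps, the rough steps would look something like this (the repo URL and branch name are assumed here; run it in the same environment/venv that tabbyAPI uses):

```bash
# Sketch only: build the dev branch of ExLlamaV2 from source
git clone -b dev https://github.com/turboderp/exllamav2.git
cd exllamav2
# Compiles the extension; needs the CUDA Toolkit (and VS Build Tools on Windows)
pip install .
```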
Thank you kindly for your reply. I restored the original "config.json", then cloned the dev branch of exllamav2 and built it. Now it is working!
However, it looks like it has difficulty generating code with roo-code: it starts to generate but ends up cycling on the same word:
> Okay, I will write a Tetris game logic in Python using the Pygame library. I'm ready to write the code. I's a Python file, so I'll create a file named tetris.code. I'll use the write_to_code tool to write the code. I'll start by writing the code. ```python import pygame import random def main pygame.init() screen = pygame.display.set_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_main_
Here is my tabbyAPI config for the model:
max_seq_len: 32768
cache_mode: Q4
Could it be some Tabby-specific changes in how it uses exllamav2 that are at fault?
Should I try experimenting with temperature, or with other sampling parameters like min-p or top-p?
Thanks
Try cache Q6+; some models break almost completely under Q4.
I second this. There's no guarantee that the distribution of the keys and/or values will be amenable to quantization, especially Q4, which relies on groups of 64 consecutive values aligning well to a regular 16-point grid after Hadamard regularization. It might be some interesting interaction between that regularization and Gemma3's use of Q/K norms; I'm not sure. It could also be SWA, which only uses 1024 keys/values per token for 5/6 of the layers, making rounding errors in that smaller chunk of cache more critical. Either way, try Q6, Q8, or FP16 to see if the results improve.
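For reference, that's just the cache_mode line in the model config you posted, e.g. (a sketch that keeps your other settings unchanged):

```yaml
max_seq_len: 32768
cache_mode: Q8   # or Q6 / FP16; anything above Q4 is worth testing
```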