EnceladusHyperStock 24B
crashed 2 weeks ago, let's try again
It's queued!
You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#EnceladusHyperStock-24B-GGUF for quants to appear.
It seems to have crashed again.
Seems something is wrong: either the tokenizer is modified or the model is not supported. The usual error from llama.cpp: NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()
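For context on where that error comes from: as far as I recall, llama.cpp's conversion script tokenizes a fixed probe string, hashes the resulting token IDs, and looks that hash up in a list of known pre-tokenizers. An unknown hash ends in the NotImplementedError. The sketch below only imitates that idea with toy stand-ins. `fingerprint`, `lookup`, the probe string, and both toy encoders are hypothetical and are not llama.cpp code.

```python
from hashlib import sha256
from typing import Callable, List


def fingerprint(encode: Callable[[str], List[int]], probe: str) -> str:
    """Hash the token IDs an encoder produces for a fixed probe string."""
    token_ids = encode(probe)
    return sha256(str(token_ids).encode()).hexdigest()


# Two toy "tokenizers" (hypothetical) that split the same text differently.
def encode_by_char(text: str) -> List[int]:
    return [ord(ch) for ch in text]


def encode_by_word(text: str) -> List[int]:
    return [sum(ord(ch) for ch in word) for word in text.split(" ")]


PROBE = "Hello world\n\t 123"  # illustrative only, not the real probe text

# Only the character-level encoder is registered as "known".
known = {fingerprint(encode_by_char, PROBE): "toy-char"}


def lookup(encode: Callable[[str], List[int]]) -> str:
    """Return the registered name, or fail the way the converter does."""
    fp = fingerprint(encode, PROBE)
    if fp not in known:
        raise NotImplementedError("pre-tokenizer was not recognized")
    return known[fp]
```

A merge that changes how the tokenizer splits text changes the token IDs, so the fingerprint no longer matches anything registered. That would be consistent with the "tokenizer is modified" guess above.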
How am I not surprised? We are currently running inference on this model and BPE was a total shitshow. It is in fact so broken that I had to write the following function to postprocess the vLLM output. This is the first model we have ever tried that required postprocessing:
def nuke_llm_garbage(text: str) -> str:
    """Forcefully converts byte-level artifacts back to normal whitespace."""
    if not text:
        return ""
    # Map the entire known family of BPE character artifacts
    garbage_map = str.maketrans({
        'Ġ': ' ',   # The annoying space replacement
        'Ċ': '\n',  # The annoying newline replacement
        'č': '\r',  # Carriage return
        'ĉ': '\t',  # Tab fallback
    })
    return text.translate(garbage_map)
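The stand-in characters are not random. Byte-level BPE vocabularies in the GPT-2 style show every non-printable byte as a shifted Unicode character. A minimal sketch, assuming this tokenizer uses that scheme (I have not checked the model's tokenizer files), rebuilds the table and prints the code points used for the four whitespace bytes. `byte_to_unicode_table` is my own name for this reimplementation.

```python
def byte_to_unicode_table() -> dict:
    """Rebuild the GPT-2 style byte -> printable character table."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every other byte is shifted to U+0100 and up, in byte order.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


table = byte_to_unicode_table()
for name, ch in [("space", " "), ("newline", "\n"),
                 ("carriage return", "\r"), ("tab", "\t")]:
    stand_in = table[ord(ch)]
    print(f"{name}: U+{ord(stand_in):04X}")
```

Under that assumption the space ends up at U+0120, newline at U+010A, carriage return at U+010D, and tab at U+0109. Note that real text can also contain these letters (for example Czech "č"), so a blanket replacement can damage legitimate output.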
should i take this as a compliment or roast? i genuinely don't know about tokenizers and stuff, i'm just doing 'random bullshit go!' while merging models to get a good-ish model, then testing a little bit by myself. how did this happen? lol
should i take this as a compliment or roast?
Both? There may be a structural issue with Mistral 24B merges.
If the model is corrupt, you may need to re-merge all 3 stages with the tokenizer source set to union and chat_template set to auto.
For example, you'd add these lines to the end of each YAML:
tokenizer:
  source: union
chat_template: auto
Like this:
merge_method: model_stock
base_model: R:\MergeOutput\EnceladusModelStock-Heretic
models:
  - model: R:\MergeOutput\EnceladusModelStock-Heretic
  - model: R:\Sakura-24B-Spice
  - model: R:\Dolphin-Mistral-GLM-4.7-Flash-24B-Venice-Edition-Thinking-Uncensored
  - model: R:\WeirdCompound
  - model: R:\Broken-Tutu-24B-Unslop-v2.0
  - model: R:\Cydonia-24B-Heretic-v4
dtype: bfloat16
tokenizer:
  source: union
chat_template: auto
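If editing three YAMLs by hand is annoying, a small script can append the lines. This is only a sketch: the stage file names are hypothetical, and it does a plain text append rather than parsing the YAML, so it assumes each config is a flat top-level mapping like the one above.

```python
from pathlib import Path

# The tokenizer settings to append to each merge config.
TOKENIZER_BLOCK = "tokenizer:\n  source: union\nchat_template: auto\n"


def add_tokenizer_block(config_path: Path) -> bool:
    """Append the tokenizer settings unless the file already has a tokenizer key."""
    text = config_path.read_text(encoding="utf-8")
    if any(line.startswith("tokenizer:") for line in text.splitlines()):
        return False  # already configured, leave the file alone
    if text and not text.endswith("\n"):
        text += "\n"  # make sure the block starts on its own line
    config_path.write_text(text + TOKENIZER_BLOCK, encoding="utf-8")
    return True


if __name__ == "__main__":
    # Hypothetical file names for the three merge stages.
    for name in ["stage1.yaml", "stage2.yaml", "stage3.yaml"]:
        path = Path(name)
        if path.exists():
            print(name, "updated" if add_tokenizer_block(path) else "skipped")
```

Running it twice is safe, since a file that already has a top-level tokenizer key is skipped.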
If you try this and it's still corrupt, then it could be an issue with the donors and methods, and you may want to consider swapping the stage 1 base_model to either 2501 or 2506 Mistral Instruct (text only).
I haven't tested your merge yet, but looking at the YAMLs, each stage is likely to be at least 90% dominated by the 2501 models. Even methods like karcher and model_stock aren't immune to this L2 norm effect.
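To give a feel for what I mean by dominance, here is a toy calculation. It is not how mergekit computes anything, and the numbers are made up. It just measures how far each donor sits from the base in L2 distance and shows what share of the total distance each one accounts for. `distance_shares` and the donor names are hypothetical.

```python
import math
from typing import Dict, List


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two flattened weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_shares(base: List[float],
                    donors: Dict[str, List[float]]) -> Dict[str, float]:
    """Each donor's fraction of the summed L2 distance from the base."""
    distances = {name: l2_distance(w, base) for name, w in donors.items()}
    total = sum(distances.values())
    if total == 0:
        return {name: 0.0 for name in donors}
    return {name: d / total for name, d in distances.items()}


# Made-up 4-parameter "models": one donor far from the base, one close to it.
base = [0.0, 0.0, 0.0, 0.0]
donors = {
    "far_donor": [3.0, 4.0, 0.0, 0.0],
    "near_donor": [0.3, 0.4, 0.0, 0.0],
}
shares = distance_shares(base, donors)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

In this toy case the far donor accounts for roughly nine tenths of the total distance, so any averaging-style step that works on differences from the base will mostly move toward it.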
Basically, Mistral released the first 24B version in January 2025 (version 2501). This was Mistral Small 3.0. It probably has the most finetunes, but it is also the least compatible with all later versions.
Mistral Small 3.1 (2503), 3.2 (2506), and Magistral Small (2509) are the 3 versions that followed, and they are all HIGHLY similar because they use 2503 as a root/base. You can merge these models with fewer issues and less "over-dominance". 2501 models can still be combined with the newer Mistrals, but it may take extra fiddling to get the result stabilized or balanced.