🌒 EnceladusHyperStock 24B

#2428

by redaihf - opened 10 days ago

Discussion

redaihf

10 days ago

https://huggingface.co/ShyliaSafetensors/EnceladusHyperStock-24B

RichardErkhov

10 days ago

crashed 2 weeks ago, let's try again
It's queued!

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#EnceladusHyperStock-24B-GGUF for quants to appear.

redaihf

9 days ago

It seems to have crashed again 😒

RichardErkhov

9 days ago

Seems something is wrong: either the tokenizer is modified or the model is not supported. It's the usual llama.cpp error, NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()
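
For context, a rough sketch of why a modified tokenizer trips this error: the llama.cpp converter tokenizes a fixed check string, hashes the resulting token ids, and looks the hash up in a list of known pre-tokenizers. The token ids, the table, and the name "example-bpe" below are made up for illustration and are not taken from llama.cpp.

```python
import hashlib


def fingerprint(token_ids: list[int]) -> str:
    """Hash the token ids produced for a fixed check string."""
    return hashlib.sha256(str(token_ids).encode()).hexdigest()


# Hypothetical ids a known tokenizer would produce for the check string.
KNOWN_IDS = [1, 4521, 1877, 2]

# Hypothetical registry: fingerprint -> pre-tokenizer name.
KNOWN_PRE_TOKENIZERS = {fingerprint(KNOWN_IDS): "example-bpe"}


def lookup_pre_tokenizer(token_ids: list[int]) -> str:
    """Return the registered pre-tokenizer name, or fail like the converter does."""
    name = KNOWN_PRE_TOKENIZERS.get(fingerprint(token_ids))
    if name is None:
        # Any change to vocab, merges, or pre-tokenization changes the ids,
        # hence the hash, hence the lookup result.
        raise NotImplementedError("BPE pre-tokenizer was not recognized")
    return name
```

A single changed token id is enough to miss the lookup, which fits a merge ending up with a tokenizer that matches none of the registered ones.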

nicoboss

8 days ago

•

edited 8 days ago

Seems something is wrong: either the tokenizer is modified or the model is not supported. It's the usual llama.cpp error, NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()

Why am I not surprised? We are currently running inference on this model, and BPE was a total shitshow. It is in fact so broken that I had to write the following function to postprocess the vLLM output. This is the first model we have ever tried that required postprocessing:

def nuke_llm_garbage(text: str) -> str:
    """Forcefully converts byte-level artifacts back to normal whitespace."""
    if not text:
        return ""

    # Map the entire known family of BPE character artifacts
    garbage_map = str.maketrans({
        'Ġ': ' ',   # The annoying space replacement
        'Ċ': '\n',  # The annoying newline replacement
        'č': '\r',  # Carriage return (byte 0x0D maps to U+010D)
        'ĉ': '\t',  # Tab (byte 0x09 maps to U+0109)
    })
    return text.translate(garbage_map)
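
A more general variant, as a minimal sketch: instead of listing a few characters by hand, rebuild the GPT-2 style byte-to-unicode table and invert it, so every byte-level artifact is mapped back. The helper name decode_byte_level is hypothetical. It assumes the whole string is still in byte-level form; text that was already decoded correctly (a genuine "é", for example) would be mangled by it.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2 style byte -> printable-character table."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every other byte is shifted to U+0100 and up, in ascending byte order.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


# Inverse table: artifact character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_byte_level(text: str) -> str:
    """Hypothetical helper: undo the byte-level mapping for a whole string."""
    raw = bytearray()
    for ch in text:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Characters outside the table are passed through as UTF-8.
            raw.extend(ch.encode("utf-8"))
    # Invalid byte sequences become U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")
```

For example, decode_byte_level("HelloĠworldĊ") gives "Hello world" followed by a newline, and multi-byte sequences such as "cafÃ©" come back as "café".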

redaihf

8 days ago

Seems like @ShyliaSafetensors has created a white crow!

ShyliaSafetensors

5 days ago

should i take this as a compliment or roast? i genuinely don't know about tokenizers and stuff, i'm just doing 'random bullshit go!' while merging models to get a good-ish model, then testing a little bit by myself. how did this happen? lol

redaihf

5 days ago

•

edited 5 days ago

should i take this as a compliment or roast?

Both? There may be a structural issue with Mistral 24B merges.

Naphula

about 12 hours ago

If the model is corrupt you may need to try re-merging all 3 stages with tokenizer source union and chat_template auto.

For example, you'd add these lines to the end of each YAML file:

tokenizer:
  source: union
chat_template: auto

Like this:

merge_method: model_stock
base_model: R:\MergeOutput\EnceladusModelStock-Heretic
models:
  - model: R:\MergeOutput\EnceladusModelStock-Heretic
  - model: R:\Sakura-24B-Spice
  - model: R:\Dolphin-Mistral-GLM-4.7-Flash-24B-Venice-Edition-Thinking-Uncensored
  - model: R:\WeirdCompound
  - model: R:\Broken-Tutu-24B-Unslop-v2.0
  - model: R:\Cydonia-24B-Heretic-v4
dtype: bfloat16
tokenizer:
  source: union
chat_template: auto

If you try this and it's still corrupt, then it could be an issue with the donors and methods, and you may want to consider swapping the stage 1 base_model to either 2501 or 2506 Mistral Instruct (text only).

I haven't tested your merge yet, but looking at the YAMLs, it appears each stage is likely to be at least 90% dominated by the 2501 models. Even methods like karcher and model_stock aren't immune to this L2 norm effect.
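
A toy illustration of what a norm imbalance can do, assuming a plain average of per-donor deltas from the base. This is not how model_stock or karcher are implemented in mergekit, and the 2-dimensional "weights" are made-up numbers. It only shows that a donor sitting much farther from the base pulls the averaged direction almost entirely toward itself.

```python
import math


def l2_norm(v: list[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (l2_norm(a) * l2_norm(b))


def average_deltas(base: list[float], donors: list[list[float]]):
    """Return each donor's delta from the base and the plain mean of the deltas."""
    deltas = [[d - b for d, b in zip(donor, base)] for donor in donors]
    merged = [sum(col) / len(deltas) for col in zip(*deltas)]
    return deltas, merged


# Made-up weights: one donor close to the base, one far away.
base = [0.0, 0.0]
near_donor = [0.1, 0.0]
far_donor = [0.0, 2.0]

deltas, merged = average_deltas(base, [near_donor, far_donor])
near_similarity = cosine(merged, deltas[0])
far_similarity = cosine(merged, deltas[1])
```

Here the merged delta is almost parallel to the far donor's delta and nearly orthogonal to the near donor's, even though both were given equal weight.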

Basically, Mistral released the first 24B version in January 2025 (version 2501). This was Mistral Small 3.0. It probably has the most finetunes, but it is also the least compatible with all later versions.

Mistral Small 3.1 (2503), 3.2 (2506), and Magistral Small (2509) are the 3 versions that followed, yet they are all HIGHLY similar because they use 2503 as a root/base. You can merge these models with fewer issues and less "over-dominance". 2501 models can still be combined with the newer Mistrals, but it may require extra fiddling to get the result stabilized or balanced.
