Instructions to use DarkArtsForge/Helix-SCE-12B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use DarkArtsForge/Helix-SCE-12B with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="DarkArtsForge/Helix-SCE-12B") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("DarkArtsForge/Helix-SCE-12B") model = AutoModelForCausalLM.from_pretrained("DarkArtsForge/Helix-SCE-12B") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use DarkArtsForge/Helix-SCE-12B with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "DarkArtsForge/Helix-SCE-12B" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "DarkArtsForge/Helix-SCE-12B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/DarkArtsForge/Helix-SCE-12B
- SGLang
How to use DarkArtsForge/Helix-SCE-12B with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "DarkArtsForge/Helix-SCE-12B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "DarkArtsForge/Helix-SCE-12B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "DarkArtsForge/Helix-SCE-12B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "DarkArtsForge/Helix-SCE-12B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use DarkArtsForge/Helix-SCE-12B with Docker Model Runner:
docker model run hf.co/DarkArtsForge/Helix-SCE-12B
Cool model merge
Thx for using my models, hope this merge can deliver something new and exciting, being an SCE merge.
However I noticed a place that can be probably improved.
I'm talking about tokenizer_config.json - especially tokens 2, 3 (which are ususally used for marking start and end of turns).
Mistral should use [INST] [/INST], but it mostly outputs [INST] </s> sequence instead, so in ChatML it's common practice to map those, as they are used more often.
ChatML is pretty messy in Nemo, as everyone does their own thing, but usually it boils down to:
- manual config updates to map ChatML tokens to Mistal Tekken template (Delta-Vector, PocketDoc, my models)
- adding additional tokens to tokenizer (LatitudeGames, SicariusSicariiStuff)
Probaby this model will work better in regular Mistral Tekken v7 Template, rather than in ChatML with current config, as majority of models took 1st approach and it should be backwards compatible.
In my experience mixing and mathching those two approaches can produce inconsitent message ending behavior (either infinitely long replies, or a reply that starts to output emojis at the end).
I usually try to use mergekit's token-surgeon to make everything uniformal, but it defenitely doesn't fix everything, so manual testing of intermediate results is still preferred.
I'd try to test how the model behaves after this:
mergekit-tokensurgeon "DarkArtsForge/Helix-SCE-12B" "Vortex5/Prototype-X-12b" Helix-SCE-12B-jh --approximation-method john_hewitt
I would appreciate if you tried that out, maybe it behaves a bit differently (in a good way) as a result.
Thanks, I will try this soon. I haven't really figured out how to use token surgeon yet and usually just test either "mistral tekken" or "chatml" presets via kobold.cpp but for this model I only tested using "mistral tekken" and got good results.
I'll let you know how it compares after running the token surgeon command, testing both modes for each tokenizer version.
Evertide RX is a great model btw, I am including it in the next merge as well (Vesper), which I did test Mistral Tekken and ChatML for, and ChatML did much, MUCH better. Unfortunately Impish Bloodmoon doesn't play nice with most of these models so I couldn't include it.
Thanks, I will try this soon. I haven't really figured out how to use token surgeon yet and usually just test either "mistral tekken" or "chatml" presets via
kobold.cppbut for this model I only tested using "mistral tekken" and got good results.I'll let you know how it compares after running the token surgeon command, testing both modes for each tokenizer version.
Evertide RX is a great model btw, I am including it in the next merge as well (Vesper), which I did test Mistral Tekken and ChatML for, and ChatML did much, MUCH better. Unfortunately Impish Bloodmoon doesn't play nice with most of these models so I couldn't include it.
I assume it should perform better (or at least the same) in ChatML after token surgeon pass.
Impish Bloodmoon - yeah, that's what I described in my msg - it uses it's own trained tokens, so it's harder to merge in, it should however merge nicely with Wayfarer 1/2 or Muse for example (or other SicariusSicariiStuff) models.
As a suggestion you could try merging it into middle layers (15-30), this may mitigate observed effects from tokenization, as those layers are used mainly for knowledge.
Also, thx for feedback! I'm glad you liked Evertide RX. I'm working on a full finetune from Base model using the same dataset (now expanded and updated). It's much more uncensored.
Good idea with just using middle layers, I'll have to try that sometime.
I tested the retoknized version: For Q0F it wrote much different using the john hewitt token surgeon. It wrote more in the style of Vesper. However, it hallucinated an [END] tag and then briefly generated its own refusal reply, after having completed the instructions (saying it couldn't do the thing it just did).
I have a log file saved of several 12B tests for basic story slop. In fact what this seems to measure is how well the finetune or merge broke out of the typical pretrained Mistral tropes (positive fairy tales). So, a lot of merges revert to Nemo's "a sparkling river and a girl named Lily" slop. But others are much different, talking about the Vietnam war, an old bookstore, or being swallowed by a Kraken, etc. And I just test the prompt at 0 temperature TELL ME A STORY to measure this.
Even MPOA ablation can affect this and change the output, potentially unlocking the 'darker aspects' of storytelling. And it usually spams the russian word for "artillery" артиллерия quite early if Mistral Tekken is unsupported.
TELL ME A STORY
Fat_Fish
Once upon a time, in a small village nestled between rolling hills and lush forests, there lived a young boy named Tom. He was an adventurous child who loved exploring the woods around his home. One day, while wandering through the forest, he stumbled upon an old, abandoned cottage hidden deep within the trees.
Fat_Fish MPOA (scale 1.1, measurement 53) (this experiment was unstable with other prompt tests)
Once upon a time, there was a little girl named Alice. One day, while playing in the garden, she found a rabbit hole that led to Wonderland. In Wonderland, Alice met many strange creatures and had many adventures. She encountered the Cheshire Cat, who could disappear at will; the Mad Hatter, whose tea party never ended; and the Red Queen, who always ran late for her own tea parties. Alice also met the White Rabbit, who showed her how to reach the top of the rabbit hole by using a hookah.
So for Helix, the retokenization affected it slightly, but its still themed around Rusty the Fox (one of the other major themes I've seen in Mistral Nemo models). Retokenized output is the first example here
Thank you for testing!
Also testing at 0 temp seems interesting, I might also incorporate that into my tests.
