sometimesanotion PRO

sometimesanotion

AI & ML interests

Agentic LLM services, model merging, finetunes, distillation


Organizations

Hugging Face Discord Community

sometimesanotion's activity

replied to their post 8 days ago

@Inschrift-Spruch-Raum, I am looking through recent PRs to mergekit, and I am optimistic that Lamarck's recipes will be working again soon!

When that happens, there will be two efforts: one to make a compelling non-CoT model, and another to blend in CoT in the right amounts.

Lamarck's multilingual capabilities improved noticeably from the light influence of Krystalan/DRT-14B in v0.6, and merging from other CoT models like DeepSeek R1 is a matter of careful moderation. I will always put the overall apparent quality of translation, prose, and reasoning first.
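
To illustrate what light influence looks like in practice, here is a minimal della_linear sketch; the weights, densities, and base model here are hypothetical placeholders, not Lamarck's actual recipe:

models:
  - model:           sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
    parameters:
      weight:         0.9
      density:        0.70
  - model:           Krystalan/DRT-14B
    parameters:
      weight:         0.1
      density:        0.30
merge_method:        della_linear
base_model:          Qwen/Qwen2.5-14B
dtype:               bfloat16

Keeping the CoT model's weight small like this is the kind of moderation I mean.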

replied to their post 11 days ago

No worries! See, I agree, the recipe behind Lamarck is pretty good, and there's a lot more to get out of it. It'll likely depend on getting multiple mergekit versions working in the pipeline. The new mergekit's fusion and sce merges offer some interesting potential, but I use fine-grained sliced merges to control the mix of branches, which, last I checked, work only with older mergekit and bitsandbytes.

By now there are ample upgrades to try. I did feel Lamarck v0.7 was a proof-of-concept and had plenty of headroom to grow!
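
As a rough sketch of where I'd start with the newer methods (the model choices are placeholders, and select_topk is the SCE knob as I recall it from mergekit's docs, so double-check before using):

models:
  - model:           sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - model:           sthenno-com/miscii-14b-0218
merge_method:        sce
base_model:          Qwen/Qwen2.5-14B
parameters:
  select_topk:       0.15
dtype:               bfloat16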

replied to their post 11 days ago
New activity in wanlige/li-14b-v0.4-slerp0.1 21 days ago
replied to their post 22 days ago

You need to keep testing models in PyTorch, not just GGUF, to catch this bug. If you submit an affected model for evaluation on the open leaderboard, the run will abort.

For those who need a bit of Python to test their merged models:

import argparse

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def main(checkpoint: str) -> None:
    """Load the tokenizer and model for the specified checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    print(f"Loaded tokenizer from {checkpoint}")

    # device_map="auto" lets accelerate place the weights, so no extra .to() call
    # is needed (or allowed); a broken merge fails here with the shape ValueError.
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, device_map="auto", torch_dtype=torch.bfloat16
    )
    print(f"Loaded model to {model.device}")

def cli():
    """CLI entry point."""
    parser = argparse.ArgumentParser(
        description="Load a tokenizer and model from a given checkpoint."
    )
    parser.add_argument("checkpoint", type=str, help="The pre-trained checkpoint name or path")
    args = parser.parse_args()
    main(args.checkpoint)

if __name__ == "__main__":
    cli()
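
Point the script at your merged model's local path or repo ID: a healthy merge prints the tokenizer and model device, while a broken della_linear merge raises the shape ValueError during the model load.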
posted an update 23 days ago
I have tracked a blocker on Lamarck releases down to a della_linear bug in newer mergekit versions.

If you use slices in a della_linear merge that draw from multiple models - as you'd expect of a merge! - an attempt to load the output model in torch will get you:

ValueError: Trying to set a tensor of shape torch.Size([1, 5120]) in "weight" (which has shape torch.Size([5120])), this looks incorrect.


This strategy was key to Lamarck v0.6 and v0.7's success. Their merge recipes haven't been working with newer mergekit versions.

Both of these work (a plain model list, and slices with one model per source):
models:
  - model:           sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - model:           sthenno-com/miscii-14b-0218

slices:
  - sources:
    - { layer_range: [  0,  2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
  - sources:
    - { layer_range: [  2,  6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }


This does not:
slices:
  - sources:
    - { layer_range: [  0,  2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [  0,  2 ], model: sthenno-com/miscii-14b-0218 }
  - sources:
    - { layer_range: [  2,  6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [  2,  6 ], model: sthenno-com/miscii-14b-0218 }


@Crystalcareai, do you know of any work on this? Will @arcee-ai need a detailed report? These della_linear recipes used to work. Overall, thank you for all the cool work; I hope to get this fixed!