sometimesanotion PRO

sometimesanotion

AI & ML interests

Agentic LLM services, model merging, finetunes, distillation


Organizations

Hugging Face Discord Community

sometimesanotion's activity

replied to their post 8 days ago

@Inschrift-Spruch-Raum, I am looking through recent PRs to mergekit, and I am optimistic that Lamarck's recipes will be working again soon!

When that happens, there will be two efforts: one to make a compelling non-CoT model, and another to blend in CoT in the right amounts.

Lamarck's multilingual capabilities improved noticeably from the light influence of Krystalan/DRT-14B in v0.6, and merging from other CoT models like DeepSeek R1 is a matter of careful moderation. I will always put the overall apparent quality of translation, prose, and reasoning first.
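
To illustrate what light influence looks like in practice, here is a minimal della_linear sketch; the weights, densities, and base model here are hypothetical placeholders, not Lamarck's actual recipe:

models:
  - model:           sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
    parameters:
      weight:         0.9
      density:        0.70
  - model:           Krystalan/DRT-14B
    parameters:
      weight:         0.1
      density:        0.30
merge_method:        della_linear
base_model:          Qwen/Qwen2.5-14B
dtype:               bfloat16

Keeping the CoT model's weight small like this is the kind of moderation I mean.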

replied to their post 11 days ago

No worries! See, I agree, the recipe behind Lamarck is pretty good, and there's a lot more to get out of it. It'll likely depend on getting multiple mergekit versions working in the pipeline. The new mergekit's fusion and sce merges offer some interesting potential, but I use fine-grained sliced merges to control the mix of branches, which, last I checked, work only with older mergekit and bitsandbytes.

By now there are ample upgrades to try. I did feel Lamarck v0.7 was a proof-of-concept and had plenty of headroom to grow!
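
As a rough sketch of where I'd start with the newer methods (the model choices are placeholders, and select_topk is the SCE knob as I recall it from mergekit's docs, so double-check before using):

models:
  - model:           sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - model:           sthenno-com/miscii-14b-0218
merge_method:        sce
base_model:          Qwen/Qwen2.5-14B
parameters:
  select_topk:       0.15
dtype:               bfloat16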

replied to their post 11 days ago
New activity in wanlige/li-14b-v0.4-slerp0.1 21 days ago
replied to their post 22 days ago

You need to keep testing models in PyTorch, not just GGUF, to catch this bug. If you submit an affected model for evaluation on the open leaderboard, the run will abort.

For those who need a bit of Python to test their merged models:

import argparse

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def main(checkpoint: str) -> None:
    """Load the tokenizer and model for the specified checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    print(f"Loaded tokenizer from {checkpoint}")

    # device_map="auto" lets accelerate place the weights, so no extra .to() call
    # is needed (or allowed); a broken merge fails here with the shape ValueError.
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, device_map="auto", torch_dtype=torch.bfloat16
    )
    print(f"Loaded model to {model.device}")

def cli():
    """CLI entry point."""
    parser = argparse.ArgumentParser(
        description="Load a tokenizer and model from a given checkpoint."
    )
    parser.add_argument("checkpoint", type=str, help="The pre-trained checkpoint name or path")
    args = parser.parse_args()
    main(args.checkpoint)

if __name__ == "__main__":
    cli()
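
Point the script at your merged model's local path or repo ID: a healthy merge prints the tokenizer and model device, while a broken della_linear merge raises the shape ValueError during the model load.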
posted an update 23 days ago
I have tracked a blocker on Lamarck releases down to a della_linear bug in newer mergekit versions.

If you use slices in a della_linear merge that draw from multiple models - as you'd expect of a merge! - an attempt to load the output model in torch will get you:

ValueError: Trying to set a tensor of shape torch.Size([1, 5120]) in "weight" (which has shape torch.Size([5120])), this looks incorrect.


This strategy was key to Lamarck v0.6 and v0.7's success. Their merge recipes haven't been working with newer mergekit versions.

Both of these work (a plain model list, and slices with one model per source):
models:
  - model:           sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - model:           sthenno-com/miscii-14b-0218

slices:
  - sources:
    - { layer_range: [  0,  2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
  - sources:
    - { layer_range: [  2,  6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }


This does not:
slices:
  - sources:
    - { layer_range: [  0,  2 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [  0,  2 ], model: sthenno-com/miscii-14b-0218 }
  - sources:
    - { layer_range: [  2,  6 ], model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 }
    - { layer_range: [  2,  6 ], model: sthenno-com/miscii-14b-0218 }


@Crystalcareai, do you know of any work on this? Will @arcee-ai need a detailed report? These della_linear recipes used to work. Overall, thank you for all the cool work; I hope to get this fixed!