Does this model still hold its own?

#3
by SaisExperiments - opened

I'm curious how this model compares to the likes of Nemo-based models. I can't go up to the size of command-r, but I could probably squeeze this model in.
And the dumb question, with the improvement to ppl, would it theoretically be able to extend further with rope scaling without noticeable degradation?

My initial take on NEMO is that it's great. (I tested the version for creating Brainstorm versions of it, at my repo.)
And the fine-tuned versions out there are really good (and getting high marks too!).

And yeah - Command-R is breathtaking.

PSYCET is powerful too, but it is designed specifically for its task rather than general usage like Nemo or Command-R; I would suggest trying both the Imatrix and non-Imatrix versions.
RE: ROPE; 16k to 24k seems to be its limit.
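For anyone wondering what that extension looks like in practice, here's a minimal sketch of the linear RoPE scaling arithmetic. The 8192-token native context is an assumption for illustration (check the model card for the real value); the flag/parameter names in the comments may vary by backend.

```python
# Sketch: linear RoPE scaling stretches position indices so a longer
# context "looks like" the native one to the rotary embedding.
# native_ctx of 8192 is a hypothetical value for illustration.
native_ctx = 8192
target_ctx = 24576  # ~24k, around the practical ceiling mentioned above

# Linear scaling factor: positions are divided by this before the
# rotary embedding is applied.
factor = target_ctx / native_ctx

# In llama.cpp this roughly maps to --rope-scale / --rope-scaling linear;
# in transformers to rope_scaling={"type": "linear", "factor": factor}.
# (Exact option names depend on the version you run.)
print(factor)  # 3.0
```

The further the factor stretches past the native context, the more quality degrades, which is why compensating with sampler settings and more detailed prompts helps.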

To compensate for "rope" scaling, up the temp, and add more instructions / more details in your prompts.
I have not done enough NEMO testing yet to give you a better comparison against PSYCET.
However; I will say NEMO is stronger than Llama 3.1 instruct on a creative level.

You may also want to check out the "Grand Horror" versions too - they're off the scale in terms of creative output, and not just for "horror".

I'm really hoping the mysterious sus-column-r is a new command model that's smaller in size. I wouldn't call the model super smart, but its reasoning is really, really strong - stronger than any model I've been able to run locally, at least.
Although I'm not too convinced it is a smaller model with how slow the responses are ×^×

I've found nemo to be a refreshing change from llama-3. I've mainly been using nemomix-4.0 and now nemoremix-4.0, and they're the first models that have come close to the level of the best solar models.

I never really got to try llama-2-13B or merges/upscales of it, due to the bigger context size elsewhere. But I finally can because of cache quantization and 5GB of extra VRAM :3
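To see why cache quantization frees up that much room, here's a back-of-the-envelope KV-cache size estimate. The layer/head shape is a hypothetical Llama-2-13B-like configuration (40 layers, 40 heads, head_dim 128 - check the model's config.json for real values), not something stated in this thread.

```python
# Rough KV-cache memory estimate for a hypothetical 13B-class model.
def kv_cache_bytes(ctx_len, n_layers=40, n_heads=40, head_dim=128,
                   bytes_per_elem=2):
    # Two tensors (K and V) per layer, each ctx_len x n_heads x head_dim.
    return 2 * n_layers * ctx_len * n_heads * head_dim * bytes_per_elem

fp16_cache = kv_cache_bytes(8192)                     # fp16: 2 bytes/elem
q8_cache   = kv_cache_bytes(8192, bytes_per_elem=1)   # ~8-bit quantized

# Quantizing the cache from fp16 to 8-bit roughly halves its footprint,
# which is VRAM you can spend on a longer context instead.
print(fp16_cache / 2**30, "GiB ->", q8_cache / 2**30, "GiB")
```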

Owner

Command-R (35B, and larger or smaller) is a unique but "heavy" model to run. It is super dense.
If you compare it to another 35B, T/S (tokens per second) for Command-R is about 33% slower.

RE: Llama2-13B.
The love put into these models is impressive, regardless of context size limits.
Tiefighter, Mythomax, TiefighterLR, Psyfighter and others are all top notch.
And likewise the 20Bs of these -> PSYCET, Emeryst, DarkForest ... top notch again.
