
Congrats!

#1
by aaronday3 - opened

This fine-tune is a work of art. It's super smart and super obedient to the system message, way better than 2.9.1.

I think open source is getting closer and closer to closed source thanks to your great work! :)

I'd say we already beat them in a lot of use cases.

1 week with 8xH100s is crazy too, that's a lot of compute for a finetune. This seems like the real deal, certainly!

How much does that cost? I wouldn't mind a WizardLM2-8x22b finetune like this

Renting 8xH100s for a week is roughly around $5,000 USD, give or take, at average pricing.

Possibly less, I guess it depends, but the quotes I'm looking at are around there.
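For a rough sanity check on that number (the per-GPU-hour rate below is an assumed ballpark, not an actual quote):

```python
# Back-of-envelope rental cost for 8x H100 over one week.
# The hourly rate is an assumed ballpark figure, not a real quote.
gpus = 8
hours = 7 * 24                # one week = 168 hours
rate_per_gpu_hour = 3.75      # assumed USD per H100 per hour

total = gpus * hours * rate_per_gpu_hour
print(f"{gpus} GPUs x {hours} h x ${rate_per_gpu_hour}/GPU-h = ${total:,.0f}")
# -> 8 GPUs x 168 h x $3.75/GPU-h = $5,040
```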

Cognitive Computations org

1 week with 8xH100s is crazy too, that's a lot of compute for a finetune. This seems like the real deal, certainly!

We have some new full fine-tuning (FFT) techniques we'll share soon, but in total this model took 3 days and 22 hours to train.
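(As a quick sketch of what that works out to, using the same assumed ballpark rate as the rental estimate above, not a real quote:)

```python
# GPU-hours for a 3 day 22 hour run on 8x H100.
gpus = 8
train_hours = 3 * 24 + 22           # 94 hours
gpu_hours = gpus * train_hours      # 752 GPU-hours

rate_per_gpu_hour = 3.75            # assumed USD per H100 per hour (not a real quote)
print(f"{gpu_hours} GPU-hours ~ ${gpu_hours * rate_per_gpu_hour:,.0f} at the assumed rate")
# -> 752 GPU-hours ~ $2,820 at the assumed rate
```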

Cognitive Computations org

Oops, I think I forgot to update the model card there.

I had no idea it was so expensive. I thought maybe a few hundred bucks...

Thanks for releasing these finetunes, ehartford.

The H100 is probably among the top 3 most powerful GPUs in the world right now. The H200 is king IIRC, and I know AMD has something out to compete, which is why I think it's probably within the top 3 or 4.

Cognitive Computations org

I had no idea it was so expensive. I thought maybe a few hundred bucks...

Thanks for releasing these finetunes, ehartford.

We have a compute sponsor for most of these models, so while yes, it's very expensive, it's not coming out of our pocket.

This fine-tune is a work of art. It's super smart and super obedient to the system message, way better than 2.9.1.

I think open source is getting closer and closer to closed source thanks to your great work! :)

I'd say we already beat them in a lot of use cases.

How smart actually?

How smart actually?

I am wondering if it would top the newest Qwen model that just came out.

Cognitive Computations org

Qwen2 is not yet released.

I really enjoy Dolphin 2.9.2 Mixtral 8x22b. For now it's my favorite Dolphin that's ever been released.

But there will absolutely be a Dolphin trained on Qwen2.

Ah, I thought I saw on Reddit that it had been released, but I must have read it wrong. I tried Quill, which is supposedly an early version, and it was decent.

Qwen2 is not yet released.

I really enjoy Dolphin 2.9.2 Mixtral 8x22b. For now it's my favorite Dolphin that's ever been released.

But there will absolutely be a Dolphin trained on Qwen2.

Will it follow the system prompt as well as this finetune does?

Also, Qwen is quite bad about often inserting Chinese into answers; I hope Qwen2 will fix that.

I hope this model gets hosted somewhere so I can try it.

Also, Qwen is quite bad about often inserting Chinese into answers

Yes, I have also witnessed this issue. It seems to plague the Qwen models; I have tried other Chinese-made models and they do not do this.
