
Congrats!

#1
by aaronday3 - opened

This fine-tune is a work of art. It's super smart and super obedient to the system message, way better than 2.9.1.

I think open source is getting closer and closer to closed source thanks to your great work! :)

I'd say we already beat them in a lot of use cases.

1 week with 8xH100s is crazy too, that's a lot of compute for a finetune. This seems like the real deal, certainly!

How much does that cost? I wouldn't mind a WizardLM2-8x22b finetune like this

Renting 8xH100s for a week is roughly around $5,000 USD, give or take, at average pricing.

Possibly less, I guess it depends, but the quotes I'm looking at are around there.
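For a rough sanity check on that number (the per-GPU-hour rate below is an assumed ballpark, not an actual quote):

```python
# Back-of-envelope rental cost for 8x H100 over one week.
# The hourly rate is an assumed ballpark figure, not a real quote.
gpus = 8
hours = 7 * 24                # one week = 168 hours
rate_per_gpu_hour = 3.75      # assumed USD per H100 per hour

total = gpus * hours * rate_per_gpu_hour
print(f"{gpus} GPUs x {hours} h x ${rate_per_gpu_hour}/GPU-h = ${total:,.0f}")
# -> 8 GPUs x 168 h x $3.75/GPU-h = $5,040
```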

Cognitive Computations org

1 week with 8xH100s is crazy too, that's a lot of compute for a finetune. This seems like the real deal, certainly!

We have some new full fine-tuning (FFT) techniques we'll share soon, but in total this model took 3 days and 22 hours to train.
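(As a quick sketch of what that works out to, using the same assumed ballpark rate as the rental estimate above, not a real quote:)

```python
# GPU-hours for a 3 day 22 hour run on 8x H100.
gpus = 8
train_hours = 3 * 24 + 22           # 94 hours
gpu_hours = gpus * train_hours      # 752 GPU-hours

rate_per_gpu_hour = 3.75            # assumed USD per H100 per hour (not a real quote)
print(f"{gpu_hours} GPU-hours ~ ${gpu_hours * rate_per_gpu_hour:,.0f} at the assumed rate")
# -> 752 GPU-hours ~ $2,820 at the assumed rate
```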

Cognitive Computations org

Oops, I think I forgot to update the model card there.

I had no idea it was so expensive. I thought maybe a few hundred bucks...

Thanks for releasing these finetunes, ehartford.

The H100 is probably among the top 3 most powerful GPUs in the world right now. The H200 is king IIRC, and I know AMD has something out to compete, which is why I think it's probably within the top 3 or 4.

Cognitive Computations org

I had no idea it was so expensive. I thought maybe a few hundred bucks...

Thanks for releasing these finetunes, ehartford.

We have a compute sponsor for most of these models, so while yes, it's very expensive, it's not coming out of our pocket.

This fine-tune is a work of art. It's super smart and super obedient to the system message, way better than 2.9.1.

I think open source is getting closer and closer to closed source thanks to your great work! :)

I'd say we already beat them in a lot of use cases.

How smart actually?

How smart actually?

I am wondering if it would top the newest Qwen model that just came out.

Cognitive Computations org

Qwen2 is not yet released.

I really enjoy Dolphin 2.9.2 Mixtral 8x22b. For now it's my favorite Dolphin that's ever been released.

But there will absolutely be a Dolphin trained on Qwen2.

Ah, I thought I saw on Reddit that it had been released, but I must have read it wrong. I tried Quill, which is supposedly an early version, and it was decent.

Qwen2 is not yet released.

I really enjoy Dolphin 2.9.2 Mixtral 8x22b. For now it's my favorite Dolphin that's ever been released.

But there will absolutely be a Dolphin trained on Qwen2.

Will it follow the system prompt as well as this finetune does?

Also, Qwen is quite bad about often inserting Chinese into answers; I hope Qwen2 will fix that.

I hope this model gets hosted somewhere so I can try it.

Also, Qwen is quite bad about often inserting Chinese into answers

Yes, I have also witnessed this issue. It seems to plague the Qwen models; I have tried other Chinese-made models and they do not do this.
