You could make the best open source coding model.

by rombodawg - opened Oct 23, 2023

Oct 23, 2023

The recently released "codebooga", a merge of wizardcoder34b and phind-codellama34b, is the best coding model right now, possibly matching or beating this model in performance. I highly recommend training codebooga on my LosslessMegacodeV3 (not V2), as I recently updated it and fixed all errors. V3 is much better than V2 and adds both code and non-code instructions, all of them high quality.

CodeBooga AI model:
https://huggingface.co/oobabooga/CodeBooga-34B-v0.1

Megacode V3:
https://huggingface.co/datasets/rombodawg/LosslessMegaCodeTrainingV3_1.6m_Evol

I would do it myself, but I'm living the broke life, sooo. 😢

Nondzu

Oct 24, 2023

edited Oct 24, 2023

@rombodawg codebooga and phind-codellama34b have similar results (~70.1 points) on HumanEval+:
https://github.com/evalplus/evalplus/

rombodawg

Oct 24, 2023

@Nondzu yes, but so do wizardcoder and phind-codellama. The point isn't that they have similar results. The point is that there is an increase in quality. These tests only show a fraction of the real change in the models' capabilities, considering that all the increases in coding performance between models have been incremental. This isn't a bad thing: even the difference between GPT-3.5 and GPT-4 is less than 12 points, but the actual difference in coding performance between those two models is huge.

My point is that I firmly believe it's worth training a new model from codebooga on my dataset. The real-world results should show a decent improvement, possibly reaching GPT-3.5 levels of coding performance.

Hisma

Oct 25, 2023

edited Oct 25, 2023

Watching this thread. So far I am using @latimar's Phind-Codellama-34B-v2 5bpw evol-ins version of this model, here: https://huggingface.co/latimar/Phind-Codellama-34B-v2-exl2
It's the best-performing coding model I've used so far, and in my experience it does outperform codebooga.
I'm willing to give codebooga another shot, though, if someone releases a performant fine-tuned exl2 version at 5bpw (or close to it).

rombodawg

Oct 25, 2023

edited Oct 25, 2023

@oobabooga's response in another thread made the results clearer.

See

https://github.com/evalplus/evalplus/issues/36#issuecomment-1780001485

and

https://evalplus.github.io/leaderboard.html

Here is the thread I mentioned:
https://huggingface.co/oobabooga/CodeBooga-34B-v0.1/discussions/2#65397d3f2a25dcfb560e5bcd

latimar

Owner Oct 25, 2023

@rombodawg currently I'm busy re-doing the Phind quants once again (@Hisma I think I'll have an even better-performing 5.0 Phind quant soon), and I'm not really convinced that Codebooga is better than Phind. I haven't had time to quantize it and run tests yet, but I'm planning to do it... hopefully this weekend...
