You could make the best open source coding model.
The recently released "CodeBooga", a merge of WizardCoder-34B and Phind-CodeLlama-34B, is the best coding model right now, and a fine-tune of it could possibly match or even beat it. I highly recommend training CodeBooga on my LosslessMegaCodeV3 dataset (not V2), as I recently updated it and fixed all the errors. V3 is much better than V2: it adds code and non-code instructions, all of which are high quality.
CodeBooga model:
https://huggingface.co/oobabooga/CodeBooga-34B-v0.1
Megacode v3:
https://huggingface.co/datasets/rombodawg/LosslessMegaCodeTrainingV3_1.6m_Evol
I would do it myself but I'm living the broke life, sooo.
@rombodawg
CodeBooga and Phind-CodeLlama-34B have similar results (~70.1 points) on HumanEval+:
https://github.com/evalplus/evalplus/
@Nondzu Yes, but so do WizardCoder and Phind-CodeLlama. The point isn't that they have similar results; the point is that there is an increase in quality. These tests only show a fraction of the real change in the models' capabilities, considering that all the increases in coding performance between models have been incremental. That isn't a bad thing: even the difference between GPT-3.5 and GPT-4 is less than 12 points, but the actual coding performance difference between those two models is huge.
My point being: I firmly believe it's worth training a new model from CodeBooga using my dataset. The real-world results should show a decent improvement, possibly reaching GPT-3.5 levels of coding performance.
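For context on what those HumanEval+ numbers mean: the score is a pass@1 rate. Here's a minimal sketch of the unbiased pass@k estimator from the Codex paper, which the evalplus scoring is based on; the `pass_at_k` function name is just my illustration:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn (without replacement) from n generations is correct, given
    that c of the n generations passed the unit tests."""
    if n - c < k:
        return 1.0  # fewer than k failing samples, so any draw of k must include a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem (n=1, k=1) this reduces to the plain
# solve rate, so a ~70.1-point HumanEval+ score means roughly 70% of
# the problems passed their extended test suites.
print(pass_at_k(10, 5, 1))  # 0.5
```

So a couple of points of difference on this metric can still hide a noticeable gap in day-to-day coding ability, which is the argument above.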
Watching this thread. So far I am using @latimar's Phind-CodeLlama-34B-v2 5bpw evol-ins version of this model, here: https://huggingface.co/latimar/Phind-Codellama-34B-v2-exl2
It's the best-performing coding model I've used so far, and in my experience it does outperform CodeBooga.
I'm willing to give CodeBooga another shot, though, if someone releases a performant fine-tuned exl2 version in 5bpw (or close to it).
@oobabooga's response in another thread made the results clearer.
See
https://github.com/evalplus/evalplus/issues/36#issuecomment-1780001485
and
https://evalplus.github.io/leaderboard.html
Here is the thread I mentioned:
https://huggingface.co/oobabooga/CodeBooga-34B-v0.1/discussions/2#65397d3f2a25dcfb560e5bcd
@rombodawg Currently I'm busy re-doing the Phind quants once again (@Hisma, I think I'll have an even better-performing 5.0 bpw Phind quant soon), and I'm not really convinced that CodeBooga is actually better than Phind. I haven't had time to quantize it and run tests, but I'm planning to do it... hopefully this weekend...