Please create a Google Gemma-7B (8.5B) based version

#4
by rombodawg - opened

Google's Gemma 7B model is very good, and the instruction-tuned version is quite good at coding. I'm sure that if you actually trained it for coding it would do amazingly well. Gemma is also a much more general-purpose model: if you created a Gemma-based OpenCodeInterpreter model, it would not only be excellent at coding but would also have many use cases outside of coding, unlike the coding-specific models you used, which lack quality in areas that aren't coding.

The larger 7B model would be preferred, but the 2B would also be nice if you are able to consider it.

@rombodawg
Is it possible to create merges between these models with different architectures?

@Ji-Ha Across different architectures? No, no one has made a working method for merging across architectures. FuseLLM made a fake version, but it's not actually merging.

I agree. The next OpenCodeInterpreter model they release should not focus only on coding; instead they should focus on creating a Google Gemma 8.5B based model, which is more general purpose.

Multimodal Art Projection org

Based on your suggestion, we're excited to inform you that we have initiated the process of leveraging the Gemma 7b model to enhance our coding capabilities. This move aligns with our goal to create a more versatile and powerful model that excels not only in coding tasks but also in a broad range of other applications. We anticipate making this enhanced model open source by next week, and we're eager for you to experience its capabilities.

Furthermore, we have made our multi-turn data open source, hoping that it will inspire you and others to experiment with different base models using our dataset. We are looking forward to seeing the innovative models that emerge from these experiments. We also invite you to contribute any refined models back to our organization, fostering a collaborative environment for improvement and innovation.
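
If you want to experiment with the multi-turn data on your own base model, a rough sketch of what that could look like follows. This is not an official training script: the dataset repo ID "m-a-p/Code-Feedback", its "messages" column, and the hyperparameters below are assumptions, so adjust them to the actual dataset card and your hardware.

```python
# Rough sketch: supervised fine-tuning of an arbitrary base model on the
# released multi-turn data using TRL's SFTTrainer. The dataset ID
# "m-a-p/Code-Feedback" and its "messages" column are assumptions; check the
# dataset card for the real names before running.
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

base_model = "google/gemma-7b"       # any base model you want to experiment with
dataset_id = "m-a-p/Code-Feedback"   # assumed dataset repo ID

model = AutoModelForCausalLM.from_pretrained(base_model)
dataset = load_dataset(dataset_id, split="train")

def to_text(example):
    # Naively flatten the multi-turn conversation into one training string;
    # swap in your base model's chat template if it ships with one.
    turns = [f"{m['role']}: {m['content']}" for m in example["messages"]]
    return {"text": "\n".join(turns)}

dataset = dataset.map(to_text, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="opencodeinterpreter-custom",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=2,
        bf16=True,
        dataset_text_field="text",
    ),
)
trainer.train()
```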

@aaabiao This is amazing; you must be a really great company to take solid advice from the little guy. Most companies would ignore others and only follow their own corporate agenda. HUGE props to you.


Multimodal Art Projection org

Hi @rombodawg @not-lain , we're excited to inform you that we have recently open-sourced the OpenCodeInterpreter models based on both Gemma 7B and the StarCoder2 series.
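
For anyone who wants to give the Gemma-based checkpoint a quick try, here is a minimal, unofficial inference sketch. The repo ID "m-a-p/OpenCodeInterpreter-GM-7B" is an assumption, so check the organization page for the exact model name.

```python
# Minimal inference sketch for the Gemma-based release. The repo ID
# "m-a-p/OpenCodeInterpreter-GM-7B" is an assumption; check the m-a-p
# organization page on Hugging Face for the exact model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "m-a-p/OpenCodeInterpreter-GM-7B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."}
]
# Assumes the checkpoint ships a chat template with its tokenizer.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```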

aaabiao changed discussion status to closed

@aaabiao Thank you so much!!! I can't wait to use them! Quick question, though: do you plan to do StarCoder v2 15B? The 7B did so well that I feel the 15B would be an amazing code interpreter model.

Multimodal Art Projection org

We actually fine-tuned a 15B version, but the results look pretty weird. It might be due to a transformers version issue. We are still debugging this; it should be resolved very soon. We will definitely release a 15B version. :)

Multimodal Art Projection org

@rombodawg We've indeed open-sourced the more powerful OpenCodeInterpreter-SC2-15B. Can't wait to see what you'll create with it!

@aaabiao Thank you for your great work! This model is by far the most powerful coding model I have used!

You mentioned that OpenCodeInterpreter-SC2-15B is more powerful; however, from the benchmark section in the README, it seems that this 6.7B model outperforms the SC2 15B. I'm not very familiar with those benchmark numbers, so correct me if I'm wrong. Can you provide any insights into why this model with fewer parameters has higher scores?

Multimodal Art Projection org

Thank you for your feedback! The primary reason the 6.7B model outperforms the SC2-15B in benchmarks is due to the underlying base model's strength. In these scenarios, the base model's capabilities significantly outweigh the impact of our SFT efforts. This demonstrates that the DeepSeek 6.7B base model is inherently stronger, leading to its higher scores despite having fewer parameters.

@aaabiao If that's the case, you should be aware that a really great base model is waiting dormant, just needing someone to train it. I may have tagged you before, but just as a reminder, the Replete-AI/Mistral-Evolved-11b-v0.1 model can code extremely well despite being only a generalized model, similar to GPT-3.5. I'd put its coding performance at around CodeLlama-13b-Instruct or higher, with the addition of non-coding capabilities. With a finetune it's sure to become a really good coding model. I know I said this about Gemma, but the transformers code was not ready for Gemma finetuning, and because of that there was extreme loss with that model; my model, however, is based on Mistral, so there should be no issues finetuning it.

https://huggingface.co/Replete-AI/Mistral-Evolved-11b-v0.1
