Please create a Google Gemma-7B (8.5B) based version

#4
by rombodawg - opened

Google's Gemma 7B model is very good, and the instruction-tuned version is quite good at coding. I'm sure that if you actually trained it for coding it would do amazingly well. Gemma is also a much more general-purpose model: if you created a Gemma-based OpenCodeInterpreter model, it would not only be excellent at coding but would also have many use cases outside of coding, unlike the coding-specific models you used, which lack quality in areas that aren't coding.

The larger 7B model would be preferred, but the 2B would also be nice if you are able to consider it.

@rombodawg
Is it possible to create merges between these models with different architectures?

@Ji-Ha Across different architectures? No, no one has made a working method for merging across architectures. FuseLLM made a fake version, but it's not actually merging.

I agree. The next OpenCodeInterpreter model they release should not focus only on coding; instead they should focus on creating a Google Gemma 8.5B based model, which is more general purpose.

Multimodal Art Projection org

Based on your suggestion, we're excited to inform you that we have initiated the process of leveraging the Gemma 7b model to enhance our coding capabilities. This move aligns with our goal to create a more versatile and powerful model that excels not only in coding tasks but also in a broad range of other applications. We anticipate making this enhanced model open source by next week, and we're eager for you to experience its capabilities.

Furthermore, we have made our multi-turn data open source, hoping that it will inspire you and others to experiment with different base models using our dataset. We are looking forward to seeing the innovative models that emerge from these experiments. We also invite you to contribute any refined models back to our organization, fostering a collaborative environment for improvement and innovation.
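
If you want to experiment with the multi-turn data on your own base model, a rough sketch of what that could look like follows. This is not an official training script: the dataset repo ID "m-a-p/Code-Feedback", its "messages" column, and the hyperparameters below are assumptions, so adjust them to the actual dataset card and your hardware.

```python
# Rough sketch: supervised fine-tuning of an arbitrary base model on the
# released multi-turn data using TRL's SFTTrainer. The dataset ID
# "m-a-p/Code-Feedback" and its "messages" column are assumptions; check the
# dataset card for the real names before running.
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

base_model = "google/gemma-7b"       # any base model you want to experiment with
dataset_id = "m-a-p/Code-Feedback"   # assumed dataset repo ID

model = AutoModelForCausalLM.from_pretrained(base_model)
dataset = load_dataset(dataset_id, split="train")

def to_text(example):
    # Naively flatten the multi-turn conversation into one training string;
    # swap in your base model's chat template if it ships with one.
    turns = [f"{m['role']}: {m['content']}" for m in example["messages"]]
    return {"text": "\n".join(turns)}

dataset = dataset.map(to_text, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="opencodeinterpreter-custom",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=2,
        bf16=True,
        dataset_text_field="text",
    ),
)
trainer.train()
```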

@aaabiao This is amazing; you must be a really great company to take solid advice from the little guy. Most companies would ignore others and only follow their own corporate agenda. HUGE props to you.


Multimodal Art Projection org

Hi @rombodawg @not-lain , we're excited to inform you that we have recently open-sourced the OpenCodeInterpreter models based on both Gemma 7B and the StarCoder2 series.
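
For anyone who wants to give the Gemma-based checkpoint a quick try, here is a minimal, unofficial inference sketch. The repo ID "m-a-p/OpenCodeInterpreter-GM-7B" is an assumption, so check the organization page for the exact model name.

```python
# Minimal inference sketch for the Gemma-based release. The repo ID
# "m-a-p/OpenCodeInterpreter-GM-7B" is an assumption; check the m-a-p
# organization page on Hugging Face for the exact model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "m-a-p/OpenCodeInterpreter-GM-7B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."}
]
# Assumes the checkpoint ships a chat template with its tokenizer.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```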

aaabiao changed discussion status to closed

@aaabiao Thank you so much!!! I can't wait to use them! Quick question, though: do you plan to do StarCoder v2 15B? The 7B did so well that I feel the 15B would be an amazing code interpreter model.

Multimodal Art Projection org

We actually fine-tuned a 15B version, but the results look pretty weird. It might be due to a transformers version issue. We are still debugging this; it should be resolved very soon. We will definitely release a 15B version. :)

Multimodal Art Projection org

@rombodawg We've indeed open-sourced the more powerful OpenCodeInterpreter-SC2-15B. Can't wait to see what you'll create with it!

@aaabiao Thank you for your great work! This model is by far the most powerful coding model I have used!

You mentioned that OpenCodeInterpreter-SC2-15B is more powerful; however, from the benchmark section in the README, it seems that this 6.7B model outperforms the SC2 15B. I'm not very familiar with those benchmark numbers, so correct me if I'm wrong. Can you provide any insights into why this model with fewer parameters has higher scores?

Multimodal Art Projection org

Thank you for your feedback! The primary reason the 6.7B model outperforms the SC2-15B in benchmarks is due to the underlying base model's strength. In these scenarios, the base model's capabilities significantly outweigh the impact of our SFT efforts. This demonstrates that the DeepSeek 6.7B base model is inherently stronger, leading to its higher scores despite having fewer parameters.

@aaabiao If that's the case, you should be aware that a really great base model is waiting dormant, just needing someone to train it. I may have tagged you before, but just as a reminder, the Replete-AI/Mistral-Evolved-11b-v0.1 model can code extremely well despite being only a generalized model, similar to GPT-3.5. I'd put its coding performance at around CodeLlama-13b-Instruct or higher, with the addition of non-coding capabilities. With a finetune it's sure to become a really good coding model. I know I said this about Gemma, but the transformers code was not ready for Gemma finetuning, and because of that there was extreme loss with that model; my model, however, is based on Mistral, so there should be no issues finetuning it.

https://huggingface.co/Replete-AI/Mistral-Evolved-11b-v0.1
