Merge with 32b coder?

by RDson - opened 2 days ago

2 days ago

Hi. Out of curiosity, were there any attempts to merge with the Qwen 2.5 32B Coder model? If not, is this something you would be willing to try?

Edit: I tried this myself using the merge technique as they describe it. The results are RDson/CoderO1-DeepSeekR1-Coder-32B-Preview and RDson/CoderO1-DeepSeekR1-Coder-14B-Preview.

sm54

about 17 hours ago

Have you tested or benchmarked your merges? Was thinking of giving the 14b a try.

Wanfq

FuseAI org about 16 hours ago

We found that the evaluation results for math and code in our current version are not correct. To address this, we switched to the evaluation code from Qwen2.5-Math and Qwen2.5-Coder for math and code evaluation. With this approach, we have successfully reproduced the results reported in the DeepSeek-R1 paper. We will update all the results, including those for this coding model, tomorrow. Please stay tuned. Thank you for using FuseO1-Preview.

Wanfq

FuseAI org about 16 hours ago

Here is our merged coding model: https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview

RDson

about 16 hours ago

I have not had the time to run any benchmarks due to limited resources. You would have to try it out yourself.

Thank you for taking the time to make this coder model too. I hope it turned out well!

AaronFeng753

about 4 hours ago

@Wanfq

"With this approach, we have successfully reproduced the results reported in the DeepSeek-R1 paper."

Could you share your approach? Thanks

AaronFeng753

about 4 hours ago

Please share the sampling details as well, such as temperature, top-p, top-k, and repeat penalty.
