DeepSeek Laser possible?
Hi Cognitive Computations team,
First of all, thanks for all the new Laser models, greatly appreciated! They make a big difference, especially on weaker hardware.
Could you please also laser the new 7B DeepSeek Coder Instruct v1.5?
I haven't found any lasered coding models yet, and I feel like LASER could make the difference between mediocre and good performance for inference on something like a laptop.
Keep up your great work!
Cheers
Hi! Nice to hear your feedback.
We will work on that.
Btw, could you give us detailed feedback on the performance on your laptop?
Was there a performance gain? Could you describe it?
I didn't compare it on my laptop, but I compared Dolphin Mistral on my development server (CPU-only inference), and the speed of the laser model was closer to the 3B models I tested (Rocket, Phi-2) than to the 7B models (OpenChat, Zephyr, Dolphin).
I don't have any numbers unfortunately, but I ran the same text through Rocket (3B) and Dolphin Mistral Laser, and the latter was faster. It wasn't a huge context size, though.
And I'm running inference with llama.cpp/GGUF, so I'm not sure how that affects testing.
Edit: maybe I can run a little test later if I have enough time and try to get some numbers.
If you could test it, I'd be grateful!
Here is an example from running via llama-cpp-python on my old laptop (a convertible with a 7th-gen i5 + 8 GB DDR4 SODIMM memory, bought in 2017 I think).
One run with the original OpenChat-3.5-0106 and one with your lasered version (both in Q3_K_M GGUF at default settings, temp 0.8).
The difference isn't as big as I thought, but it's definitely a noticeable difference on crappy hardware haha.
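If anyone wants to reproduce a rough comparison like this, a minimal sketch along these lines should work with llama-cpp-python; the GGUF file names and prompt below are just placeholders, not the exact files or text I used:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder file names; point these at your own Q3_K_M GGUF downloads.
MODELS = {
    "openchat-3.5-0106": "openchat-3.5-0106.Q3_K_M.gguf",
    "openchat-3.5-0106-laser": "openchat-3.5-0106-laser.Q3_K_M.gguf",
}
PROMPT = "Explain the difference between a list and a tuple in Python."

for name, path in MODELS.items():
    llm = Llama(model_path=path, n_ctx=2048, verbose=False)
    start = time.perf_counter()
    # default sampling temperature 0.8, as in the runs above
    out = llm(PROMPT, max_tokens=200, temperature=0.8)
    elapsed = time.perf_counter() - start
    n_tokens = out["usage"]["completion_tokens"]
    print(f"{name}: {n_tokens} tokens in {elapsed:.1f}s "
          f"({n_tokens / elapsed:.2f} tok/s)")
```

Running each model a couple of times and averaging would smooth out first-run effects like disk caching.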
Great! More or less a 10% speed-up.
@Venkman42
thank you very much for your tests and insights! So there is a small performance increase on your hardware :)
@DavidGF
You're welcome!
Yeah, it's definitely noticeable.
If you use streaming, it can make the difference between keeping up with reading speed and being too slow to read along (see the sketch below).
I haven't tested whether the gain differs across quantization techniques or temperature settings, though.
And I think the speed-up is a little bigger with longer contexts compared to one-liners.
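To illustrate the reading-speed point, here's a minimal streaming sketch with llama-cpp-python; the model path and prompt are placeholders:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path; any GGUF model works here.
llm = Llama(model_path="dolphin-mistral-laser.Q3_K_M.gguf",
            n_ctx=2048, verbose=False)

# stream=True yields completion chunks token by token, so you can see
# directly whether generation keeps up with your reading speed.
for chunk in llm("Summarize what model quantization does in two sentences.",
                 max_tokens=128, temperature=0.8, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```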