DeepSeek Laser possible?

by Venkman42 - opened Feb 5

Feb 5

Hi cognitive computation team,

First of all, thanks for all the new Laser models, greatly appreciated 😊 They make a big difference, especially on weaker hardware.

Could you please also laser the new 7B DeepSeek Instruct v1.5?
I haven't found any lasered coding models yet and I feel like LASER could make the difference between mediocre performance and good performance with inference on something like a laptop.

Keep up your great work 💪
Cheers

fernandofernandes

Feb 5

Hi! Nice to hear your feedback.
We will work on that.

Btw, could you give us detailed feedback on the performance on your laptop?
Was there a performance gain? Could you describe it?

Venkman42

Feb 5

•

edited Feb 5

I didn't compare it on my laptop, but I compared Dolphin Mistral on my development server (CPU-only inference), and the speed of the laser model was closer to the 3B models I tested (rocket, phi-2) than to the 7B models I tested (OpenChat, Zephyr, Dolphin).

I don't have any numbers, unfortunately, but I ran the same text through rocket (3B) and Dolphin Mistral laser, and the latter was faster. It wasn't a huge context size, though.

I'm also running inference with llama.cpp/GGUF, so I'm not sure how that affects the comparison.

Edit: maybe I can run a little test later if I have enough time and try to get some numbers

fernandofernandes

Feb 5

If you could test it, I'd be grateful 🙏🏼

Venkman42

Feb 5

•

edited Feb 5

@fernandofernandes

Lasered: (attachment not preserved)

Unlasered: (attachment not preserved)

Here is an example from running via llama-cpp-python on my old laptop (a convertible with a 7th-gen i5 and 8GB of DDR4 SO-DIMM memory, bought in 2017, I think).
One run uses the original Openchat-3.5-01-06 and the other uses your lasered version, both in Q3_K_M GGUF at default settings (temp 0.8).
The difference isn't as big as I thought, but it's definitely a noticeable difference on crappy hardware haha
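For anyone who wants to put numbers on a comparison like this, here is a minimal sketch of how it could be timed. It is not the script used above. The helper names (`tokens_per_second`, `speedup_percent`, `make_llama_generator`) and the model file names in the comments are made up for illustration, and the llama-cpp-python part assumes the package is installed and that the completion result carries a `usage` field with `completion_tokens`.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[str], int], prompt: str, runs: int = 3) -> float:
    """Average generation speed over several runs.

    `generate` is any function that takes a prompt and returns the
    number of tokens it produced.
    """
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate(prompt)
        total_seconds += time.perf_counter() - start
    return total_tokens / total_seconds


def speedup_percent(baseline_tps: float, candidate_tps: float) -> float:
    """Relative speed gain of the candidate over the baseline, in percent."""
    return (candidate_tps / baseline_tps - 1.0) * 100.0


def make_llama_generator(model_path: str, max_tokens: int = 128) -> Callable[[str], int]:
    """Wrap a GGUF model loaded through llama-cpp-python (assumed installed)."""
    from llama_cpp import Llama  # imported lazily so the helpers above work without it

    llm = Llama(model_path=model_path, verbose=False)

    def generate(prompt: str) -> int:
        out = llm(prompt, max_tokens=max_tokens, temperature=0.8)
        return out["usage"]["completion_tokens"]

    return generate


# Hypothetical usage (file names are placeholders, not real paths):
# base = tokens_per_second(make_llama_generator("openchat-3.5-0106.Q3_K_M.gguf"), "Explain GGUF.")
# laser = tokens_per_second(make_llama_generator("openchat-3.5-0106-laser.Q3_K_M.gguf"), "Explain GGUF.")
# print(f"speedup: {speedup_percent(base, laser):.1f}%")
```

Using the same prompt, the same `max_tokens`, and a few repeated runs per model helps keep the two measurements comparable, since sampling at temp 0.8 can produce outputs of different lengths.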

fernandofernandes

Feb 5

Great! Roughly a 10% speedup.

DavidGF

Feb 5

@Venkman42
thank you very much for your tests and insights! So there is a small performance increase on your hardware :)

Venkman42

Feb 6

@DavidGF You're welcome 😁
Yeah, it's definitely noticeable.
If you use streaming, it can make the difference between output that keeps up with reading speed and output that is too slow to read along with.

I haven't tested, though, whether the gain differs between quantization techniques or temperature settings.

And I think the increase in performance is a little higher with longer contexts than with one-liners.
