Any plan for Q8
To get better performance, I think it would be good to have a higher-precision version of the model. Let me know your thoughts on this.
I believe increasing the model's precision could indeed lead to better performance, especially in tasks requiring high accuracy. However, it's important to consider how this might affect other aspects, such as processing speed and resource usage. Striking a balance between precision and efficiency will be key. I'm interested to hear what strategies or techniques others have used to achieve this.
Hey, it's true that higher precision leads to better performance, but it also increases RAM usage, since the model needs more memory for inference. There are many kinds of users out there, some with lower-spec machines and some with higher. In my case, I have 32 GB of RAM on a Mac M2 Pro, so it's quite easy for me to run Q8, but users with less RAM might only be able to run Q4. That's why people generally publish multiple quantizations of a model, so users with different configurations can pick the one that fits their hardware.
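To make the RAM trade-off concrete, here is a rough back-of-the-envelope sketch. The `quant_memory_gb` helper and its numbers are my own assumptions (weights at N bits each, plus a flat ~20% overhead for KV cache and activations), not figures from this model's card:

```python
def quant_memory_gb(n_params_billion: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Rough memory estimate for a quantized model.

    Assumption: memory ~= parameter count * bits per weight,
    inflated by ~20% for KV cache and activations.
    """
    bytes_for_weights = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_for_weights / 1e9 * overhead

# Hypothetical 7B-parameter model:
# Q8 at 8 bits/weight vs a Q4-style quant at ~4.5 bits/weight
print(quant_memory_gb(7, 8))    # roughly 8.4 GB
print(quant_memory_gb(7, 4.5))  # roughly 4.7 GB
```

Under these assumptions a Q8 7B model fits comfortably in 32 GB, while the Q4 variant stays within reach of 8 GB machines, which is why shipping both makes sense.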
BTW, I came across this model while searching for something else, and it seemed like quite an interesting idea. I have yet to explore it, but thanks for the good work.
Thanks for sharing your thoughts! I completely agree that different quantization options are important to cater to a variety of hardware configurations. Offering both Q8 and Q4 versions makes a lot of sense.
This model was actually developed with a specific problem statement in mind, which guided its design and optimization. We're planning to push the Q8 version soon to accommodate higher-performance needs, while still considering users with different configurations. Thanks for your feedback!