[Model] About reasonable use

#410
by Mindires - opened

First, thanks very much to Hugging Face for providing so many cutting-edge LLMs for free!

But recently I have noticed that Command R+ is increasingly overloaded during peak hours. Perhaps adding a quantized version for most users' daily use would help balance the load?

I mean, if the quality is acceptable and the generation speed is greatly improved, most of the time people will prefer faster over better, especially if voice chat is added in the future.

I have another solution to this problem, but first we need to understand why Command R+ is overloaded.

  1. Command R+ is the default mode, leading to its widespread use by both new bots and older bots whose models have been deleted.
  2. Command R+ is in high demand, likely due to its superior performance over ChatGPT.

Here's a potential solution:

  1. Introduce a new 4-bit quantized R+ model, as suggested by @Mindires, and designate it as Command R+ Lite to become the new default.
  2. Alternatively, set Llama 3 or another lightweight model as the default (my preference would be any lite model).
  3. Temporarily remove R+ for 5 minutes to shift all models to the new default, then reintroduce R+.
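To give a rough sense of why a 4-bit quantized R+ would ease the load: the back-of-the-envelope sketch below estimates the weight memory of a model around Command R+'s size (~104B parameters) at different precisions. The numbers are approximate and cover weights only; real serving also needs memory for activations and the KV cache, which quantization does not shrink.

```python
# Rough memory-footprint estimate for a ~104B-parameter model (roughly
# Command R+'s size) at different weight precisions. Approximate: weights
# only, ignoring activations and the KV cache.

def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory needed just for the weights, in GiB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(104, bits):.0f} GiB")
# 16-bit: ~194 GiB
#  8-bit: ~97 GiB
#  4-bit: ~48 GiB
```

At 4 bits the weights fit in roughly a quarter of the memory, so the same hardware can serve far more concurrent requests, which is the whole point of a "Lite" default.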

I think we will see even more overload now: Llama 3 70B has replaced Mistral 7B for search-result summaries and chat title generation.

`TASK_MODEL='meta-llama/Meta-Llama-3-70B-Instruct'`
