This is a leaderboard ranking my own models :) It also includes some (maybe) useful information. The main purpose is roleplay.
Leaderboard
Rank | Name | Parameter | Context Length | Tag | Note |
---|---|---|---|---|---|
1 | Narumashi-RT | 11B | 4K | Lewd | Good for roleplay, even though it is a LLaMA 2 model. Thanks, Sao10k :) Can handle some (limited) TSF content. |
2 | NaruMoE | 3x7B | 8K - 32K | Neutral | Average model; can only handle a limited amount of the extra content I want. |
3 | NarumashiRTS | 7B | 8K | Neutral | Based on Kunoichi-7B, so it is good enough. Knows the extra content. Not lewd, and will sometimes skip lewd content. |
4 | HyouKan Series | 3x7B | 8K - 32K | Neutral | ATTENTION: DON'T USE THE GGUF VERSION, SINCE IT HAS SOME BUGS (VARIES BY VERSION). All-round roleplay model. Understands Character Cards well and has good logic. The first version has an 8K context length. |
5 | SunnyRain | 2x10.7B | 4K | Lewd | To be honest, it performs about the same as HyouKan in roleplay, but has some strange behaviors. |
6 | RainyMotip | 2x7B | 32K | Neutral | Good enough model, OK in roleplay. |
7 | Nutopia | 7B | 32K | Not for Roleplay | I don't think this works for roleplay, but it is good at problem solving. |
8 | TripedalChiken | 2x7B | 32K | Not for Roleplay | Good at problem solving, but I don't think it works for roleplay. |
Note:
- Lewd: performs well on NSFW content. Some lewd words will appear in normal content if your Character Card contains NSFW information.
- Neutral: performs well on SFW content and can handle NSFW content (maybe to a limited extent). Lewd words appear less often in chat/roleplay than with Lewd models.
- Not for Roleplay: models with this tag don't seem to understand Character Cards well, but their logic is very good.
- RT: Rough Translation dataset, which could lead to worse performance than the original model.
- CN: pretrained on a Chinese dataset, so it may not understand extra content in English. (I can't find any good English version.)
Some experience:
- The context length has a big effect on memory use. Let's say I have a 16GB VRAM card; I can run a model in 2 ways using Text-Generation-WebUI:
  - Inference: download the original model and apply the args `--load-in-4bit --use_double_quant`. This way I can run all of the models on the leaderboard. The more parameters a model has, the slower it generates tokens. (Ex: a 7B model can run at 15 token/s, while a 3x7B model can only run at ~4-5 token/s.)
  - GGUF Quantization (fastest, cheapest way to run): after downloading the GGUF version of a model, sometimes you can't run it even though you can run other models with more parameters. That is because of:
    - The context length: a 16GB VRAM GPU can run at most a 2x10.7B (~19.2B) model with 4K context length (5 token/s).
    - The model itself being bugged/broken.
- Bigger models contain more of the information you need for your Character Card.
- Best GGUF versions to run (balance of speed and performance): Q4_K_M, Q5_K_M (slower than Q4).
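
To make the context-length point above a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions, not how Text-Generation-WebUI or any GGUF runtime actually accounts for memory: it assumes VRAM use is roughly "quantized weights + fp16 KV cache" and ignores activations and runtime overhead. The function names, the bits-per-weight value, and the layer/head numbers in the example are all hypothetical placeholders; check your model's config for the real values.

```python
GIB = 2**30  # bytes in one GiB


def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size, in GiB.

    The factor 2 is for keys and values; bytes_per_elem=2 assumes fp16.
    The cache grows linearly with the context length.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_elem) / GIB


def estimate_vram_gib(n_params: float, bits_per_weight: float,
                      n_layers: int, n_kv_heads: int, head_dim: int,
                      context_len: int, kv_bytes_per_elem: int = 2) -> float:
    """Weights + KV cache only; real usage will be somewhat higher."""
    return (weights_gib(n_params, bits_per_weight)
            + kv_cache_gib(n_layers, n_kv_heads, head_dim,
                           context_len, kv_bytes_per_elem))


if __name__ == "__main__":
    # Hypothetical 7B-class model: 32 layers, 8 KV heads, head dim 128,
    # and an assumed ~5 bits per weight for a 4-bit style quantization.
    for ctx in (4096, 8192, 32768):
        total = estimate_vram_gib(7e9, 5.0, 32, 8, 128, ctx)
        print(f"context {ctx:>6}: ~{total:.1f} GiB (weights + KV cache)")
```

The weights term stays fixed while the KV cache term scales with the context length, which is one plausible reason a model can fit at 4K context but not at 32K on the same card.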
Useful link: