question

#1
by Aryanne - opened

Can you try to quantize this model?
https://huggingface.co/niallturbitt/mpt-3b-8k-instruct
I don't have enough memory to do it.

Just uploaded quants, but llama.cpp is having issues loading them. Full output is below; I'm still trying to debug, but this is my first MPT quant... Maybe a custom_code issue? The parameter count doesn't seem to match the repo name either.

Broken quants: https://huggingface.co/afrideva/mpt-3b-8k-instruct-GGUF
Using latest commit of llama.cpp: https://github.com/ggerganov/llama.cpp/tree/57ad015dc3011b046ed5a23186c86ea55f987c54
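
For reference, this is roughly the flow used to produce these quants, driven from Python. It's a sketch only: the converter script name, its flags, and all the paths are assumptions that vary between llama.cpp versions, so check your checkout before running it.

```python
# Sketch of a typical llama.cpp GGUF workflow. Script names, flags, and paths
# are illustrative and differ between llama.cpp versions; hf_dir is assumed to
# hold the downloaded HF checkpoint (including any custom_code files).
import subprocess

hf_dir = "mpt-3b-8k-instruct"             # local HF snapshot (assumed path)
f16_gguf = f"{hf_dir}/{hf_dir}.f16.gguf"  # intermediate full-precision GGUF

# 1) Convert the HF checkpoint to GGUF with whichever converter your
#    llama.cpp tree provides (recent trees ship convert-hf-to-gguf.py).
subprocess.run(
    ["python", "convert-hf-to-gguf.py", hf_dir, "--outfile", f16_gguf],
    check=True,
)

# 2) Quantize the f16 GGUF down to q8_0 with llama.cpp's quantize tool.
subprocess.run(
    ["./quantize", f16_gguf, f"{hf_dir}/{hf_dir}.q8_0.gguf", "q8_0"],
    check=True,
)
```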

Log start
main: build = 1500 (57ad015)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1699510135
llama_model_loader: loaded meta data with 19 key-value pairs and 292 tensors from mpt-3b-8k-instruct/mpt-3b-8k-instruct.q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q8_0     [  2048, 50368,     1,     1 ]
llama_model_loader: - tensor    1:                    output.weight q8_0     [  2048, 50368,     1,     1 ]
llama_model_loader: - tensor    2:           blk.0.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor    3:             blk.0.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor    4:            blk.0.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor    5:              blk.0.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor    6:         blk.0.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor    7:           blk.0.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor    8:            blk.0.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor    9:              blk.0.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   10:              blk.0.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor   11:                blk.0.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   12:            blk.0.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor   13:              blk.0.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   14:           blk.1.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   15:             blk.1.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   16:            blk.1.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor   17:              blk.1.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor   18:         blk.1.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor   19:           blk.1.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   20:            blk.1.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   21:              blk.1.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   22:              blk.1.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor   23:                blk.1.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   24:            blk.1.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor   25:              blk.1.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   26:           blk.2.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   27:             blk.2.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   28:            blk.2.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor   29:              blk.2.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor   30:         blk.2.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor   31:           blk.2.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   32:            blk.2.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   33:              blk.2.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   34:              blk.2.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor   35:                blk.2.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   36:            blk.2.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor   37:              blk.2.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   38:           blk.3.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   39:             blk.3.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   40:            blk.3.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor   41:              blk.3.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor   42:         blk.3.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor   43:           blk.3.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   44:            blk.3.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   45:              blk.3.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   46:              blk.3.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor   47:                blk.3.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   48:            blk.3.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor   49:              blk.3.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   50:           blk.4.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   51:             blk.4.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   52:            blk.4.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor   53:              blk.4.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor   54:         blk.4.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor   55:           blk.4.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   56:            blk.4.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   57:              blk.4.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   58:              blk.4.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor   59:                blk.4.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   60:            blk.4.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor   61:              blk.4.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   62:           blk.5.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   63:             blk.5.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   64:            blk.5.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor   65:              blk.5.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor   66:         blk.5.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor   67:           blk.5.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   68:            blk.5.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   69:              blk.5.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   70:              blk.5.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor   71:                blk.5.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   72:            blk.5.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor   73:              blk.5.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   74:           blk.6.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   75:             blk.6.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   76:            blk.6.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor   77:              blk.6.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor   78:         blk.6.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor   79:           blk.6.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   80:            blk.6.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   81:              blk.6.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   82:              blk.6.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor   83:                blk.6.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   84:            blk.6.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor   85:              blk.6.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   86:           blk.7.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   87:             blk.7.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   88:            blk.7.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor   89:              blk.7.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor   90:         blk.7.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor   91:           blk.7.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   92:            blk.7.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   93:              blk.7.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   94:              blk.7.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor   95:                blk.7.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor   96:            blk.7.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor   97:              blk.7.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   98:           blk.8.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor   99:             blk.8.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  100:            blk.8.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  101:              blk.8.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  102:         blk.8.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  103:           blk.8.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  104:            blk.8.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  105:              blk.8.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  106:              blk.8.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  107:                blk.8.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  108:            blk.8.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  109:              blk.8.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  110:           blk.9.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  111:             blk.9.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  112:            blk.9.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  113:              blk.9.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  114:         blk.9.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  115:           blk.9.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  116:            blk.9.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  117:              blk.9.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  118:              blk.9.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  119:                blk.9.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  120:            blk.9.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  121:              blk.9.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  122:          blk.10.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  123:            blk.10.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  124:           blk.10.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  125:             blk.10.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  126:        blk.10.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  127:          blk.10.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  128:           blk.10.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  129:             blk.10.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  130:             blk.10.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  131:               blk.10.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  132:           blk.10.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  133:             blk.10.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  134:          blk.11.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  135:            blk.11.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  136:           blk.11.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  137:             blk.11.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  138:        blk.11.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  139:          blk.11.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  140:           blk.11.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  141:             blk.11.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  142:             blk.11.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  143:               blk.11.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  144:           blk.11.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  145:             blk.11.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  146:          blk.12.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  147:            blk.12.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  148:           blk.12.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  149:             blk.12.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  150:        blk.12.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  151:          blk.12.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  152:           blk.12.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  153:             blk.12.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  154:             blk.12.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  155:               blk.12.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  156:           blk.12.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  157:             blk.12.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  158:          blk.13.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  159:            blk.13.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  160:           blk.13.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  161:             blk.13.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  162:        blk.13.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  163:          blk.13.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  164:           blk.13.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  165:             blk.13.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  166:             blk.13.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  167:               blk.13.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  168:           blk.13.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  169:             blk.13.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  170:          blk.14.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  171:            blk.14.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  172:           blk.14.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  173:             blk.14.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  174:        blk.14.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  175:          blk.14.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  176:           blk.14.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  177:             blk.14.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  178:             blk.14.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  179:               blk.14.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  180:           blk.14.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  181:             blk.14.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  182:          blk.15.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  183:            blk.15.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  184:           blk.15.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  185:             blk.15.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  186:        blk.15.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  187:          blk.15.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  188:           blk.15.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  189:             blk.15.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  190:             blk.15.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  191:               blk.15.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  192:           blk.15.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  193:             blk.15.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  194:          blk.16.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  195:            blk.16.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  196:           blk.16.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  197:             blk.16.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  198:        blk.16.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  199:          blk.16.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  200:           blk.16.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  201:             blk.16.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  202:             blk.16.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  203:               blk.16.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  204:           blk.16.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  205:             blk.16.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  206:          blk.17.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  207:            blk.17.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  208:           blk.17.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  209:             blk.17.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  210:        blk.17.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  211:          blk.17.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  212:           blk.17.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  213:             blk.17.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  214:             blk.17.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  215:               blk.17.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  216:           blk.17.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  217:             blk.17.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  218:          blk.18.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  219:            blk.18.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  220:           blk.18.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  221:             blk.18.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  222:        blk.18.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  223:          blk.18.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  224:           blk.18.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  225:             blk.18.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  226:             blk.18.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  227:               blk.18.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  228:           blk.18.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  229:             blk.18.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  230:          blk.19.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  231:            blk.19.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  232:           blk.19.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  233:             blk.19.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  234:        blk.19.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  235:          blk.19.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  236:           blk.19.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  237:             blk.19.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  238:             blk.19.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  239:               blk.19.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  240:           blk.19.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  241:             blk.19.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  242:          blk.20.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  243:            blk.20.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  244:           blk.20.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  245:             blk.20.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  246:        blk.20.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  247:          blk.20.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  248:           blk.20.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  249:             blk.20.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  250:             blk.20.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  251:               blk.20.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  252:           blk.20.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  253:             blk.20.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  254:          blk.21.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  255:            blk.21.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  256:           blk.21.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  257:             blk.21.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  258:        blk.21.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  259:          blk.21.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  260:           blk.21.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  261:             blk.21.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  262:             blk.21.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  263:               blk.21.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  264:           blk.21.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  265:             blk.21.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  266:          blk.22.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  267:            blk.22.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  268:           blk.22.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  269:             blk.22.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  270:        blk.22.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  271:          blk.22.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  272:           blk.22.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  273:             blk.22.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  274:             blk.22.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  275:               blk.22.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  276:           blk.22.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  277:             blk.22.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  278:          blk.23.attn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  279:            blk.23.attn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  280:           blk.23.attn_qkv.weight q8_0     [  2048,  6144,     1,     1 ]
llama_model_loader: - tensor  281:             blk.23.attn_qkv.bias f32      [  6144,     1,     1,     1 ]
llama_model_loader: - tensor  282:        blk.23.attn_output.weight q8_0     [  2048,  2048,     1,     1 ]
llama_model_loader: - tensor  283:          blk.23.attn_output.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  284:           blk.23.ffn_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  285:             blk.23.ffn_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  286:             blk.23.ffn_up.weight q8_0     [  2048,  8192,     1,     1 ]
llama_model_loader: - tensor  287:               blk.23.ffn_up.bias f32      [  8192,     1,     1,     1 ]
llama_model_loader: - tensor  288:           blk.23.ffn_down.weight q8_0     [  8192,  2048,     1,     1 ]
llama_model_loader: - tensor  289:             blk.23.ffn_down.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  290:               output_norm.weight f32      [  2048,     1,     1,     1 ]
llama_model_loader: - tensor  291:                 output_norm.bias f32      [  2048,     1,     1,     1 ]
llama_model_loader: - kv   0:                       general.architecture str     
llama_model_loader: - kv   1:                               general.name str     
llama_model_loader: - kv   2:                         mpt.context_length u32     
llama_model_loader: - kv   3:                       mpt.embedding_length u32     
llama_model_loader: - kv   4:                            mpt.block_count u32     
llama_model_loader: - kv   5:                    mpt.feed_forward_length u32     
llama_model_loader: - kv   6:                   mpt.attention.head_count u32     
llama_model_loader: - kv   7:           mpt.attention.layer_norm_epsilon f32     
llama_model_loader: - kv   8:               mpt.attention.max_alibi_bias f32     
llama_model_loader: - kv   9:                       tokenizer.ggml.model str     
llama_model_loader: - kv  10:                      tokenizer.ggml.tokens arr     
llama_model_loader: - kv  11:                  tokenizer.ggml.token_type arr     
llama_model_loader: - kv  12:                      tokenizer.ggml.merges arr     
llama_model_loader: - kv  13:                tokenizer.ggml.bos_token_id u32     
llama_model_loader: - kv  14:                tokenizer.ggml.eos_token_id u32     
llama_model_loader: - kv  15:            tokenizer.ggml.unknown_token_id u32     
llama_model_loader: - kv  16:            tokenizer.ggml.padding_token_id u32     
llama_model_loader: - kv  17:               general.quantization_version u32     
llama_model_loader: - kv  18:                          general.file_type u32     
llama_model_loader: - type  f32:  194 tensors
llama_model_loader: - type q8_0:   98 tensors
llm_load_vocab: mismatch in special tokens definition ( 95/50368 vs 116/50368 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = mpt
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 50368
llm_load_print_meta: n_merges         = 50009
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_head           = 16
llm_load_print_meta: n_head_kv        = 16
llm_load_print_meta: n_layer          = 24
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: f_norm_eps       = 1.0e-05
llm_load_print_meta: f_norm_rms_eps   = 0.0e+00
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 8.0e+00
llm_load_print_meta: n_ff             = 8192
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = mostly Q8_0
llm_load_print_meta: model params     = 1.41 B
llm_load_print_meta: model size       = 1.40 GiB (8.51 BPW) 
llm_load_print_meta: general.name   = mpt-3b-8k-instruct
llm_load_print_meta: BOS token = 0 '<|endoftext|>'
llm_load_print_meta: EOS token = 0 '<|endoftext|>'
llm_load_print_meta: UNK token = 0 '<|endoftext|>'
llm_load_print_meta: PAD token = 0 '<|endoftext|>'
llm_load_print_meta: LF token  = 128 'Ä'
llm_load_tensors: ggml ctx size =    0.11 MB
error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 147
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'mpt-3b-8k-instruct/mpt-3b-8k-instruct.q8_0.gguf'
main: error: unable to load model
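
For anyone hitting the same `wrong number of tensors` error: a quick way to sanity-check a GGUF outside llama.cpp is the `gguf` Python package that ships with the llama.cpp repo. A minimal sketch, with an illustrative file path:

```python
# Quick GGUF sanity check using the gguf Python package (pip install gguf).
# The file path is illustrative; point it at the quant you want to inspect.
from gguf import GGUFReader

reader = GGUFReader("mpt-3b-8k-instruct/mpt-3b-8k-instruct.q8_0.gguf")

# How many tensors the file actually contains (the failing load above
# listed 292 in the metadata but only mapped 147 of them).
print("tensors in file:", len(reader.tensors))

# Peek at the first few tensor names and shapes to spot anything unexpected.
for tensor in reader.tensors[:5]:
    print(tensor.name, tensor.shape)
```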

Oh, thanks for trying, there must be a problem with the model 👍

Aryanne changed discussion status to closed

Now I see: judging by the file size, the original model upload is broken. A normal 3B float16 checkpoint is around 6.85 GB.
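
A quick back-of-the-envelope check of that estimate (a sketch; fp16 stores 2 bytes per parameter, and the helper name is just for illustration):

```python
# Back-of-the-envelope fp16 checkpoint size: 2 bytes per parameter,
# ignoring small overheads such as tokenizer and config files.
def fp16_size_gb(n_params: float) -> float:
    return n_params * 2 / 1e9  # decimal GB

print(fp16_size_gb(3.63e9))  # ~7.3 GB; the working quant below reports 3.63 B params
print(fp16_size_gb(1.41e9))  # ~2.8 GB; what the broken upload's 1.41 B params would imply
```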

He uploaded it again; it now seems to be around 6.85 GB.
https://huggingface.co/niallturbitt/mpt-3b-8k-instruct/tree/main

By the way, sorry for annoying you.

https://huggingface.co/afrideva/mpt-3b-8k-instruct-GGUF/blob/main/mpt-3b-8k-instruct.q2_k.gguf
Just uploaded the q2_k; working on the rest. Happy to help anytime, when I'm able.

Many thanks for all your smaller model quants!

llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = mpt
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 50432
llm_load_print_meta: n_merges         = 50009
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 16
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: f_norm_eps       = 1.0e-05
llm_load_print_meta: f_norm_rms_eps   = 0.0e+00
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 8.0e+00
llm_load_print_meta: n_ff             = 16384
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = ?B
llm_load_print_meta: model ftype      = mostly Q2_K
llm_load_print_meta: model params     = 3.63 B
llm_load_print_meta: model size       = 1.43 GiB (3.39 BPW) 
llm_load_print_meta: general.name   = mpt-3b-8k-instruct
llm_load_print_meta: BOS token = 0 '<|endoftext|>'
llm_load_print_meta: EOS token = 0 '<|endoftext|>'
llm_load_print_meta: UNK token = 0 '<|endoftext|>'
llm_load_print_meta: PAD token = 0 '<|endoftext|>'
llm_load_print_meta: LF token  = 128 'Ä'
llm_load_tensors: ggml ctx size =    0.04 MB
llm_load_tensors: mem required  = 1468.79 MB
..........................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =  128.00 MB
llama_build_graph: non-view tensors processed: 324/324
llama_new_context_with_model: compute buffer total size = 121.13 MB

system_info: n_threads = 2 / 2 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
sampling: 
    repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
    top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
    mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0


### Instruction:
Please give me a color representing Namibia's beauty.

### Response:
A light blue color could represent the country of Namibia, which is in Southern Africa and has some of the most beautiful beaches on earth! [end of text]

llama_print_timings:        load time =    6422.40 ms
llama_print_timings:      sample time =      28.16 ms /    32 runs   (    0.88 ms per token,  1136.16 tokens per second)
llama_print_timings: prompt eval time =    7203.60 ms /    21 tokens (  343.03 ms per token,     2.92 tokens per second)
llama_print_timings:        eval time =    7767.82 ms /    31 runs   (  250.57 ms per token,     3.99 tokens per second)
llama_print_timings:       total time =   15021.36 ms
Log end

q3_k through q8_0 are up. The Alpaca prompt format seems to work well.
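
For reference, a minimal sketch of building that Alpaca-style prompt. The wrapper text mirrors the example in the log above; the exact spacing and newlines are an assumption, so adjust if the model card specifies a stricter template.

```python
# Build an Alpaca-style prompt like the one used in the generation log above.
# Exact whitespace is an assumption; tweak to match the model card if needed.
def alpaca_prompt(instruction: str) -> str:
    return (
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Response:\n"
    )

print(alpaca_prompt("Please give me a color representing Namibia's beauty."))
```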
