Anyone else seeing similar behavior? I especially like the start "Death, ..." plus some gobbledygook.

#12
by BigDeeper - opened

llm_load_tensors: ggml ctx size = 0.38 MiB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 102.92 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: VRAM used: 36497.55 MiB
....................................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: VRAM kv self = 256.00 MB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_build_graph: non-view tensors processed: 1124/1124
llama_new_context_with_model: compute buffer total size = 187.22 MiB
llama_new_context_with_model: VRAM scratch buffer: 184.04 MiB
llama_new_context_with_model: total VRAM used: 36937.59 MiB (model: 36497.55 MiB, context: 440.04 MiB)

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.700
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temp
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 0

Write a poem about Fibonacci sequence.

#################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################### Death,
,
gepubliceerdaders myself hope
Mack.... tw_{
[end of text]

llama_print_timings: load time = 16705.83 ms
llama_print_timings: sample time = 194.93 ms / 521 runs ( 0.37 ms per token, 2672.78 tokens per second)
llama_print_timings: prompt eval time = 516.70 ms / 12 tokens ( 43.06 ms per token, 23.22 tokens per second)
llama_print_timings: eval time = 30994.44 ms / 520 runs ( 59.60 ms per token, 16.78 tokens per second)
llama_print_timings: total time = 32847.43 ms
Log end
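
For reference, the settings in the log above correspond roughly to an invocation like the one below. This is only a sketch: the model filename/quant is a placeholder, not the exact file that was loaded, and the flags are inferred from the logged values (n_ctx, n_batch, n_gpu_layers, sampling parameters).

./main -m ./mixtral-8x7b-v0.1.<quant>.gguf -ngl 33 -c 2048 -b 512 -n -1 -t 16 \
       --temp 0.7 --top-k 40 --top-p 0.95 --min-p 0.05 --repeat-penalty 1.1 \
       -p "Write a poem about Fibonacci sequence."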


@BigDeeper that is pretty weird, but first of all, it sounds like you want an instruction/chat model.
The instruction/chat version of Mixtral is this one:
https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF
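
The Instruct model also expects the Mistral instruction format, i.e. the prompt wrapped in [INST] ... [/INST]. A minimal sketch (the filename depends on which quant you download):

./main -m ./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf -ngl 33 -c 2048 \
       -p "[INST] Write a poem about the Fibonacci sequence. [/INST]"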
