legraphista committed on
Commit 342327d
1 Parent(s): 08546a5

Upload imatrix.log with huggingface_hub

Files changed (1)
  1. imatrix.log +167 -0
imatrix.log ADDED
@@ -0,0 +1,167 @@
+ main: build = 3003 (d298382a)
+ main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
+ main: seed = 1716752123
+ llama_model_loader: loaded meta data with 27 key-value pairs and 197 tensors from Phi-3-mini-128k-instruct-IMat-GGUF/Phi-3-mini-128k-instruct.gguf (version GGUF V3 (latest))
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
+ llama_model_loader: - kv 0: general.architecture str = phi3
+ llama_model_loader: - kv 1: general.name str = Phi3
+ llama_model_loader: - kv 2: phi3.context_length u32 = 131072
+ llama_model_loader: - kv 3: phi3.rope.scaling.original_context_length u32 = 4096
+ llama_model_loader: - kv 4: phi3.embedding_length u32 = 3072
+ llama_model_loader: - kv 5: phi3.feed_forward_length u32 = 8192
+ llama_model_loader: - kv 6: phi3.block_count u32 = 32
+ llama_model_loader: - kv 7: phi3.attention.head_count u32 = 32
+ llama_model_loader: - kv 8: phi3.attention.head_count_kv u32 = 32
+ llama_model_loader: - kv 9: phi3.attention.layer_norm_rms_epsilon f32 = 0.000010
+ llama_model_loader: - kv 10: phi3.rope.dimension_count u32 = 96
+ llama_model_loader: - kv 11: phi3.rope.freq_base f32 = 10000.000000
+ llama_model_loader: - kv 12: general.file_type u32 = 0
+ llama_model_loader: - kv 13: phi3.rope.scaling.attn_factor f32 = 1.190238
+ llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
+ llama_model_loader: - kv 15: tokenizer.ggml.pre str = default
+ llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,32064] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
+ llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,32064] = [-1000.000000, -1000.000000, -1000.00...
+ llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,32064] = [3, 3, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
+ llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 1
+ llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 32000
+ llama_model_loader: - kv 21: tokenizer.ggml.unknown_token_id u32 = 0
+ llama_model_loader: - kv 22: tokenizer.ggml.padding_token_id u32 = 32000
+ llama_model_loader: - kv 23: tokenizer.ggml.add_bos_token bool = true
+ llama_model_loader: - kv 24: tokenizer.ggml.add_eos_token bool = false
+ llama_model_loader: - kv 25: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
+ llama_model_loader: - kv 26: general.quantization_version u32 = 2
+ llama_model_loader: - type f32: 197 tensors
+ llm_load_vocab: special tokens definition check successful ( 323/32064 ).
+ llm_load_print_meta: format = GGUF V3 (latest)
+ llm_load_print_meta: arch = phi3
+ llm_load_print_meta: vocab type = SPM
+ llm_load_print_meta: n_vocab = 32064
+ llm_load_print_meta: n_merges = 0
+ llm_load_print_meta: n_ctx_train = 131072
+ llm_load_print_meta: n_embd = 3072
+ llm_load_print_meta: n_head = 32
+ llm_load_print_meta: n_head_kv = 32
+ llm_load_print_meta: n_layer = 32
+ llm_load_print_meta: n_rot = 96
+ llm_load_print_meta: n_embd_head_k = 96
+ llm_load_print_meta: n_embd_head_v = 96
+ llm_load_print_meta: n_gqa = 1
+ llm_load_print_meta: n_embd_k_gqa = 3072
+ llm_load_print_meta: n_embd_v_gqa = 3072
+ llm_load_print_meta: f_norm_eps = 0.0e+00
+ llm_load_print_meta: f_norm_rms_eps = 1.0e-05
+ llm_load_print_meta: f_clamp_kqv = 0.0e+00
+ llm_load_print_meta: f_max_alibi_bias = 0.0e+00
+ llm_load_print_meta: f_logit_scale = 0.0e+00
+ llm_load_print_meta: n_ff = 8192
+ llm_load_print_meta: n_expert = 0
+ llm_load_print_meta: n_expert_used = 0
+ llm_load_print_meta: causal attn = 1
+ llm_load_print_meta: pooling type = 0
+ llm_load_print_meta: rope type = 2
+ llm_load_print_meta: rope scaling = linear
+ llm_load_print_meta: freq_base_train = 10000.0
+ llm_load_print_meta: freq_scale_train = 1
+ llm_load_print_meta: n_yarn_orig_ctx = 4096
+ llm_load_print_meta: rope_finetuned = unknown
+ llm_load_print_meta: ssm_d_conv = 0
+ llm_load_print_meta: ssm_d_inner = 0
+ llm_load_print_meta: ssm_d_state = 0
+ llm_load_print_meta: ssm_dt_rank = 0
+ llm_load_print_meta: model type = 3B
+ llm_load_print_meta: model ftype = all F32
+ llm_load_print_meta: model params = 3.82 B
+ llm_load_print_meta: model size = 14.23 GiB (32.00 BPW)
+ llm_load_print_meta: general.name = Phi3
+ llm_load_print_meta: BOS token = 1 '<s>'
+ llm_load_print_meta: EOS token = 32000 '<|endoftext|>'
+ llm_load_print_meta: UNK token = 0 '<unk>'
+ llm_load_print_meta: PAD token = 32000 '<|endoftext|>'
+ llm_load_print_meta: LF token = 13 '<0x0A>'
+ llm_load_print_meta: EOT token = 32007 '<|end|>'
+ ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
+ ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
+ ggml_cuda_init: found 1 CUDA devices:
+ Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
+ llm_load_tensors: ggml ctx size = 0.22 MiB
+ llm_load_tensors: offloading 32 repeating layers to GPU
+ llm_load_tensors: offloading non-repeating layers to GPU
+ llm_load_tensors: offloaded 33/33 layers to GPU
+ llm_load_tensors: CPU buffer size = 375.75 MiB
+ llm_load_tensors: CUDA0 buffer size = 14200.53 MiB
+ ....................................................................................
+ llama_new_context_with_model: n_ctx = 512
+ llama_new_context_with_model: n_batch = 512
+ llama_new_context_with_model: n_ubatch = 512
+ llama_new_context_with_model: flash_attn = 0
+ llama_new_context_with_model: freq_base = 10000.0
+ llama_new_context_with_model: freq_scale = 1
+ llama_kv_cache_init: CUDA0 KV buffer size = 192.00 MiB
+ llama_new_context_with_model: KV self size = 192.00 MiB, K (f16): 96.00 MiB, V (f16): 96.00 MiB
+ llama_new_context_with_model: CUDA_Host output buffer size = 0.12 MiB
+ llama_new_context_with_model: CUDA0 compute buffer size = 83.00 MiB
+ llama_new_context_with_model: CUDA_Host compute buffer size = 7.01 MiB
+ llama_new_context_with_model: graph nodes = 1286
+ llama_new_context_with_model: graph splits = 2
+
+ system_info: n_threads = 25 / 32 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
+ compute_imatrix: tokenizing the input ..
+ compute_imatrix: tokenization took 133.64 ms
+ compute_imatrix: computing over 234 chunks with batch_size 512
+ compute_imatrix: 0.32 seconds per pass - ETA 1.23 minutes
+ [1]6.0727,[2]4.4610,[3]4.4629,[4]4.9370,[5]5.3244,[6]5.4170,[7]4.8496,[8]5.2827,[9]5.5966,
+ save_imatrix: stored collected data after 10 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [10]5.9047,[11]5.8828,[12]5.4226,[13]5.5632,[14]5.4485,[15]5.8942,[16]5.9986,[17]6.2966,[18]6.4616,[19]6.6562,
+ save_imatrix: stored collected data after 20 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [20]6.8072,[21]6.8799,[22]7.1044,[23]6.8300,[24]6.6506,[25]6.6546,[26]6.2990,[27]6.0381,[28]5.7430,[29]5.7002,
+ save_imatrix: stored collected data after 30 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [30]5.8055,[31]5.8773,[32]5.9258,[33]5.9168,[34]5.9704,[35]5.9718,[36]5.7531,[37]5.6171,[38]5.5492,[39]5.5208,
+ save_imatrix: stored collected data after 40 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [40]5.4923,[41]5.4249,[42]5.4651,[43]5.5062,[44]5.5516,[45]5.6231,[46]5.7071,[47]5.7971,[48]5.9323,[49]6.0424,
+ save_imatrix: stored collected data after 50 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [50]6.1599,[51]6.2644,[52]6.3634,[53]6.3316,[54]6.2462,[55]6.1791,[56]6.2703,[57]6.3211,[58]6.3341,[59]6.3940,
+ save_imatrix: stored collected data after 60 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [60]6.4760,[61]6.5040,[62]6.5788,[63]6.6285,[64]6.7074,[65]6.7470,[66]6.7897,[67]6.8378,[68]6.8799,[69]6.9477,
+ save_imatrix: stored collected data after 70 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [70]6.9901,[71]7.0393,[72]7.0741,[73]7.0324,[74]6.9788,[75]6.9180,[76]6.8588,[77]6.8484,[78]6.7958,[79]6.7419,
+ save_imatrix: stored collected data after 80 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [80]6.6785,[81]6.6562,[82]6.6094,[83]6.5719,[84]6.5868,[85]6.6086,[86]6.6214,[87]6.6579,[88]6.6725,[89]6.6531,
+ save_imatrix: stored collected data after 90 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [90]6.6234,[91]6.6468,[92]6.6571,[93]6.6760,[94]6.6899,[95]6.7034,[96]6.7345,[97]6.7548,[98]6.7283,[99]6.6884,
+ save_imatrix: stored collected data after 100 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [100]6.7032,[101]6.7244,[102]6.7139,[103]6.6792,[104]6.6236,[105]6.6084,[106]6.6134,[107]6.6201,[108]6.5983,[109]6.5869,
+ save_imatrix: stored collected data after 110 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [110]6.5668,[111]6.5736,[112]6.5833,[113]6.5814,[114]6.5906,[115]6.5858,[116]6.5842,[117]6.5771,[118]6.5829,[119]6.5621,
+ save_imatrix: stored collected data after 120 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [120]6.5647,[121]6.5506,[122]6.5262,[123]6.5435,[124]6.5348,[125]6.5383,[126]6.5255,[127]6.5257,[128]6.5349,[129]6.5169,
+ save_imatrix: stored collected data after 130 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [130]6.4927,[131]6.4831,[132]6.4802,[133]6.4305,[134]6.4382,[135]6.4156,[136]6.3964,[137]6.3724,[138]6.3466,[139]6.3174,
+ save_imatrix: stored collected data after 140 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [140]6.2951,[141]6.2763,[142]6.2547,[143]6.2554,[144]6.2527,[145]6.2348,[146]6.2123,[147]6.2092,[148]6.1993,[149]6.1886,
+ save_imatrix: stored collected data after 150 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [150]6.1821,[151]6.1677,[152]6.1637,[153]6.1538,[154]6.1412,[155]6.1651,[156]6.1401,[157]6.1342,[158]6.1511,[159]6.1461,
+ save_imatrix: stored collected data after 160 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [160]6.1502,[161]6.1632,[162]6.1662,[163]6.1865,[164]6.1987,[165]6.2188,[166]6.2278,[167]6.2257,[168]6.2270,[169]6.2340,
+ save_imatrix: stored collected data after 170 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [170]6.2467,[171]6.2367,[172]6.2370,[173]6.2540,[174]6.2566,[175]6.2740,[176]6.2834,[177]6.2939,[178]6.3006,[179]6.3317,
+ save_imatrix: stored collected data after 180 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [180]6.3424,[181]6.3923,[182]6.4110,[183]6.4381,[184]6.4432,[185]6.4486,[186]6.4541,[187]6.4579,[188]6.4483,[189]6.4521,
+ save_imatrix: stored collected data after 190 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [190]6.4586,[191]6.4720,[192]6.4769,[193]6.5062,[194]6.4950,[195]6.4664,[196]6.5076,[197]6.5460,[198]6.5764,[199]6.6271,
+ save_imatrix: stored collected data after 200 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [200]6.6740,[201]6.6813,[202]6.6861,[203]6.6440,[204]6.6408,[205]6.6470,[206]6.6681,[207]6.6649,[208]6.6679,[209]6.6688,
+ save_imatrix: stored collected data after 210 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [210]6.6772,[211]6.6909,[212]6.6907,[213]6.6877,[214]6.6944,[215]6.7130,[216]6.7305,[217]6.7338,[218]6.7346,[219]6.7292,
+ save_imatrix: stored collected data after 220 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [220]6.7205,[221]6.7193,[222]6.7178,[223]6.7324,[224]6.7153,[225]6.7209,[226]6.7047,[227]6.7403,[228]6.7806,[229]6.8252,
+ save_imatrix: stored collected data after 230 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+ [230]6.8657,[231]6.8871,[232]6.8663,[233]6.8447,[234]6.8193,
+ save_imatrix: stored collected data after 234 chunks in Phi-3-mini-128k-instruct-IMat-GGUF/imatrix.dat
+
+ llama_print_timings: load time = 2127.07 ms
+ llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_print_timings: prompt eval time = 52676.50 ms / 119808 tokens ( 0.44 ms per token, 2274.41 tokens per second)
+ llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ llama_print_timings: total time = 55109.70 ms / 119809 tokens
+
+ Final estimate: PPL = 6.8193 +/- 0.07007
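Several figures in this log are derived from metadata printed earlier in the same log: the KV cache size, the total token count, the prompt-eval throughput and the model size. The Python sketch below recomputes them as a cross-check. It assumes f16 occupies 2 bytes, f32 occupies 4 bytes, and MiB/GiB are binary units; the `derive()` helper is hypothetical and is not part of the tool that wrote the log.

```python
# Hypothetical cross-check: re-derive figures printed in imatrix.log from the
# metadata and run parameters reported in the same log.
# Assumptions (not stated in the log): f16 = 2 bytes, f32 = 4 bytes,
# MiB = 2**20 bytes, GiB = 2**30 bytes.

MIB = 2 ** 20
GIB = 2 ** 30

# Values copied from the log
n_embd = 3072            # phi3.embedding_length
n_head = 32              # phi3.attention.head_count
n_head_kv = 32           # phi3.attention.head_count_kv
n_layer = 32             # phi3.block_count
n_ctx = 512              # llama_new_context_with_model: n_ctx
n_chunks = 234           # compute_imatrix: computing over 234 chunks
prompt_eval_ms = 52676.50
n_params = 3.82e9        # "model params = 3.82 B" (rounded in the log)


def derive():
    n_embd_head = n_embd // n_head            # per-head dimension
    n_gqa = n_head // n_head_kv               # query heads sharing one KV head
    n_embd_k_gqa = n_embd_head * n_head_kv    # K-cache width per token and layer
    # K cache: one f16 value per (context cell, layer, channel).
    # The log reports n_embd_v_gqa = 3072 too, so V has the same size.
    k_bytes = n_ctx * n_layer * n_embd_k_gqa * 2
    tokens = n_chunks * n_ctx                 # each chunk fills one full context
    tok_per_s = tokens / (prompt_eval_ms / 1000)
    f32_gib = n_params * 4 / GIB              # "model ftype = all F32"
    return {
        "n_embd_head": n_embd_head,
        "n_gqa": n_gqa,
        "n_embd_k_gqa": n_embd_k_gqa,
        "k_cache_mib": k_bytes / MIB,
        "kv_cache_mib": 2 * k_bytes / MIB,
        "tokens": tokens,
        "tokens_per_second": round(tok_per_s, 2),
        "model_gib": round(f32_gib, 2),
    }


if __name__ == "__main__":
    for name, value in derive().items():
        print(f"{name:>18} = {value}")
```

Run as a script, it should reproduce the logged values: KV self size 192.00 MiB (96.00 MiB each for K and V), 119808 prompt tokens at about 2274.41 tokens per second, and a model size of about 14.23 GiB. The last figure is only approximate because the log rounds the parameter count to 3.82 B.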