dahara1 commited on
Commit
acb460d
·
verified ·
1 Parent(s): a86022d

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +12 -2
README.md CHANGED
@@ -61,15 +61,25 @@ My test prompt execution time: 2173.36 seconds
61
  私のテストプロンプトの実行時間: 3787.47秒
62
  My test prompt execution time: 3787.47 seconds
63
 
 
64
 
 
 
 
 
 
 
 
65
 
 
 
66
 
67
 
68
  なお、温度0でも単独でモデルを実行した際と微妙な差異が出るケースを確認してますので再現性が最重要な場合は注意してください
69
  I have confirmed cases where there are slight differences when running the model alone even at 0 temperature, so please be careful if reproducibility is paramount.
70
 
71
- とはいえ、Q4を使った場合でも語尾が多少異なる程度で結論がかわるようなレベルの違いはありませんでした
72
- However, even when Q4 was used, the endings were slightly different, but there was no difference to the extent that the conclusion was changed.
73
 
74
  クライアントスクリプトの例は[dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K](https://huggingface.co/dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K)をご覧ください
75
  See [dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K](https://huggingface.co/dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K) for cliant example.
 
61
  私のテストプロンプトの実行時間: 3787.47秒
62
  My test prompt execution time: 3787.47 seconds
63
 
64
+ ### 4060ti(16GB)現在の最速 current max speed
65
 
66
+ ```
67
+ CUDA_VISIBLE_DEVICES=0 ./llama.cpp/llama.cpp/build/bin/llama-server \
68
+ -m ./llama.cpp/qwen/32B/Qwen2.5-32B-Instruct-Q8_0-f16.gguf \
69
+ -md ./llama.cpp/qwen/Qwen2.5-0.5B-Instruct-IQ3_XXS.gguf \
70
+ -ngl 25 -ngld 99 -e --temp 0 -fa -c 1800 \
71
+ --draft-max 16 --draft-min 5
72
+ ```
73
 
74
+ 私のテストプロンプトの実行時間: 2130.14秒
75
+ My test prompt execution time: 2130.14 seconds
76
 
77
 
78
  なお、温度0でも単独でモデルを実行した際と微妙な差異が出るケースを確認してますので再現性が最重要な場合は注意してください
79
  I have confirmed cases where there are slight differences when running the model alone even at 0 temperature, so please be careful if reproducibility is paramount.
80
 
81
+ とはいえ、IQ3を使った場合でも語尾が多少異なる程度で結論がかわるようなレベルの違いはありませんでした
82
+ However, even when IQ3 was used, the endings were slightly different, but there was no difference to the extent that the conclusion was changed.
83
 
84
  クライアントスクリプトの例は[dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K](https://huggingface.co/dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K)をご覧ください
85
  See [dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K](https://huggingface.co/dahara1/Qwen2.5-3B-Instruct-gguf-japanese-imatrix-128K) for cliant example.