license: mit
pipeline_tag: text-generation
tags:
  - ONNX
  - DML
  - ONNXRuntime
  - phi3
  - nlp
  - conversational
  - custom_code
inference: false
language:
  - en

EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-cpu-int4-rtn-block-32

Performance Metrics

CPU-INT4-RTN-BLOCK-32

We measured the performance of CPU-INT4-RTN-BLOCK-32 on an AMD Ryzen 9 7940HS w/ Radeon 78

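| Prompt Length | Generation Length | Average Throughput (tps) |
|---------------|-------------------|--------------------------|
| 128           | 128               | -                        |
| 128           | 256               | -                        |
| 128           | 512               | -                        |
| 128           | 1024              | -                        |
| 256           | 128               | -                        |
| 256           | 256               | -                        |
| 256           | 512               | -                        |
| 256           | 1024              | -                        |
| 512           | 128               | -                        |
| 512           | 256               | -                        |
| 512           | 512               | -                        |
| 512           | 1024              | -                        |
| 1024          | 128               | -                        |
| 1024          | 256               | -                        |
| 1024          | 512               | -                        |
| 1024          | 1024              | -                        |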
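
The throughput column above is still empty. As a rough illustration of how such a grid could be filled in, the sketch below times a generation callable for every prompt length and generation length pair and reports the mean tokens per second. This is not the benchmark script used for this model card: `average_throughput`, `run_grid`, and `fake_generate` are hypothetical names, the timing is assumed to cover the whole call (prompt processing plus generation), and `fake_generate` is a stand-in that would need to be replaced with a real call into the model.

```python
import time
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def average_throughput(
    generate: Callable[[int, int], int],
    prompt_length: int,
    generation_length: int,
    repeats: int = 3,
) -> float:
    """Mean generated tokens per second over `repeats` timed runs.

    `generate` is a hypothetical callable that takes a prompt length and a
    generation length and returns the number of tokens it produced.
    """
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        produced = generate(prompt_length, generation_length)
        elapsed = time.perf_counter() - start
        if elapsed <= 0:
            raise RuntimeError("elapsed time too small to measure")
        rates.append(produced / elapsed)
    return sum(rates) / len(rates)


def run_grid(
    generate: Callable[[int, int], int],
    lengths: Sequence[int] = (128, 256, 512, 1024),
    repeats: int = 3,
) -> Dict[Tuple[int, int], float]:
    """Measure every (prompt length, generation length) pair in the grid."""
    return {
        (p, g): average_throughput(generate, p, g, repeats)
        for p, g in product(lengths, lengths)
    }


def fake_generate(prompt_length: int, generation_length: int) -> int:
    """Placeholder for a real model call; sleeps briefly instead of generating."""
    time.sleep(0.001)
    return generation_length


if __name__ == "__main__":
    results = run_grid(fake_generate, repeats=1)
    for (p, g), tps in results.items():
        print(f"| {p} | {g} | {tps:.1f} |")
```

Running the script prints one table row per pair in the same column order as the table above. The numbers produced by `fake_generate` are meaningless; they only show the shape of the output.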