FINAL BENCHMARKING

  • Time to First Token (TTFT): 0.001 s
  • Time Per Output Token (TPOT): 33.26 ms/token
  • Throughput: 30.88 tokens/s
  • Average Token Latency: 33.33 ms/token
  • Total Generation Time: 13.966 s
  • Input Tokenization Time: 0.011 s
  • Input Tokens: 1909
  • Output Tokens: 420
  • Total Tokens: 2329
  • Memory Usage (GPU): 3.38 GB
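The latency metrics above can all be derived from three wall-clock timestamps plus the output token count. A minimal sketch, assuming a streaming generation loop that records when the first token arrives (the function name is hypothetical, and the TPOT convention shown — averaging over the gaps after the first token — varies between tools; some divide total time by all output tokens instead):

```python
def generation_metrics(start: float, first_token: float, end: float,
                       n_output_tokens: int) -> dict:
    """Derive common LLM latency metrics from three timestamps.

    start       -- moment the generation request was issued
    first_token -- moment the first output token arrived
    end         -- moment generation finished
    """
    ttft = first_token - start                       # Time to First Token
    total = end - start                              # Total Generation Time
    # TPOT: average spacing of output tokens after the first one arrives
    tpot = (end - first_token) / max(n_output_tokens - 1, 1)
    throughput = n_output_tokens / total             # overall tokens per second
    return {
        "ttft_s": ttft,
        "tpot_ms": tpot * 1000,
        "throughput_tps": throughput,
        "total_s": total,
    }

# Example with synthetic timings: 101 tokens, first after 0.5 s, done at 10.5 s
m = generation_metrics(0.0, 0.5, 10.5, 101)
```

Note that throughput and average token latency answer slightly different questions: throughput includes the time spent waiting for the first token, while TPOT measures only the steady-state decode rate.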

Uploaded model

  • Developed by: vietphuon
  • License: apache-2.0
  • Finetuned from model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
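Since the base model (unsloth/Llama-3.2-1B-Instruct-bnb-4bit) is a bitsandbytes 4-bit quantization, a back-of-the-envelope weight-memory estimate helps put the 3.38 GB GPU figure above in context. A minimal sketch, assuming a parameter count of roughly 1.2B for Llama 3.2 1B (the helper function is hypothetical, and KV cache and activation overhead are deliberately ignored):

```python
def model_memory_gb(n_params: float, bits_per_param: float,
                    overhead_gb: float = 0.0) -> float:
    """Rough weight-memory estimate: params * bytes-per-param, plus fixed overhead."""
    return n_params * (bits_per_param / 8) / 1e9 + overhead_gb

# Weights alone for an assumed ~1.2B-parameter model:
fp16_gb = model_memory_gb(1.2e9, 16)      # 16-bit: ~2.4 GB
four_bit_gb = model_memory_gb(1.2e9, 4)   # 4-bit:  ~0.6 GB
```

The gap between such a weights-only estimate and the measured 3.38 GB is typically taken up by the KV cache, activations, and framework overhead, which grow with context length and batch size.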

Downloads last month: 158
