docs/colosseum_bottom.md · ml-energy/leaderboard at main

Technical details

We allow models to generate only up to 512 new tokens. Due to this, some responses may be cut off in the middle.
Tokens are sampled from the model output with temperature 1.0, repetition_penalty 1.0, top_k 50, and top_p 0.95.
Large models (>= 30B) run on two NVIDIA A40 GPUs with tensor parallelism, whereas other models run on one NVIDIA A40 GPU. We directly measure the energy consumption of these GPUs.

Contact

Please direct general questions and issues related to the Colosseum to our GitHub repository's discussion board. You can find the ML.ENERGY initiative members in our homepage. If you need direct communication, please email admins@ml.energy.