### Technical details

- Models may generate at most 512 new tokens, so some responses may be cut off mid-sentence.
- Tokens are sampled from the model output with `temperature` 1.0, `repetition_penalty` 1.0, `top_k` 50, and `top_p` 0.95.
- Large models (>= 30B parameters) run on two NVIDIA A40 GPUs with tensor parallelism, whereas other models run on a single NVIDIA A40 GPU. We directly measure the energy consumption of these GPUs.

### Contact

Please direct general questions and issues related to the Colosseum to our GitHub repository's [discussion board](https://github.com/ml-energy/leaderboard/discussions). You can find the ML.ENERGY initiative members on [our homepage](https://ml.energy#members). For direct communication, please email admins@ml.energy.
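To illustrate the sampling parameters listed in the technical details (`temperature`, `top_k`, `top_p`), here is a minimal pure-Python sketch of how temperature scaling, top-k truncation, and top-p (nucleus) truncation combine to produce a sampling distribution. The function name `filter_logits` and the toy logits are illustrative; the actual serving stack applies these same transforms (plus `repetition_penalty`) inside the inference engine.

```python
import math

def filter_logits(logits, temperature=1.0, top_k=50, top_p=0.95):
    """Illustrative sketch: turn raw logits into sampling probabilities
    via temperature scaling, top-k truncation, and top-p truncation.
    Returns {token_index: probability} over the kept tokens."""
    # Temperature scaling, then a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices by probability, descending.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Keep at most top_k tokens, and stop once the cumulative
    # probability mass reaches top_p (nucleus sampling).
    kept, cum = [], 0.0
    for i in order[:top_k]:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the kept tokens so probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

With `top_p` 0.95, low-probability tail tokens are excluded once the most likely tokens already cover 95% of the probability mass, which is why sampled responses stay coherent while remaining stochastic at `temperature` 1.0.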
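On "directly measure the energy consumption of these GPUs": GPU energy is typically obtained by sampling instantaneous power draw (for NVIDIA GPUs, e.g. via NVML's `nvmlDeviceGetPowerUsage`, which reports milliwatts) and integrating it over the generation window. The sketch below shows only that integration step over hypothetical `(timestamp, power)` samples; it is not the leaderboard's actual measurement code.

```python
def energy_joules(samples):
    """Integrate (timestamp_s, power_w) samples with the trapezoidal
    rule. Returns energy in joules (watt-seconds). The samples here
    are hypothetical; real measurements would come from a GPU power
    counter polled during generation."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy
```

For example, a GPU drawing a steady 200 W for a 10-second generation consumes 2000 J, and per-response energy is the integral restricted to that response's time span.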