Experiment with and compare different tokenizers
Track, rank and evaluate the chain-of-thought (CoT) quality of open LLMs
Track, rank and evaluate open LLMs and chatbots