
Evaluation of Large Language Models with NeMo 2.0

This directory contains Jupyter Notebook tutorials using the NeMo Framework for evaluating large language models (LLMs):

  1. mmlu.ipynb

    • Provides an overview of model deployment and available endpoints.
    • Demonstrates how to run MMLU evaluations for both completions and chat endpoints to assess model proficiency across diverse subjects.
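Before opening the notebook, it can help to see how the two endpoint styles differ. The sketch below builds the request bodies for a completions endpoint versus a chat endpoint in the OpenAI-compatible shape that deployed models commonly expose; the model name and the sample question are placeholders, not values from the tutorial, and actually sending a request would use an HTTP client such as `requests` against the deployment URL.

```python
import json

MODEL = "my-deployed-model"  # placeholder, not the tutorial's model name

def completions_payload(prompt: str, max_tokens: int = 32) -> dict:
    """Request body for a completions-style endpoint: raw prompt in, text out."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding, typical for benchmark runs
    }

def chat_payload(question: str, max_tokens: int = 32) -> dict:
    """Request body for a chat-style endpoint: the same text wrapped as a user message."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }

# Illustrative MMLU-style multiple-choice prompt (made up, not a real benchmark item).
mmlu_item = (
    "The following is a multiple choice question about astronomy.\n"
    "Q: Which planet in the solar system is the largest?\n"
    "A. Earth  B. Jupiter  C. Mars  D. Venus\n"
    "Answer:"
)
print(json.dumps(completions_payload(mmlu_item), indent=2))
print(json.dumps(chat_payload(mmlu_item), indent=2))
```

The same question is scored through both shapes, which is why the notebook runs MMLU against each endpoint separately.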
  2. simple-evals.ipynb

    • Shows how to extend the evaluation suite with additional evaluation frameworks.
    • Uses NVIDIA Evals Factory Simple-Evals to demonstrate how to run evaluations for the HumanEval benchmark.
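HumanEval scores a model by functional correctness: a generated completion passes if it satisfies the problem's unit tests. A minimal sketch of that idea, with an illustrative candidate and test rather than a real HumanEval item (a real harness would also sandbox execution rather than `exec` untrusted code directly):

```python
def check_candidate(candidate_src: str, test_src: str) -> bool:
    """Define the candidate function, then run the tests against it.

    Returns True only if the tests raise no exception. NOTE: exec'ing
    model-generated code is unsafe outside a sandbox; this is a sketch.
    """
    ns: dict = {}
    try:
        exec(candidate_src, ns)  # define the candidate function
        exec(test_src, ns)       # run the assertions against it
        return True
    except Exception:
        return False

candidate = "def add(a, b):\n    return a + b\n"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
print(check_candidate(candidate, tests))  # a correct completion passes
```

Aggregating pass/fail over many sampled completions per problem yields the pass@k metric the benchmark reports.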
  3. wikitext.ipynb

    • Illustrates running evaluation tasks without predefined configurations.
    • Uses the WikiText benchmark as an example to define and execute a custom evaluation job.
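The metric behind a WikiText-style run is perplexity: the exponential of the mean negative log-likelihood per token. A short sketch of the computation, where the log-probabilities are made-up numbers standing in for model outputs:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood over tokens)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Illustrative per-token log-probs (natural log); lower perplexity is better.
logprobs = [-1.2, -0.4, -2.1, -0.9]
print(round(perplexity(logprobs), 3))
```

A custom evaluation job like the one in the notebook ultimately reduces to gathering these per-token log-likelihoods from the deployed model and aggregating them this way.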