# LLMDataParser

**LLMDataParser** is a Python library that provides parsers for benchmark datasets used to evaluate Large Language Models (LLMs). It offers a unified interface for loading and parsing datasets such as **MMLU** and **GSM8K**, which simplifies dataset preparation for LLM evaluation.

## Features

- **Unified Interface**: Consistent `DatasetParser` for all datasets.
- **LLM-Agnostic**: Independent of any specific language model.
- **Easy to Use**: Simple methods and built-in Python types.
- **Extensible**: Easily add support for new datasets.
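
To make the "unified interface" and "extensible" points concrete, here is a minimal, self-contained sketch of how such a design might look. It is not the library's actual code: only the name `DatasetParser` comes from this README, while `ParsedEntry`, `load`, `parse`, `process_row`, and `ToyQAParser` are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class ParsedEntry:
    """Hypothetical common record type; not taken from the library."""

    prompt: str
    answer: str


class DatasetParser(ABC):
    """Illustrative base class: every dataset exposes the same load/parse flow."""

    def __init__(self) -> None:
        self._rows: list[dict[str, Any]] = []

    def load(self, rows: Iterable[dict[str, Any]]) -> None:
        # A real parser would fetch the dataset; here rows are passed in directly.
        self._rows = list(rows)

    def parse(self) -> list[ParsedEntry]:
        # Shared logic: apply the dataset-specific row conversion to every row.
        return [self.process_row(row) for row in self._rows]

    @abstractmethod
    def process_row(self, row: dict[str, Any]) -> ParsedEntry:
        """Convert one raw row into the common record type."""


class ToyQAParser(DatasetParser):
    """A new dataset only needs to describe how one row maps to an entry."""

    def process_row(self, row: dict[str, Any]) -> ParsedEntry:
        return ParsedEntry(prompt=row["question"].strip(), answer=str(row["answer"]))


parser = ToyQAParser()
parser.load([{"question": " What is 2 + 3? ", "answer": 5}])
entries = parser.parse()
print(entries[0])
```

Because the output uses plain dataclasses and strings, the parsed entries can be passed to any model or evaluation harness without depending on a specific LLM framework.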

## Installation

### Option 1: Using pip

You can install the package directly with `pip`. Although the project ships only a `pyproject.toml` file, `pip` can build and install it for standard installations.

1. **Clone the Repository**:

   ```bash
   git clone https://github.com/jeff52415/LLMDataParser.git
   cd LLMDataParser
   ```

2. **Install the Package and Its Dependencies with pip**:

   ```bash
   pip install .
   ```

### Option 2: Using Poetry

Poetry manages the virtual environment and dependencies automatically, so you don't need to create a conda environment first. Run the commands below from the root of the cloned repository (see the clone step in Option 1).

1. **Install Dependencies with Poetry**:

   ```bash
   poetry install
   ```

2. **Activate the Virtual Environment**:

   ```bash
   poetry shell
   ```

## Available Parsers

- **MMLUDatasetParser**: Parses the MMLU dataset.
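
As a rough illustration of what parsing an MMLU record involves, the sketch below turns one multiple-choice row into a prompt and an answer letter. It does not use or reproduce `MMLUDatasetParser`. The function `format_mmlu_row` is hypothetical, and the row layout (`question`, `subject`, `choices`, and an integer `answer` index) is an assumption based on how MMLU is commonly distributed.

```python
from string import ascii_uppercase
from typing import Any


def format_mmlu_row(row: dict[str, Any]) -> tuple[str, str]:
    """Turn one MMLU-style row into (prompt, answer_letter).

    Assumes `row["answer"]` is a zero-based index into `row["choices"]`.
    """
    letters = ascii_uppercase[: len(row["choices"])]
    options = "\n".join(
        f"{letter}. {choice}" for letter, choice in zip(letters, row["choices"])
    )
    prompt = f"{row['question']}\n{options}\nAnswer:"
    return prompt, letters[row["answer"]]


# Hand-written example row, not taken from the real dataset.
example_row = {
    "question": "Which planet is closest to the Sun?",
    "subject": "astronomy",
    "choices": ["Venus", "Mercury", "Earth", "Mars"],
    "answer": 1,
}
prompt, answer = format_mmlu_row(example_row)
print(prompt)
print(answer)
```

Mapping the integer index to a letter keeps the expected answer in the same form the prompt asks the model to produce, which makes exact-match scoring straightforward.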

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Contact

For questions or support, please open an issue on GitHub or contact [jeff52415@gmail.com](mailto:jeff52415@gmail.com).