Delete README.md
README.md
DELETED
@@ -1,68 +0,0 @@
# Mass Evaluations

Simple benchmark tool for running predefined prompts through all checkpoints of a model.

## Usage

```bash
python benchmark.py [model_name] [options]
```

## Examples

```bash
# Benchmark all checkpoints of a model
python benchmark.py pico-decoder-tiny-dolma5M-v1

# Specify custom output directory
python benchmark.py pico-decoder-tiny-dolma5M-v1 --output my_results/

# Use custom prompts file
python benchmark.py pico-decoder-tiny-dolma5M-v1 --prompts my_prompts.json
```
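For orientation, the command lines above could be parsed with something like the following. This is a minimal sketch, not the actual contents of `benchmark.py`: the `build_parser` helper is hypothetical, the defaults `results/` and `prompts.json` are assumed from the paths mentioned elsewhere in this README, and `model_name` is treated as required even though the usage line shows it in brackets.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the example commands in this README."""
    parser = argparse.ArgumentParser(
        prog="benchmark.py",
        description="Run predefined prompts through all checkpoints of a model.",
    )
    # Positional model name, e.g. pico-decoder-tiny-dolma5M-v1 (assumed required).
    parser.add_argument("model_name", help="name of the model to benchmark")
    # Assumed default: the results/ directory described in the Output section.
    parser.add_argument("--output", default="results/", help="directory for Markdown reports")
    # Assumed default: the prompts.json file described under Managing Prompts.
    parser.add_argument("--prompts", default="prompts.json", help="JSON file with prompt strings")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["pico-decoder-tiny-dolma5M-v1", "--output", "my_results/"]
)
print(args.model_name, args.output, args.prompts)
```

Passing an explicit list to `parse_args` keeps the example self-contained; a real script would call `parse_args()` with no arguments to read `sys.argv`.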

## Managing Prompts

Prompts are stored in `prompts.json` as a simple array of strings:

```json
[
  "Hello, how are you?",
  "Complete this story: Once upon a time",
  "What is the capital of France?"
]
```

### Adding New Prompts

Edit `prompts.json` and add new prompt strings to the array.
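Because the file is edited by hand, it can help to check its shape before a long run. The sketch below shows one way to load and validate a prompts file; `load_prompts` is a hypothetical helper, and the rule that every entry must be a non-empty string is an assumption based on the format described above, not confirmed behavior of the tool.

```python
import json
from pathlib import Path


def load_prompts(path: str = "prompts.json") -> list[str]:
    """Load a prompts file and check that it is a flat array of non-empty strings."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a JSON array, got {type(data).__name__}")
    for index, item in enumerate(data):
        # Assumed rule: reject non-strings and blank prompts early.
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"{path}: entry {index} is not a non-empty string")
    return data
```

A malformed file (for example a JSON object instead of an array, or a missing comma) then fails immediately with a clear message instead of partway through a benchmark.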

## Features

- **Auto-discovery**: Finds all `step_*` checkpoints automatically
- **JSON-based prompts**: Easily customizable prompts via JSON file
- **Readable output**: Markdown reports with clear structure
- **Error handling**: Continues on failures, logs errors
- **Progress tracking**: Shows real-time progress
- **Metadata logging**: Includes generation time and parameters
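The auto-discovery, error-handling, and progress features listed above could be combined roughly as follows. This is an illustrative sketch only: it assumes checkpoints are local directories named `step_<integer>`, and `discover_checkpoints`, `run_all`, and the `generate` callback are hypothetical names rather than functions taken from `benchmark.py`.

```python
import re
from pathlib import Path
from typing import Callable

# Assumed naming scheme: step_ followed by an integer, e.g. step_1000.
STEP_RE = re.compile(r"^step_(\d+)$")


def discover_checkpoints(model_dir: str) -> list[Path]:
    """Return step_* subdirectories sorted by numeric step, not alphabetically."""
    found = []
    for child in Path(model_dir).iterdir():
        match = STEP_RE.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    return [path for _, path in sorted(found, key=lambda pair: pair[0])]


def run_all(checkpoints: list[Path], generate: Callable[[Path], str]):
    """Run generate on each checkpoint, printing progress and collecting errors."""
    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    for index, checkpoint in enumerate(checkpoints, start=1):
        print(f"[{index}/{len(checkpoints)}] {checkpoint.name}")
        try:
            results[checkpoint.name] = generate(checkpoint)
        except Exception as exc:
            # Record the failure and move on to the next checkpoint.
            errors[checkpoint.name] = str(exc)
    return results, errors
```

Sorting on the parsed integer keeps `step_2` ahead of `step_10`, which a plain string sort would reverse.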

## Output

Results are saved as Markdown files in the `results/` directory:
```
results/
├── pico-decoder-tiny-dolma5M-v1_benchmark_20250101_120000.md
├── pico-decoder-tiny-dolma29k-v3_benchmark_20250101_130000.md
└── ...
```
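The file names above appear to follow the pattern `<model_name>_benchmark_<YYYYMMDD>_<HHMMSS>.md`. The sketch below builds such a path; the pattern is inferred from the example listing rather than from the tool's source, and `report_path` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def report_path(
    model_name: str,
    output_dir: str = "results",
    now: Optional[datetime] = None,
) -> Path:
    """Build a timestamped report path in the style of the listing above."""
    # Inferred timestamp layout: date and time joined by an underscore.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(output_dir) / f"{model_name}_benchmark_{stamp}.md"


# A fixed timestamp makes the result reproducible for this example.
example = report_path("pico-decoder-tiny-dolma5M-v1", now=datetime(2025, 1, 1, 12, 0, 0))
print(example.name)
```

Accepting `now` as a parameter keeps the function easy to test; in normal use it would be omitted so the current time is used.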

## Predefined Prompts

1. "Hello, how are you?" (conversational)
2. "Complete this story: Once upon a time" (creative)
3. "Explain quantum physics in simple terms" (explanatory)
4. "Write a haiku about coding" (creative + structured)
5. "What is the capital of France?" (factual)
6. "The meaning of life is" (philosophical)
7. "In the year 2050," (futuristic)
8. "Python programming is" (technical)