CultriX committed on
Commit
9471d74
1 Parent(s): 7877ee4

Update app.py

Files changed (1):
  1. app.py +72 -0
app.py CHANGED
@@ -339,6 +339,78 @@ def main():
  * Change the `gist_id` in [yall.py](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard/blob/main/yall.py#L126).
  * Create "New Secret" in Settings > Variables and secrets (name: "github", value: [your GitHub token](https://github.com/settings/tokens))
  A special thanks to [gblazex](https://huggingface.co/gblazex) for providing many evaluations.

# Bonus: Workflow for Automating Model Evaluation and Selection

## Step 1: Export CSV Data from Another-LLM-LeaderBoards
Go to our [Another-LLM-LeaderBoards](https://leaderboards.example.com) and click the "Export CSV" button. Save the file to `/tmp/models.csv`.
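For reference, the export is expected to look something like the snippet below. The column names are taken from the parsing script in the next step; the model names and scores here are illustrative placeholders, not real leaderboard results:

```shell
# A miniature example of what the export might look like.
# Column names match the parsing script in Step 2; the rows and
# scores are illustrative placeholders, not real results.
cat > /tmp/models.csv <<'EOF'
Model,Average,AGIEval,GPT4All,TruthfulQA,Bigbench
example-org/example-model-7B,60.25,46.06,76.77,69.93,47.47
example-org/another-model-7B,59.40,44.98,76.62,68.71,47.40
EOF

# Sanity-check: confirm the columns the Step 2 script relies on are present
for col in Model Average AGIEval GPT4All TruthfulQA Bigbench; do
  head -n 1 /tmp/models.csv | grep -q "$col" || echo "missing column: $col"
done
```

A quick check like this catches a renamed or missing column before the Python script in Step 2 fails halfway through.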

## Step 2: Examine the CSV Data
Run a script that extracts the model names, benchmark scores, and model page links from the CSV data:

```python
import pandas as pd
from huggingface_hub import ModelCard

# Load the exported leaderboard data
df = pd.read_csv('/tmp/models.csv')

# Sort by the 'Average' benchmark column, best first
df_sorted = df.sort_values(by='Average', ascending=False)

# Append the top 20 entries to configurations.txt
with open('configurations.txt', 'a') as file:
    for index, row in df_sorted.head(20).iterrows():
        model_name = row['Model'].rstrip()
        card = ModelCard.load(model_name)
        file.write(f'Model Name: {model_name}\n')
        file.write(f'Scores: {row["Average"]}\n')  # 'Average' is the overall benchmark score
        file.write(f'AGIEval: {row["AGIEval"]}\n')
        file.write(f'GPT4All: {row["GPT4All"]}\n')
        file.write(f'TruthfulQA: {row["TruthfulQA"]}\n')
        file.write(f'Bigbench: {row["Bigbench"]}\n')
        file.write(f'Model Card: {card}\n')
```

## Step 3: Feed the Discovered Models, Scores, and Configurations to an LLM Client (shell-gpt)
Run your local LLM client, feeding it all the discovered merged models, their benchmark scores, and, where found, the configurations used to merge them. Give it an instruction similar to this:

```bash
cat /tmp/configurations2.txt | sgpt --chat config "Based on the merged models that are provided here, along with their respective benchmark achievements and the configurations used in merging them, your task is to come up with a new configuration for a new merged model that will outperform all others. In your thought process, argue and reflect on your own choices to improve your thinking process and outcome"
```

## Step 4: (Optional) Reflect on the Initial Configuration Suggested by ChatGPT
If you want to get particularly naughty, you can add a step like this, where you make ChatGPT rethink and reflect on the configuration it initially came up with based on the information you gave it:

```bash
for i in $(seq 1 3); do echo "$i" && sgpt --chat config "Repeat the process from before and again reflect and improve on your suggested configuration"; sleep 20; done
```

## Step 5: Wait for ChatGPT to Give You a Leaderboard-Topping Merge Configuration
Wait for ChatGPT to provide a new merge configuration.

## Step 6: Enter the Configuration in the automergekit Notebook
Fire up your automergekit notebook and enter the configuration that was just so generously provided to you by ChatGPT.
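To make the step concrete, here is a sketch of the kind of configuration you might end up pasting into the notebook, written to disk as a mergekit-style SLERP config. The model names, interpolation values, and file path are hypothetical placeholders, not a recommended recipe:

```shell
# Write a hypothetical mergekit-style SLERP merge configuration to disk.
# Model names and parameter values below are placeholders for whatever
# ChatGPT actually suggests, not a recommendation.
cat > /tmp/merge-config.yaml <<'EOF'
slices:
  - sources:
      - model: example-org/example-model-7B
        layer_range: [0, 32]
      - model: example-org/another-model-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: example-org/example-model-7B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
EOF
echo "Wrote merge config to /tmp/merge-config.yaml"
```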

## Step 7: Evaluate the New Merge Using the auto-llm-eval Notebook
Fire up your auto-llm-eval notebook to see whether the merge that ChatGPT came up with actually makes sense and performs well.

## Step 8: Repeat the Process
Repeat this process a few times every day, learning from each new model created.

## Step 9: Rank the New Number One Model
Rank the new number one model and top your own leaderboard (Model: CultriX/MergeCeption-7B-v3):
![image.png](https://cdn-uploads.huggingface.co/production/uploads/6495d5a915d8ef6f01bc75eb/mFV3Ou469fk6ivj1XrD9d.png)

## Step 10: Automate the Process with a Cronjob
Create a cronjob that automates this process five times a day, so that it learns from the models it has created in order to create even better ones. Prepare yourself for some non-negligible increases in benchmark scores in the near future.
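The schedule can be sketched as a crontab entry along these lines. `run_pipeline.sh` is a hypothetical wrapper script around Steps 1 through 8; the path, hours, and log file are placeholders to adjust for your setup:

```shell
# A sketch of the crontab entry: five runs a day at fixed hours.
# run_pipeline.sh is a hypothetical wrapper around Steps 1-8;
# adjust the path, hours, and log location to your setup.
cat > /tmp/automerge.cron <<'EOF'
0 2,7,12,17,22 * * * /home/cultrix/automerge/run_pipeline.sh >> /var/log/automerge.log 2>&1
EOF
cat /tmp/automerge.cron
```

Install it with `crontab /tmp/automerge.cron`, or paste the line via `crontab -e`.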

Cheers,
CultriX
  ''')