Mikiko Bazeley committed
Commit · eeb7bb5
Parent(s): 3be47f7

Copied over LLM-as-judge starter project
Browse files
- .env.template +6 -0
- README.md +125 -14
- home.py +85 -0
- img/ash.png +0 -0
- img/bulbasaur.png +0 -0
- img/charmander.png +0 -0
- img/fireworksai_logo.png +0 -0
- img/home_page_1.png +0 -0
- img/home_page_2.png +0 -0
- img/page_1_a.png +0 -0
- img/page_1_b.png +0 -0
- img/page_1_c.png +0 -0
- img/page_1_empty.png +0 -0
- img/page_2_a.png +0 -0
- img/page_2_b.png +0 -0
- img/page_2_c.png +0 -0
- img/page_2_empty.png +0 -0
- img/squirtel.png +0 -0
- pages/1_Comparing_LLMs.py +185 -0
- pages/2_Parameter_Exploration_for_LLMs.py +293 -0
- requirements.txt +246 -0
.env.template
ADDED
@@ -0,0 +1,6 @@
# Fireworks AI API key
FIREWORKS_API_KEY=your_fireworks_api_key_here


# Debug mode
DEBUG=False
README.md
CHANGED
@@ -1,14 +1,125 @@
## Project: Fireworks Model Comparison App

### Overview
The **Fireworks Model Comparison App** is an interactive tool built using **Streamlit** that allows users to compare various Large Language Models (LLMs) hosted on **Fireworks AI**. Users can adjust key model parameters, provide custom prompts, and generate model outputs to compare their behavior and responses. Additionally, an LLM-as-a-Judge feature is available to evaluate the generated outputs and provide feedback on their quality.


### Objectives
- **Compare Models**: Select different models from the Fireworks platform and compare their outputs based on a shared prompt.
- **Modify Parameters**: Fine-tune parameters such as **Max Tokens**, **Temperature**, **Top-p**, and **Top-k** to observe how they influence model behavior.
- **Evaluate Using LLM-as-a-Judge**: After generating responses, use a separate model to act as a judge and evaluate the outputs from the selected models.


![Home Page Screenshot](img/home_page_1.png)
![Home Page Screenshot](img/home_page_2.png)


### Features
- **Streamlit UI**: A simple and intuitive interface where users can select models, input prompts, and adjust model parameters.
- **LLM Comparison**: Select up to three different models, run a query with the same prompt, and view side-by-side responses.
- **Parameter Exploration**: Explore and modify different parameters such as Max Tokens, Temperature, Top-p, and more to see how they affect the model's response.
- **LLM-as-a-Judge**: Let another LLM compare the generated responses from the models and provide a comparison.

### App Structure
The app consists of two main pages:
1. **Comparing LLMs**:
   - Compare the outputs of three selected LLMs from Fireworks AI by providing a prompt.
   - View the responses side-by-side for easy comparison.
   - A selected LLM acts as a judge to evaluate the generated responses.


![Home Page Screenshot](img/page_1_empty.png)
![Home Page Screenshot](img/page_1_a.png)
![Home Page Screenshot](img/page_1_b.png)
![Home Page Screenshot](img/page_1_c.png)


2. **Parameter Exploration**:
   - Modify various parameters for the LLMs (e.g., Max Tokens, Temperature, Top-p) and observe how they affect the outputs.
   - Compare three different outputs generated with varying parameter configurations.
   - Use LLM-as-a-Judge to provide a final evaluation of the outputs.


![Home Page Screenshot](img/page_2_empty.png)
![Home Page Screenshot](img/page_2_a.png)
![Home Page Screenshot](img/page_2_b.png)
![Home Page Screenshot](img/page_2_c.png)

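On both pages, the judge step is a single chat-completion call that feeds the three candidate outputs back to the judging model. A minimal sketch of that pattern, mirroring `compare_responses` in `pages/1_Comparing_LLMs.py` (the hard-coded key and the `judge` helper name here are illustrative only; the app reads the key from `env/.env`):

```python
import fireworks.client

fireworks.client.api_key = "your_fireworks_api_key"  # illustrative; the app loads this from env/.env

def judge(responses, judge_model):
    # Number the candidate outputs and pack them into one comparison prompt.
    numbered = "\n\n".join(f"Response {i + 1}: {r}" for i, r in enumerate(responses))
    comparison_prompt = f"Compare the following three responses:\n\n{numbered}\n\nProvide a succinct comparison."
    result = fireworks.client.ChatCompletion.create(
        model=judge_model,  # e.g. "accounts/fireworks/models/mixtral-8x7b-instruct"
        messages=[{"role": "user", "content": comparison_prompt}],
    )
    return result.choices[0].message.content
```
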
### Setup and Installation

#### Prerequisites
- **Python 3.x** installed on your machine.
- A **Fireworks AI** API key, which you can obtain by signing up at [Fireworks AI](https://fireworks.ai).
- Install **Streamlit** and the **Fireworks Python Client**.

#### Step-by-Step Setup
##### 1. Clone the Repository:
First, clone the repository from GitHub:

```bash
git clone https://github.com/fw-ai/examples.git
```

##### 2. Navigate to the Specific Project Sub-directory:
After cloning the repository, navigate to the `project_llm-as-a-judge-streamlit-dashboard` sub-directory:

```bash
cd examples/learn/inference/project_llm-as-a-judge-streamlit-dashboard
```

##### 3. Set up a Virtual Environment (Optional but Recommended):
Create and activate a Python virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate  # On macOS/Linux
.\venv\Scripts\activate   # On Windows
```

##### 4. Install Required Dependencies:
Install the necessary Python dependencies using `pip3`:

```bash
pip3 install -r requirements.txt
```

##### 5. Configure the `.env` File:
Copy the `.env.template` file into an `env/` sub-directory and rename it to `.env`:

```bash
mkdir env/
cp .env.template env/.env
```

Open the `.env` file and add your **FIREWORKS_API_KEY**:

```bash
FIREWORKS_API_KEY=<your_fireworks_api_key>
```

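Both pages resolve the key from `<project_root>/env/.env` via `python-dotenv`. Before launching the app, you can sanity-check that the key loads; a small, illustrative snippet (run from the project root):

```python
import os
from dotenv import load_dotenv

# The pages build this path relative to pages/, i.e. <project_root>/env/.env.
load_dotenv(os.path.join("env", ".env"))

assert os.getenv("FIREWORKS_API_KEY"), "FIREWORKS_API_KEY missing from env/.env"
print("Fireworks API key loaded.")
```
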
##### 6. Run the Streamlit App:
Finally, run the Streamlit app:

```bash
streamlit run home.py
```


##### 7. Explore the App:
- Open the app in your browser via the URL provided by Streamlit (typically `http://localhost:8501`).
- Navigate between the pages to compare models and adjust parameters.

### Example Prompts
Here are some example prompts you can try in the app:
- **Prompt 1**: "Describe the future of AI in 500 words."
- **Prompt 2**: "Write a short story about a time traveler who visits ancient Rome."
- **Prompt 3**: "Explain quantum computing in simple terms."
- **Prompt 4**: "Generate a recipe for a healthy vegan dinner."

### Fireworks API Documentation
To learn more about how to query models and interact with the Fireworks API, visit the [Fireworks API Documentation](https://docs.fireworks.ai/api-reference/post-chatcompletions).

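For reference, the call pattern this project uses with the pinned `fireworks-ai==0.15.3` client looks like the sketch below (the model ID is one entry from the app's `model_map`; the prompt and key handling are illustrative):

```python
import fireworks.client

fireworks.client.api_key = "your_fireworks_api_key"  # in this app, read from env/.env

# One chat-completion request against a serverless Fireworks model.
response = fireworks.client.ChatCompletion.create(
    model="accounts/fireworks/models/llama-v3p2-3b-instruct",
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}],
)
print(response.choices[0].message.content)
```
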
### Contributing
We welcome contributions to improve this app! To contribute, fork the repository, make your changes, and submit a pull request.

### License
This project is licensed under the MIT License.
home.py
ADDED
@@ -0,0 +1,85 @@
import streamlit as st
from PIL import Image

# Load images
logo_image = Image.open("img/fireworksai_logo.png")
bulbasaur_image = Image.open("img/bulbasaur.png")
charmander_image = Image.open("img/charmander.png")
squirtel_image = Image.open("img/squirtel.png")
ash_image = Image.open("img/ash.png")

# Set page configuration
st.set_page_config(page_title="Fireworks Model Comparison App", page_icon="🎇")

# Fireworks logo at the top
st.image(logo_image)

# Home page title and description
st.title("Fireworks Model Comparison App")

# Introduction with Pokémon image (Ash)
st.markdown("""
### Welcome to the Fireworks Model Comparison App!""")

st.image(ash_image, width=100)

st.markdown(""" This app allows you to interact with and compare various Large Language Models (LLMs) hosted on **Fireworks AI**. You can select from a range of models, adjust key model parameters, and run comparisons between their outputs. The app also enables you to evaluate results using an **LLM-as-a-judge** to provide an unbiased comparison of responses.""")

# API documentation link
st.markdown("""
[Explore Fireworks API Documentation](https://docs.fireworks.ai/api-reference/post-chatcompletions)
""")

# Objectives of the app

st.markdown("""
---
### Objectives of the App:
- **Compare Different Models**: Select models from Fireworks AI’s hosted collection and compare their outputs.
- **Modify Parameters**: Adjust settings like **Max Tokens**, **Temperature**, and **Sampling** methods to explore how different configurations affect outputs.
- **Evaluate Using LLM-as-a-Judge**: Generate responses and use another LLM to evaluate and provide a comparison.
- **Simple Interface**: The app uses **Streamlit**, making it easy to use, even for those without coding experience.
""")

# How to use the app, with Pokémon image (Squirtle)
st.image(squirtel_image, width=100)
st.markdown("""
---
### How to Use the App:
1. **Select a Model**: Use the dropdown menus to choose models for comparison.
2. **Provide a Prompt**: Enter a prompt that the models will use to generate a response.
3. **Adjust Parameters**: Fine-tune the settings for each model to explore how different configurations affect the results.
4. **Generate and Compare**: View the responses from multiple models side-by-side.
5. **Evaluate with LLM-as-a-Judge**: Use another model to compare and judge the outputs.
""")

# Explanation of the other pages, with Pokémon image (Bulbasaur)
st.image(bulbasaur_image, width=100)
st.markdown("""
---
### App Sections:
This Streamlit app consists of two key pages that help you interact with the Fireworks AI platform and perform model comparisons.

- **Page 1: Comparing LLMs**
    - On this page, you can compare the outputs of three selected LLMs from Fireworks AI by providing a single prompt.
    - The outputs are displayed side-by-side for easy comparison, and a selected LLM can act as a judge to evaluate the responses.

- **Page 2: Parameter Exploration for LLMs**
    - This page allows you to adjust various parameters like **Max Tokens**, **Temperature**, and **Sampling Methods** for LLMs.
    - You can provide a prompt and see how different parameter configurations affect the output for each model.
    - The LLM-as-a-Judge is also used to compare and evaluate the generated responses.
""")
st.image(charmander_image, width=100)

# Background information about Fireworks models

st.markdown("""
---
### Fireworks AI Models:
Fireworks AI provides access to a variety of Large Language Models (LLMs) that you can query and experiment with, including:

- **Text Models**: These models are designed for tasks such as text generation, completion, and Q&A.
- **Model Parameters**: By adjusting parameters such as temperature, top-p, and top-k, you can influence the behavior of the models and the creativity or focus of their outputs.

For more information, check out the [Fireworks API Documentation](https://docs.fireworks.ai/api-reference/post-chatcompletions) and learn how to query different models using Fireworks' Python Client.
""")
img/ash.png
ADDED
img/bulbasaur.png
ADDED
img/charmander.png
ADDED
img/fireworksai_logo.png
ADDED
img/home_page_1.png
ADDED
img/home_page_2.png
ADDED
img/page_1_a.png
ADDED
img/page_1_b.png
ADDED
img/page_1_c.png
ADDED
img/page_1_empty.png
ADDED
img/page_2_a.png
ADDED
img/page_2_b.png
ADDED
img/page_2_c.png
ADDED
img/page_2_empty.png
ADDED
img/squirtel.png
ADDED
pages/1_Comparing_LLMs.py
ADDED
@@ -0,0 +1,185 @@
from dotenv import load_dotenv
import os
from PIL import Image
import streamlit as st
import fireworks.client

st.set_page_config(page_title="LLM Comparison Tool", page_icon="🎇")
st.title("LLM-as-a-judge: Comparing LLMs using Fireworks")
st.write("A light introduction to how easy it is to swap LLMs and how to use the Fireworks Python client")

# Clear the cache before starting
st.cache_data.clear()

# Specify the path to the .env file in the env/ directory
dotenv_path = os.path.join(os.path.dirname(__file__), '..', 'env', '.env')

# Load the .env file from the specified path
load_dotenv(dotenv_path)

# Get the Fireworks API key from the environment variable
fireworks_api_key = os.getenv("FIREWORKS_API_KEY")

if not fireworks_api_key:
    raise ValueError("No API key found in the .env file. Please add your FIREWORKS_API_KEY to the .env file.")

# Load the images
logo_image = Image.open("img/fireworksai_logo.png")
ash_image = Image.open("img/ash.png")
bulbasaur_image = Image.open("img/bulbasaur.png")
squirtel_image = Image.open("img/squirtel.png")
charmander_image = Image.open("img/charmander.png")

st.divider()
# Streamlit app
st.subheader("Fireworks Playground")

st.write("Fireworks AI is a platform that offers serverless and scalable AI models.")
st.write("👉 Learn more here: [Fireworks Serverless Models](https://fireworks.ai/models?show=Serverless)")
st.divider()

# Model choices shared by all four selectboxes
model_options = [
    "Text: Meta Llama 3.1 Instruct - 70B",
    "Text: Meta Llama 3.1 Instruct - 8B",
    "Text: Meta Llama 3.2 Instruct - 3B",
    "Text: Gemma 2 Instruct - 9B",
    "Text: Mixtral MoE Instruct - 8x22B",
    "Text: Mixtral MoE Instruct - 8x7B",
    "Text: MythoMax L2 - 13B"
]

# Map display names to their Fireworks model identifiers
model_map = {
    "Text: Meta Llama 3.1 Instruct - 70B": "accounts/fireworks/models/llama-v3p1-70b-instruct",
    "Text: Meta Llama 3.1 Instruct - 8B": "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "Text: Meta Llama 3.2 Instruct - 3B": "accounts/fireworks/models/llama-v3p2-3b-instruct",
    "Text: Gemma 2 Instruct - 9B": "accounts/fireworks/models/gemma2-9b-it",
    "Text: Mixtral MoE Instruct - 8x22B": "accounts/fireworks/models/mixtral-8x22b-instruct",
    "Text: Mixtral MoE Instruct - 8x7B": "accounts/fireworks/models/mixtral-8x7b-instruct",
    "Text: MythoMax L2 - 13B": "accounts/fireworks/models/mythomax-l2-13b"
}

# Sidebar for selecting models
with st.sidebar:
    st.image(logo_image)

    st.write("Select three models to compare their outputs:")

    st.image(bulbasaur_image, width=80)
    option_1 = st.selectbox("Select Model 1", model_options, index=2)  # Default to Meta Llama 3.2 Instruct - 3B

    st.image(charmander_image, width=80)
    option_2 = st.selectbox("Select Model 2", model_options, index=5)  # Default to Mixtral MoE Instruct - 8x7B

    st.image(squirtel_image, width=80)
    option_3 = st.selectbox("Select Model 3", model_options, index=0)  # Default to Meta Llama 3.1 Instruct - 70B

    # Dropdown to select the LLM that will perform the comparison
    st.image(ash_image, width=80)
    comparison_llm = st.selectbox("Select Comparison Model", model_options, index=5)  # Default to Mixtral MoE Instruct - 8x7B

os.environ["FIREWORKS_API_KEY"] = fireworks_api_key

# Helper text for the prompt
st.markdown("### Enter your prompt below to generate responses:")

prompt = st.text_input("Prompt", label_visibility="collapsed")
st.divider()

# Function to generate a response from a text model
def generate_text_response(model_name, prompt):
    return fireworks.client.ChatCompletion.create(
        model=model_name,
        messages=[{
            "role": "user",
            "content": prompt,
        }]
    )

# Function to compare the three responses using the selected LLM
def compare_responses(response_1, response_2, response_3, comparison_model):
    comparison_prompt = f"Compare the following three responses:\n\nResponse 1: {response_1}\n\nResponse 2: {response_2}\n\nResponse 3: {response_3}\n\nProvide a succinct comparison."

    comparison_response = fireworks.client.ChatCompletion.create(
        model=comparison_model,  # Use the selected LLM for comparison
        messages=[{
            "role": "user",
            "content": comparison_prompt,
        }]
    )

    return comparison_response.choices[0].message.content


# If Generate button is clicked
if st.button("Generate"):
    if not fireworks_api_key.strip() or not prompt.strip():
        st.error("Please provide the missing fields.")
    else:
        try:
            with st.spinner("Please wait..."):
                fireworks.client.api_key = fireworks_api_key

                # Create three columns for side-by-side comparison
                col1, col2, col3 = st.columns(3)

                # Model 1
                with col1:
                    st.subheader(f"Model 1: {option_1}")
                    st.image(bulbasaur_image)
                    response_1 = generate_text_response(model_map[option_1], prompt)
                    st.success(response_1.choices[0].message.content)

                # Model 2
                with col2:
                    st.subheader(f"Model 2: {option_2}")
                    st.image(charmander_image)
                    response_2 = generate_text_response(model_map[option_2], prompt)
                    st.success(response_2.choices[0].message.content)

                # Model 3
                with col3:
                    st.subheader(f"Model 3: {option_3}")
                    st.image(squirtel_image)
                    response_3 = generate_text_response(model_map[option_3], prompt)
                    st.success(response_3.choices[0].message.content)

                # Visual divider between model responses and comparison
                st.divider()

                # Generate a comparison of the three responses using the selected LLM
                comparison = compare_responses(
                    response_1.choices[0].message.content,
                    response_2.choices[0].message.content,
                    response_3.choices[0].message.content,
                    model_map[comparison_llm]
                )

                # Display the comparison
                st.subheader("Comparison of the Three Responses:")
                st.image(ash_image)
                st.write(comparison)

        except Exception as e:
            st.exception(f"Exception: {e}")
pages/2_Parameter_Exploration_for_LLMs.py
ADDED
@@ -0,0 +1,293 @@
from dotenv import load_dotenv
import os
from PIL import Image
import random
import streamlit as st
import fireworks.client

# Set page configuration
st.set_page_config(page_title="LLM Parameters Comparison", page_icon="🎇")
st.title("Understanding the Completions Chat API parameters")
st.write("Compare LLM responses with different sets of parameters and evaluate the results using an LLM-as-a-judge.")
st.markdown("Check out our [Chat Completions API Documentation](https://docs.fireworks.ai/api-reference/post-chatcompletions) for more information on the parameters.")

# Add expandable section for parameter descriptions
with st.expander("Parameter Descriptions", expanded=False):
    st.markdown("""
    **Max Tokens**: Maximum number of tokens the model can generate.<br>
    **Prompt Truncate Length**: Number of tokens from the input prompt considered.<br>
    **Temperature**: Controls randomness of the output.<br>
    **Top-p (Nucleus Sampling)**: Cumulative probability of token selection.<br>
    **Top-k**: Limits the number of tokens sampled.<br>
    **Frequency Penalty**: Discourages repeated words or phrases.<br>
    **Presence Penalty**: Encourages new topics.<br>
    **Stop Sequence**: Defines when to stop generating tokens.
    """, unsafe_allow_html=True)

# Load environment variables
dotenv_path = os.path.join(os.path.dirname(__file__), '..', 'env', '.env')
load_dotenv(dotenv_path)

# Get the Fireworks API key from environment variables
fireworks_api_key = os.getenv("FIREWORKS_API_KEY")
if not fireworks_api_key:
    raise ValueError("No API key found in the .env file. Please add your FIREWORKS_API_KEY to the .env file.")

os.environ["FIREWORKS_API_KEY"] = fireworks_api_key

# Load the images
logo_image = Image.open("img/fireworksai_logo.png")
bulbasaur_image = Image.open("img/bulbasaur.png")
charmander_image = Image.open("img/charmander.png")
squirtel_image = Image.open("img/squirtel.png")
ash_image = Image.open("img/ash.png")

# Map models to their respective identifiers
model_map = {
    "Text: Meta Llama 3.1 Instruct - 70B": "accounts/fireworks/models/llama-v3p1-70b-instruct",
    "Text: Meta Llama 3.1 Instruct - 8B": "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "Text: Meta Llama 3.2 Instruct - 3B": "accounts/fireworks/models/llama-v3p2-3b-instruct",
    "Text: Gemma 2 Instruct - 9B": "accounts/fireworks/models/gemma2-9b-it",
    "Text: Mixtral MoE Instruct - 8x22B": "accounts/fireworks/models/mixtral-8x22b-instruct",
    "Text: Mixtral MoE Instruct - 8x7B": "accounts/fireworks/models/mixtral-8x7b-instruct",
    "Text: MythoMax L2 - 13B": "accounts/fireworks/models/mythomax-l2-13b"
}

# Function to generate a response from a text model with parameters
def generate_text_response(model_name, prompt, params):
    return fireworks.client.ChatCompletion.create(
        model=model_name,
        messages=[{
            "role": "user",
            "content": prompt,
        }],
        max_tokens=params["max_tokens"],
        temperature=params["temperature"],
        top_p=params["top_p"],
        top_k=params["top_k"],
        frequency_penalty=params["frequency_penalty"],
        presence_penalty=params["presence_penalty"],
        stop=params["stop"]
    )

# Function to compare the three responses using the selected LLM
def compare_responses(response_1, response_2, response_3, comparison_model):
    comparison_prompt = f"Compare the following three responses:\n\nResponse 1: {response_1}\n\nResponse 2: {response_2}\n\nResponse 3: {response_3}\n\nProvide a succinct comparison."

    comparison_response = fireworks.client.ChatCompletion.create(
        model=comparison_model,
        messages=[{
            "role": "user",
            "content": comparison_prompt,
        }]
    )

    return comparison_response.choices[0].message.content

# Slightly randomize parameters for sets 2 and 3
def randomize_params():
    return {
        "max_tokens": random.randint(100, 200),
        "prompt_truncate_len": random.randint(100, 200),
        "temperature": round(random.uniform(0.7, 1.3), 2),
        "top_p": round(random.uniform(0.8, 1.0), 2),
        "top_k": random.randint(30, 70),
        "frequency_penalty": round(random.uniform(0, 1), 2),
        "presence_penalty": round(random.uniform(0, 1), 2),
        "n": 1,
        "stop": None
    }

# Sidebar for LLM selection, prompt, and judge LLM
with st.sidebar:
    st.image(logo_image)

    # Select the model for generating responses
    st.subheader("Select LLM for Generating Responses")
    model = st.selectbox("Select a model for generating responses:", list(model_map.keys()), index=2)

    # Placeholder prompts
    suggested_prompts = [
        "Prompt 1: Describe the future of AI.",
        "Prompt 2: Write a short story about a cat who becomes the mayor of a small town",
        "Prompt 3: Write a step-by-step guide to making pancakes from scratch.",
        "Prompt 4: Generate a grocery list and meal plan for a vegetarian family of four for one week.",
        "Prompt 5: Generate a story in which a time traveler goes back to Ancient Greece, accidentally introduces modern memes to philosophers like Socrates and Plato, and causes chaos in the philosophical discourse.",
        "Prompt 6: Create a timeline where dinosaurs never went extinct and developed their own civilizations, and describe their technology and cultural achievements in the year 2024.",
        "Prompt 7: Explain the concept of Gödel’s incompleteness theorems in the form of a Dr. Seuss poem, using at least 10 distinct rhyme schemes."
    ]

    # Selectbox for suggested prompts
    selected_prompt = st.selectbox("Choose a suggested prompt:", suggested_prompts)

    # Input box where the user can edit the selected prompt or enter a custom one
    prompt = st.text_input("Prompt", value=selected_prompt)

    # Select the LLM for judging the responses
    st.subheader("Select LLM for Judge")
    judge_llm = st.selectbox("Select a model to act as the judge:", list(model_map.keys()), index=2)

# Create three columns for parameter sets side-by-side
col1, col2, col3 = st.columns(3)

# Parameters for Output 1 (Bulbasaur image)
with col1:
    st.subheader("Parameter Set #1")
    st.image(bulbasaur_image, width=100)  # Bulbasaur image
    max_tokens_1 = st.slider("Max Tokens", 50, 1000, 123)
    prompt_truncate_len_1 = st.slider("Prompt Truncate Length", 50, 200, 123)
    temperature_1 = st.slider("Temperature", 0.1, 2.0, 1.0)
    top_p_1 = st.slider("Top-p", 0.0, 1.0, 1.0)
    top_k_1 = st.slider("Top-k", 0, 100, 50)
    frequency_penalty_1 = st.slider("Frequency Penalty", 0.0, 2.0, 0.0)
    presence_penalty_1 = st.slider("Presence Penalty", 0.0, 2.0, 0.0)
    stop_1 = st.text_input("Stop Sequence", "")

    params_1 = {
        "max_tokens": max_tokens_1,
        "prompt_truncate_len": prompt_truncate_len_1,
        "temperature": temperature_1,
        "top_p": top_p_1,
        "top_k": top_k_1,
        "frequency_penalty": frequency_penalty_1,
        "presence_penalty": presence_penalty_1,
        "n": 1,
        "stop": stop_1 if stop_1 else None
    }

# Parameters for Output 2 (Charmander image)
with col2:
    st.subheader("Parameter Set #2")
    st.image(charmander_image, width=100)  # Charmander image
    use_random_2 = st.checkbox("Randomize parameters for Output 2", value=True)
    if use_random_2:
        params_2 = randomize_params()
        st.write("**Random Parameters for Output 2:**")
        st.json(params_2)  # Display random params
    else:
        max_tokens_2 = st.slider("Max Tokens (Output 2)", 50, 1000, 150)
        prompt_truncate_len_2 = st.slider("Prompt Truncate Length (Output 2)", 50, 200, 150)
        temperature_2 = st.slider("Temperature (Output 2)", 0.1, 2.0, 0.9)
        top_p_2 = st.slider("Top-p (Output 2)", 0.0, 1.0, 0.95)
        top_k_2 = st.slider("Top-k (Output 2)", 0, 100, 45)
        frequency_penalty_2 = st.slider("Frequency Penalty (Output 2)", 0.0, 2.0, 0.1)
        presence_penalty_2 = st.slider("Presence Penalty (Output 2)", 0.0, 2.0, 0.1)
        stop_2 = st.text_input("Stop Sequence (Output 2)", "")

        params_2 = {
            "max_tokens": max_tokens_2,
            "prompt_truncate_len": prompt_truncate_len_2,
            "temperature": temperature_2,
            "top_p": top_p_2,
            "top_k": top_k_2,
            "frequency_penalty": frequency_penalty_2,
            "presence_penalty": presence_penalty_2,
            "n": 1,
            "stop": stop_2 if stop_2 else None
        }

# Parameters for Output 3 (Squirtle image)
with col3:
    st.subheader("Parameter Set #3")
    st.image(squirtel_image, width=100)  # Squirtle image
    use_random_3 = st.checkbox("Randomize parameters for Output 3", value=True)
    if use_random_3:
        params_3 = randomize_params()
        st.write("**Random Parameters for Output 3:**")
        st.json(params_3)  # Display random params
    else:
        max_tokens_3 = st.slider("Max Tokens (Output 3)", 50, 1000, 180)
        prompt_truncate_len_3 = st.slider("Prompt Truncate Length (Output 3)", 50, 200, 140)
        temperature_3 = st.slider("Temperature (Output 3)", 0.1, 2.0, 1.1)
        top_p_3 = st.slider("Top-p (Output 3)", 0.0, 1.0, 0.85)
        top_k_3 = st.slider("Top-k (Output 3)", 0, 100, 60)
        frequency_penalty_3 = st.slider("Frequency Penalty (Output 3)", 0.0, 2.0, 0.05)
        presence_penalty_3 = st.slider("Presence Penalty (Output 3)", 0.0, 2.0, 0.2)
        stop_3 = st.text_input("Stop Sequence (Output 3)", "")

        params_3 = {
            "max_tokens": max_tokens_3,
            "prompt_truncate_len": prompt_truncate_len_3,
            "temperature": temperature_3,
            "top_p": top_p_3,
            "top_k": top_k_3,
            "frequency_penalty": frequency_penalty_3,
            "presence_penalty": presence_penalty_3,
            "n": 1,
            "stop": stop_3 if stop_3 else None
        }

# Divider above generate button
st.divider()

# Generate button and logic
st.subheader("Just hit play")
st.write("See the effect of selecting parameters on the responses.")


if st.button("Generate"):
    if not fireworks_api_key.strip() or not prompt.strip():
        st.error("Please provide the missing fields.")
    else:
        try:
            with st.spinner("Please wait..."):
                fireworks.client.api_key = fireworks_api_key

                # Generate responses for each set of parameters
                response_1 = generate_text_response(model_map[model], prompt, params_1)
                response_2 = generate_text_response(model_map[model], prompt, params_2)
                response_3 = generate_text_response(model_map[model], prompt, params_3)

                # Display results in the main section
                col1, col2, col3 = st.columns(3)

                with col1:
                    st.subheader("Response 1")
                    st.image(bulbasaur_image, width=100)
                    st.success(response_1.choices[0].message.content)

                with col2:
                    st.subheader("Response 2")
                    st.image(charmander_image, width=100)
                    st.success(response_2.choices[0].message.content)

                with col3:
                    st.subheader("Response 3")
                    st.image(squirtel_image, width=100)
                    st.success(response_3.choices[0].message.content)

                st.divider()

                # Use the selected LLM as the judge and display Ash image
                st.subheader("LLM-as-a-Judge Comparison")
                st.image(ash_image, width=100)
                comparison = compare_responses(
                    response_1.choices[0].message.content,
                    response_2.choices[0].message.content,
                    response_3.choices[0].message.content,
                    model_map[judge_llm]
                )

                st.write(comparison)

        except Exception as e:
            st.exception(f"Exception: {e}")

# Divider below generate button
st.divider()
requirements.txt
ADDED
@@ -0,0 +1,246 @@
aiohappyeyeballs==2.4.0
aiohttp==3.10.5
aiosignal==1.3.1
alembic==1.13.2
altair==5.4.1
annotated-types==0.7.0
anthropic==0.29.0
anyio==4.6.0
appdirs==1.4.4
appnope==0.1.4
asgiref==3.8.1
asttokens==2.4.1
attrs==24.2.0
backoff==2.2.1
bcrypt==4.2.0
beautifulsoup4==4.12.3
blinker==1.8.2
boto3==1.35.24
botocore==1.35.24
bs4==0.0.2
build==1.2.2
cachetools==5.5.0
certifi==2024.8.30
cffi==1.17.1
chardet==5.2.0
charset-normalizer==3.3.2
chroma-hnswlib==0.7.3
chromadb==0.4.24
click==8.1.7
cohere==5.9.4
coloredlogs==15.0.1
comm==0.2.2
crewai==0.32.2
cryptography==43.0.1
dataclasses-json==0.6.7
debugpy==1.8.5
decorator==5.1.1
deepdiff==8.0.1
defusedxml==0.7.1
Deprecated==1.2.14
dirtyjson==1.0.8
distro==1.9.0
docstring_parser==0.16
durationpy==0.7
embedchain==0.1.109
emoji==2.13.0
executing==2.1.0
faiss-cpu==1.8.0.post1
fastapi==0.115.0
fastavro==1.9.7
filelock==3.16.1
filetype==1.2.0
fireworks-ai==0.15.3
flatbuffers==24.3.25
frozendict==2.4.4
frozenlist==1.4.1
fsspec==2024.9.0
gitdb==4.0.11
GitPython==3.1.43
google-api-core==2.20.0
google-auth==2.35.0
google-cloud-aiplatform==1.67.1
google-cloud-bigquery==3.25.0
google-cloud-core==2.4.1
google-cloud-resource-manager==1.12.5
google-cloud-storage==2.18.2
google-crc32c==1.6.0
google-resumable-media==2.7.2
google_search_results==2.4.2
googleapis-common-protos==1.65.0
gptcache==0.1.44
greenlet==3.1.1
grpc-google-iam-v1==0.13.1
grpcio==1.66.1
grpcio-status==1.62.3
h11==0.14.0
html5lib==1.1
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.2
httpx-sse==0.4.0
huggingface-hub==0.25.0
humanfriendly==10.0
idna==3.10
importlib_metadata==8.4.0
importlib_resources==6.4.5
instructor==1.3.3
ipykernel==6.29.5
ipython==8.27.0
jedi==0.19.1
Jinja2==3.1.4
jiter==0.4.2
jmespath==1.0.1
joblib==1.4.2
jsonpatch==1.33
jsonpath-python==1.0.6
jsonpointer==3.0.0
jsonref==1.1.0
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
jupyter_client==8.6.3
jupyter_core==5.7.2
kubernetes==31.0.0
langchain==0.1.20
langchain-anthropic==0.1.10
langchain-cohere==0.1.5
langchain-community==0.0.38
langchain-core==0.1.52
langchain-openai==0.1.7
langchain-text-splitters==0.0.2
langdetect==1.0.9
langsmith==0.1.125
llama-cloud==0.0.6
llama-index-core==0.10.50.post1
llama-index-readers-file==0.1.25
llama-parse==0.4.4
lxml==5.3.0
Mako==1.3.5
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.22.0
matplotlib-inline==0.1.7
mdurl==0.1.2
mmh3==5.0.1
monotonic==1.6
mpmath==1.3.0
multidict==6.1.0
multitasking==0.0.11
mypy-extensions==1.0.0
narwhals==1.8.2
nest-asyncio==1.6.0
networkx==3.3
nltk==3.9.1
numpy==1.26.4
oauthlib==3.2.2
onnxruntime==1.19.2
openai==1.47.0
opentelemetry-api==1.27.0
opentelemetry-exporter-otlp-proto-common==1.27.0
opentelemetry-exporter-otlp-proto-grpc==1.27.0
opentelemetry-exporter-otlp-proto-http==1.27.0
opentelemetry-instrumentation==0.48b0
opentelemetry-instrumentation-asgi==0.48b0
opentelemetry-instrumentation-fastapi==0.48b0
opentelemetry-proto==1.27.0
opentelemetry-sdk==1.27.0
opentelemetry-semantic-conventions==0.48b0
opentelemetry-util-http==0.48b0
orderly-set==5.2.2
orjson==3.10.7
overrides==7.7.0
packaging==23.2
pandas==2.2.3
parameterized==0.9.0
parso==0.8.4
peewee==3.17.6
pexpect==4.9.0
pillow==10.4.0
platformdirs==4.3.6
posthog==3.6.6
prettytable==3.11.0
prompt_toolkit==3.0.47
proto-plus==1.24.0
protobuf==4.25.5
psutil==6.0.0
ptyprocess==0.7.0
pulsar-client==3.5.0
pure_eval==0.2.3
pyarrow==17.0.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pycparser==2.22
pydantic==2.9.2
pydantic_core==2.23.4
pydeck==0.9.1
Pygments==2.18.0
pypdf==4.3.1
PyPika==0.48.9
pyproject_hooks==1.1.0
pysbd==0.3.4
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-iso639==2024.4.27
python-magic==0.4.27
pytz==2024.2
PyYAML==6.0.2
pyzmq==26.2.0
rapidfuzz==3.9.7
referencing==0.35.1
regex==2023.12.25
requests==2.32.3
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
rich==13.8.1
rpds-py==0.20.0
rsa==4.9
s3transfer==0.10.2
safetensors==0.4.5
schema==0.7.7
scikit-learn==1.5.2
scipy==1.14.1
sec-api==1.0.18
sentence-transformers==3.1.1
setuptools==75.1.0
shapely==2.0.6
shellingham==1.5.4
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
soupsieve==2.6
SQLAlchemy==2.0.35
stack-data==0.6.3
starlette==0.38.6
streamlit==1.38.0
striprtf==0.0.26
sympy==1.13.3
tabulate==0.9.0
tenacity==8.5.0
threadpoolctl==3.5.0
tiktoken==0.7.0
tokenizers==0.19.1
toml==0.10.2
torch==2.4.1
tornado==6.4.1
tqdm==4.66.5
traitlets==5.14.3
transformers==4.44.2
typer==0.12.5
types-requests==2.32.0.20240914
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2024.1
unstructured==0.14.8
unstructured-client==0.25.9
urllib3==2.2.3
uvicorn==0.30.6
uvloop==0.20.0
watchfiles==0.24.0
wcwidth==0.2.13
webencodings==0.5.1
websocket-client==1.8.0
websockets==13.1
wrapt==1.16.0
yarl==1.11.1
yfinance==0.2.40
zipp==3.20.2