# Contribute to our AI TCO Calculator

## Presentation

The TCO Calculator’s purpose is to help users compare the [Total Cost of Ownership](https://www.techtarget.com/searchdatacenter/definition/TCO?Offer=abt_pubpro_AI-Insider) (TCO) of different ways of consuming AI, from deploying an open-source model like Llama 2 on one's internal infrastructure to using a SaaS AI service like OpenAI's.

To do so, it computes the cost/request of each service and adds a labor cost, giving a comprehensive estimate of what setting up the service would cost.

Here is the formula used to compute the cost/request of an AI model service:
\\(CR = \frac{CIT_{1K} * IT + COT_{1K} * OT}{1000}\\)

with:
- \\(CR\\) = Cost per Request
- \\(CIT_{1K}\\) = Cost per 1000 Input Tokens
- \\(COT_{1K}\\) = Cost per 1000 Output Tokens
- \\(IT\\) = Input Tokens
- \\(OT\\) = Output Tokens

For instance, imagine we want to evaluate the cost of a summarization request. We will assume here that the number of input tokens is `500`, and that the output is `200` tokens.

With OpenAI GPT3.5 pricing, the rates are `$0.0015` per 1k input tokens and `$0.002` per 1k output tokens.

Therefore the formula gives us: 

\\(CR = \frac{0.0015 * 500 + 0.002 * 200}{1000}\\)

so \\(CR = $0.00115\\), i.e. `$1.15` per `1000` requests.
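
The same arithmetic can be checked with a few lines of Python. This is only a minimal sketch of the formula above; the function name `cost_per_request` and its parameter names are illustrative assumptions and are not part of the calculator's code.

```python
def cost_per_request(cost_per_1k_input, cost_per_1k_output, input_tokens, output_tokens):
    """Return the cost of one request in dollars, following the CR formula."""
    return (cost_per_1k_input * input_tokens + cost_per_1k_output * output_tokens) / 1000


# Summarization example: 500 input tokens, 200 output tokens, GPT3.5 pricing
cr = cost_per_request(0.0015, 0.002, input_tokens=500, output_tokens=200)
print(f"${cr:.5f} per request, ${cr * 1000:.2f} per 1000 requests")
```

Swapping in other prices or token counts gives a quick sanity check before entering values in the calculator.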


## Contributing

To contribute, you’ll have to provide the value of the input and output cost/token.

If you want to add your own service to this [Gradio](https://www.gradio.app/) application, you’ll have to follow two main steps:
1. Create a class for your model in our `models.py` file.
2. Add the name of your model class in our `app.py` file.

## Create a class for your model

Step by step, we’ll see how you can create a model class for your own service that can later be an option of our TCO Calculator. We’ll use a basic frame that you can find [here](https://huggingface.co/spaces/mithril-security/TCO_calculator/blob/main/contribution_example.py) in our calculator’s repository. 

First, you need to create a class for your model and set basic information such as the name of your service and the latency of your model.

```python
# The name of your new model service's class
class NewModel(BaseTCOModel):

    def __init__(self):
        # Name of the AI model service and the category it belongs to (SaaS, Open source)
        self.set_name("(Category) Service name")
        self.set_latency("The average latency of your model")
        super().__init__()
```
Then, you’ll have to create the core function of your model page, the `render` function. Its first elements are the [Gradio components](https://www.gradio.app/docs/components) that you want to display on your model page.
A component can be a Dropdown offering the user several choices, or a Textbox with information about computation parameters the user needs to know (for instance, the cost of the hardware set-up used). You can add as many components as you need.
**All components’ visibility must be set to `False`**.

```python
def render(self):
  # Create as many Gradio components as you need to give the user information or customization options
  # Set visible=False on every component initially
  # Set interactive=False on a component when its value is fixed
  self.model_parameter = gr.Dropdown(["Option 1", "Option 2"], value="Option 1", visible=False, interactive=True, label="Title for this parameter", info="Add some information to clarify specific aspects of your parameter")
```
Then, still in the `render` function, you must instantiate your input and output cost/token. They are the key values needed to compute the cost/request of your AI model service. 
Note that the user can’t interact with these since they are the values you’ll have to provide from benchmark tests on your model.

```python
# Set the values of the input and output cost per token
# These values can be updated by a function that is triggered when a parameter changes
# Choose default values that match the default parameters

self.input_cost_per_token = gr.Number(0.1, visible=False, label="($) Price/1K input prompt tokens", interactive=False)

self.output_cost_per_token = gr.Number(0.2, visible=False, label="($) Price/1K output prompt tokens", interactive=False)
```
Then, if the user can modify some parameters through the Gradio components mentioned above, you’ll have to update the values that depend on them.
To do so, create an update function that takes the changing parameter(s) as input and returns the updated value(s).
In the test example, the parameter only influences the cost/token, but an update function could also change another parameter whose choices depend on the first one.

```python
def on_model_parameter_change(model_parameter):
  if model_parameter == "Option 1":
    input_tokens_cost_per_token = 0.1
    output_tokens_cost_per_token = 0.2
  else:
    input_tokens_cost_per_token = 0.2
    output_tokens_cost_per_token = 0.4
  return input_tokens_cost_per_token, output_tokens_cost_per_token
```
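
When a parameter has more than two options, a lookup table can be easier to maintain than a chain of `if`/`else` branches. The sketch below is one possible variant, not the repository's code: the `PRICES_BY_OPTION` name is hypothetical, and the values simply mirror the placeholder values used above.

```python
# Hypothetical price table: option name -> (input cost, output cost)
PRICES_BY_OPTION = {
    "Option 1": (0.1, 0.2),
    "Option 2": (0.2, 0.4),
}


def on_model_parameter_change(model_parameter):
    """Return the (input, output) cost pair for the selected option."""
    return PRICES_BY_OPTION[model_parameter]
```

Unlike the `if`/`else` version, this variant raises a `KeyError` for an unknown option instead of silently falling back to the second set of prices.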

Don’t forget to add the triggering event that calls the update function when a Gradio parameter is changed. 
Note that the inputs and outputs can vary depending on the update function. 

```python
self.model_parameter.change(on_model_parameter_change, inputs=self.model_parameter, outputs=[self.input_cost_per_token, self.output_cost_per_token])
```

The last element of the `render` function you have to implement is the labor cost parameter. It estimates how much it would cost to have engineers deploy the model. Note that for a SaaS solution, we consider this cost to be 0, as no engineer is needed to deploy and maintain the model.

```python
self.labor = gr.Number(0, visible=False, label="($) Labor cost per month", info="This is an estimate of the labor cost of the AI engineer in charge of deploying the model", interactive=True)
```
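
To see why the labor cost matters, it can help to fold it into a monthly total. This guide does not spell out how the calculator combines the two, so the sketch below assumes a simple sum of usage cost and labor cost per month. The function name `monthly_tco`, the request volume, and the self-hosted figures are all made-up placeholders for illustration.

```python
def monthly_tco(cost_per_request, requests_per_month, labor_per_month=0.0):
    """Hypothetical monthly total: usage cost plus labor cost, in dollars."""
    return cost_per_request * requests_per_month + labor_per_month


# SaaS service: labor is 0, so only usage counts
saas_total = monthly_tco(0.00115, requests_per_month=100_000)

# Self-hosted service with placeholder values: a lower cost/request,
# but an assumed labor cost of $5,000 per month
self_hosted_total = monthly_tco(0.0005, requests_per_month=100_000, labor_per_month=5_000)

print(f"SaaS: ${saas_total:.2f}, self-hosted: ${self_hosted_total:.2f}")
```

Under this assumption, a cheaper cost/request only pays off once the request volume is large enough to offset the labor cost.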

Lastly, you need to create a `compute_cost_per_token` function, which the calculator uses to compute the cost/request of your AI model service. For the whole calculator to work, this function must have the same input and output parameters as in the example below.

Additionally, if any conversion is required on the input or output cost/token, you can perform it in this function.

```python
def compute_cost_per_token(self, input_cost_per_token, output_cost_per_token, labor):
  # Additional computation on your cost_per_token values
  # You often need to convert some values here
  return input_cost_per_token, output_cost_per_token, labor
```

Once your model class is ready, you’ll have to **add it to the `models.py` file** in our TCO Calculator’s [Hugging Face repository](https://huggingface.co/spaces/mithril-security/TCO_calculator/tree/main). 

## Update the app.py file
For the user to be able to select your AI model service option in the Calculator, you still have one step to go. 

In the following code line of the `app.py` file (line 93), you’ll have to **add the name of your model class** as follows:
```python
Models: list[models.BaseTCOModel] = [models.OpenAIModelGPT4,...,  models.NewModel]
```

## Example: How we wrote OpenAI GPT3.5 Turbo TCO

Let’s break down how we followed the previous steps to implement the GPT3.5 Turbo model class. 

To get the values for the input and output cost/token, we went to OpenAI’s [pricing web page](https://openai.com/pricing). We noted which options they offer and which parameters a user can change that influence the cost.

```python
class OpenAIModelGPT3_5(BaseTCOModel):
  def __init__(self):
    self.set_name("(SaaS) OpenAI GPT3.5 Turbo")
    #Average latency value for GPT3.5 Turbo
    self.set_latency("5s") 
    super().__init__()
```
Let’s implement the `render` function.
For this model, OpenAI offers two options for the context size, that is, the number of tokens the model considers when processing text. The user can choose between them using the dropdown below.
Note that the visibility is set to `False`!

```python
def render(self):
  self.context_length = gr.Dropdown(choices=["4K", "16K"], value="4K", visible=False, interactive=True, label="Context size", info="Number of tokens the model considers when processing text")
```

Then, we have to create the input and output cost/token components that are needed to compute the cost/request.
We also added a [Markdown](https://www.gradio.app/docs/markdown) component to explain where these cost/token values come from (this Gradio component supports HTML).

```python
self.input_tokens_cost_per_token = gr.Number(0.0015, visible=False, label="($) Price/1K input prompt tokens", interactive=False)
self.output_tokens_cost_per_token = gr.Number(0.002, visible=False, label="($) Price/1K output prompt tokens", interactive=False)
self.info = gr.Markdown("The cost per input and output tokens values are from OpenAI's [pricing web page](https://openai.com/pricing)", interactive=False, visible=False)
```
The input and output cost/token depend on the context size the user selects, so we have to create an update function that applies the change.

```python
def define_cost_per_token(context_length):
  if context_length == "4K":
    cost_per_1k_input_tokens = 0.0015
    cost_per_1k_output_tokens = 0.002
  else:
    cost_per_1k_input_tokens = 0.003
    cost_per_1k_output_tokens = 0.004
  return cost_per_1k_input_tokens, cost_per_1k_output_tokens
```

This function is called when the user changes the value of `context_length`. 

```python
self.context_length.change(define_cost_per_token, inputs=self.context_length, outputs=[self.input_tokens_cost_per_token, self.output_tokens_cost_per_token])
```

Then, the last part of the `render` function is about the labor cost. For a SaaS solution like OpenAI’s GPT3.5 Turbo, no engineer is required to deploy and maintain the model, so this cost is zero.

```python
self.labor = gr.Number(0, visible=False, label="($) Labor cost per month", info="This is an estimate of the labor cost of the AI engineer in charge of deploying the model", interactive=True)
```

Finally, we have to implement the `compute_cost_per_token` function. Because OpenAI’s pricing is given in dollars per 1000 tokens, a small conversion is needed.

```python
def compute_cost_per_token(self, input_tokens_cost_per_token, output_tokens_cost_per_token, labor):
  cost_per_input_token = (input_tokens_cost_per_token / 1000)
  cost_per_output_token = (output_tokens_cost_per_token / 1000)
  return cost_per_input_token, cost_per_output_token, labor
```

Then, we added this model class to the models.py [file](https://huggingface.co/spaces/mithril-security/TCO_calculator/blob/main/models.py) in the TCO Calculator repository, and added its name `OpenAIModelGPT3_5` in the list of models in the app.py [file](https://huggingface.co/spaces/mithril-security/TCO_calculator/blob/main/app.py).
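
As a quick sanity check, the two functions from this example can be exercised without Gradio. The sketch below copies their logic into plain functions and adds a hypothetical helper, `request_cost`, which is not part of the repository, to compute the cost of the earlier summarization request (`500` input tokens, `200` output tokens) for both context sizes.

```python
def define_cost_per_token(context_length):
    """Price per 1K tokens for the selected context size (values from the example above)."""
    if context_length == "4K":
        return 0.0015, 0.002
    return 0.003, 0.004


def compute_cost_per_token(input_tokens_cost_per_token, output_tokens_cost_per_token, labor):
    """Convert prices per 1K tokens into prices per single token."""
    return input_tokens_cost_per_token / 1000, output_tokens_cost_per_token / 1000, labor


def request_cost(context_length, input_tokens, output_tokens):
    """Cost in dollars of one request; this helper is hypothetical."""
    price_in_1k, price_out_1k = define_cost_per_token(context_length)
    price_in, price_out, _labor = compute_cost_per_token(price_in_1k, price_out_1k, labor=0)
    return price_in * input_tokens + price_out * output_tokens


for context in ("4K", "16K"):
    print(context, f"${request_cost(context, 500, 200):.5f} per request")
```

With the prices listed above, the 16K option costs twice as much per request as the 4K option for the same token counts.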