Contribute to our AI TCO Calculator
Presentation
The TCO Calculator’s purpose is to help users compare the Total Cost of Ownership (TCO) of different ways of consuming AI, from deploying an open-source model like Llama 2 on one's internal infrastructure to consuming a SaaS AI like OpenAI.
To do so, it computes the cost/request of said service and adds a labor cost to get a comprehensive estimate of how much the set-up of these services would cost.
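Here is the formula used to compute the cost/request of an AI model service:
$$ CR = \frac{CIT_{1K} \times IT + COT_{1K} \times OT}{1000} $$
with:
- $CR$ = Cost per Request
- $CIT_{1K}$ = Cost per 1000 Input Tokens
- $COT_{1K}$ = Cost per 1000 Output Tokens
- $IT$ = Input Tokens
- $OT$ = Output Tokens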
For instance, imagine we want to evaluate the cost of a summarization request. We will assume here that the input is 500 tokens and the output is 200 tokens.
If we use OpenAI GPT3.5 pricing, prices are $0.0015 per 1k input tokens and $0.002 per 1k output tokens.
Therefore the formula gives us:
$$ CR = \frac{0.0015 \times 500 + 0.002 \times 200}{1000} = 0.00115 $$
so CR = $0.00115, that is, $1.15 per 1000 requests.
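The worked example above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `cost_per_request` and its parameter names are illustrative and are not part of the calculator's code.

```python
def cost_per_request(cost_per_1k_input, cost_per_1k_output, input_tokens, output_tokens):
    """Cost of one request, given prices expressed per 1000 tokens."""
    total_per_1k = cost_per_1k_input * input_tokens + cost_per_1k_output * output_tokens
    return total_per_1k / 1000


# Summarization example: 500 input tokens, 200 output tokens, GPT3.5 prices from the text
cr = cost_per_request(0.0015, 0.002, input_tokens=500, output_tokens=200)
print(f"Cost per request: ${cr:.5f}")
print(f"Cost per 1000 requests: ${cr * 1000:.2f}")
```

Rounded to five decimals, the result matches the $0.00115 per request computed by hand.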
Contributing
To contribute, you’ll have to provide the value of the input and output cost/token.
If you want to add your own service to this Gradio application, you’ll have to follow two main steps:
- Create a class for your model in our models.py file.
- Add the name of your model class in our app.py file.
Create a class for your model
Step by step, we’ll see how you can create a model class for your own service that can later be an option of our TCO Calculator. We’ll use a basic frame that you can find here in our calculator’s repository.
First, you need to create a class for your model and set basic information such as the name of your service and the latency of your model.
# The name of your new model service's class
class NewModel(BaseTCOModel):
    def __init__(self):
        # Name of the AI model service and the category it belongs to (SaaS, Open source)
        self.set_name("(Category) Service name")
        self.set_latency("The average latency of your model")
        super().__init__()
Then, you’ll have to create the core function of your model page, the render function. Its first elements will be the Gradio components that you want to put in your model page.
They can be a Dropdown with multiple choices for the user to make, or a Textbox with information about the computation parameters the user has to know (for instance, the cost of the hardware set-up used). There can be several of them.
All components’ visibility must be set to False.
    def render(self):
        # Create as many Gradio components as you want to provide information or customization to the user
        # Put all their visibility to False at the beginning
        # Don't forget to put the interactive parameter of the component to False if the value is fixed
        self.model_parameter = gr.Dropdown(["Option 1", "Option 2"], value="Option 1", visible=False, interactive=True, label="Title for this parameter", info="Add some information to clarify specific aspects of your parameter")
Then, still in the render function, you must instantiate your input and output cost/token. They are the key values needed to compute the cost/request of your AI model service.
Note that the user can’t interact with these since they are the values you’ll have to provide from benchmark tests on your model.
        # Put the values of the input and output cost per token
        # These values can be updated using a function that is triggered by a change in the parameters
        # Put default values according to the default parameters
        self.input_cost_per_token = gr.Number(0.1, visible=False, label="($) Price/1K input prompt tokens", interactive=False)
        self.output_cost_per_token = gr.Number(0.2, visible=False, label="($) Price/1K output prompt tokens", interactive=False)
Then, if the user can modify some parameters using the Gradio components mentioned above, you’ll have to update the values they influence. To do so, create an update function that takes the changing parameter(s) as input(s) and returns the updated value(s). In the template example, the parameter only influences the cost/token, but it could also drive another parameter whose choices depend on its value.
        def on_model_parameter_change(model_parameter):
            if model_parameter == "Option 1":
                input_tokens_cost_per_token = 0.1
                output_tokens_cost_per_token = 0.2
            else:
                input_tokens_cost_per_token = 0.2
                output_tokens_cost_per_token = 0.4
            return input_tokens_cost_per_token, output_tokens_cost_per_token
Don’t forget to add the triggering event that calls the update function when a Gradio parameter is changed. Note that the inputs and outputs can vary depending on the update function.
        self.model_parameter.change(on_model_parameter_change, inputs=self.model_parameter, outputs=[self.input_cost_per_token, self.output_cost_per_token])
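When a parameter has more than two options, the if/else chain in the update function can become hard to maintain. One possible alternative, sketched here as a standalone function, is a lookup table. The `PRICES_PER_1K` name is hypothetical and the prices are the placeholder values from the template. Unlike the template's `else` branch, this version raises an error on an unknown option instead of silently falling back to the second price pair.

```python
# Hypothetical price table: option name -> (input price, output price) per 1K tokens.
# The numbers are the placeholder values used in the template, not real prices.
PRICES_PER_1K = {
    "Option 1": (0.1, 0.2),
    "Option 2": (0.2, 0.4),
}


def on_model_parameter_change(model_parameter: str) -> tuple[float, float]:
    """Return the (input, output) price per 1K tokens for the selected option."""
    try:
        return PRICES_PER_1K[model_parameter]
    except KeyError:
        # Fail loudly rather than silently returning the wrong prices
        raise ValueError(f"Unknown option: {model_parameter!r}") from None


print(on_model_parameter_change("Option 2"))
```

Like the template's function, it returns one value per output component, in the same order as the `outputs` list passed to the change event.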
The last element of the render function you have to implement is the labor cost parameter. It provides an estimate of how much it would cost to have engineers deploy the model. Note that for a SaaS solution, we consider this cost to be 0, as no engineer is required to deploy and maintain the model.
        self.labor = gr.Number(0, visible=False, label="($) Labor cost per month", info="This is an estimate of the labor cost of the AI engineer in charge of deploying the model", interactive=True)
Lastly, you need to create a compute_cost_per_token function, which is essential for computing the cost/request of your AI model service. This function must have the same input and output parameters as in the example below for the whole calculator to work.
Additionally, if any conversion is required on the input or output cost/token, you can perform it in this function.
    def compute_cost_per_token(self, input_cost_per_token, output_cost_per_token, labor):
        # Additional computation on your cost_per_token values
        # You often need to convert some values here
        return input_cost_per_token, output_cost_per_token, labor
Once your model class is ready, you’ll have to add it to the models.py file in our TCO Calculator’s Hugging Face repository.
Update the app.py file
For the user to be able to select your AI model service option in the Calculator, you still have one step to go.
In the following code line of the app.py file (line 93), you’ll have to add the name of your model class as follows:
Models: list[models.BaseTCOModel] = [models.OpenAIModelGPT4,..., models.NewModel]
Example: How we wrote OpenAI GPT3.5 Turbo TCO
Let’s break down how we followed the previous steps to implement the GPT3.5 Turbo model class.
To get the values for the input and output cost/token, we went to OpenAI’s pricing web page. We identified the available options and which parameters users could change that would influence the costs.
class OpenAIModelGPT3_5(BaseTCOModel):
    def __init__(self):
        self.set_name("(SaaS) OpenAI GPT3.5 Turbo")
        # Average latency value for GPT3.5 Turbo
        self.set_latency("5s")
        super().__init__()
Let’s implement the render function.
For this model, OpenAI offers two options for the context length, that is, the number of tokens the model considers when processing text. The user can choose between the two using the dropdown below.
Note that the visibility is set to False!
    def render(self):
        self.context_length = gr.Dropdown(choices=["4K", "16K"], value="4K", visible=False, interactive=True, label="Context size", info="Number of tokens the model considers when processing text")
Then, we have to create the input and output cost/token components that will be necessary to compute the cost/request. Furthermore, we added a Markdown component to provide information on the values we put for these cost/token (this Gradio component supports HTML).
        self.input_tokens_cost_per_token = gr.Number(0.0015, visible=False, label="($) Price/1K input prompt tokens", interactive=False)
        self.output_tokens_cost_per_token = gr.Number(0.002, visible=False, label="($) Price/1K output prompt tokens", interactive=False)
        self.info = gr.Markdown("The cost per input and output tokens values are from OpenAI's [pricing web page](https://openai.com/pricing)", interactive=False, visible=False)
Then, depending on the user selection for the context, the input and output cost/token are impacted. So we have to create an update function to implement their changes.
        def define_cost_per_token(context_length):
            if context_length == "4K":
                cost_per_1k_input_tokens = 0.0015
                cost_per_1k_output_tokens = 0.002
            else:
                cost_per_1k_input_tokens = 0.003
                cost_per_1k_output_tokens = 0.004
            return cost_per_1k_input_tokens, cost_per_1k_output_tokens
This function is called when the user changes the value of context_length.
        self.context_length.change(define_cost_per_token, inputs=self.context_length, outputs=[self.input_tokens_cost_per_token, self.output_tokens_cost_per_token])
Then, the last part of the render function is about the labor cost. For a SaaS solution like OpenAI’s GPT3.5 Turbo, no engineer is required to deploy and maintain the model, so this cost is zero.
        self.labor = gr.Number(0, visible=False, label="($) Labor cost per month", info="This is an estimate of the labor cost of the AI engineer in charge of deploying the model", interactive=True)
Finally, we have to implement the compute_cost_per_token function. Because OpenAI’s pricing is given in dollars per 1000 tokens, a small conversion is needed.
    def compute_cost_per_token(self, input_tokens_cost_per_token, output_tokens_cost_per_token, labor):
        cost_per_input_token = (input_tokens_cost_per_token / 1000)
        cost_per_output_token = (output_tokens_cost_per_token / 1000)
        return cost_per_input_token, cost_per_output_token, labor
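As a sanity check, the conversion above can be tried outside the class. The sketch below is a standalone copy of the function (without `self`) applied to the 4K prices. Multiplying the per-token costs by the token counts of the earlier summarization example is our assumption about how the values are used downstream, not a description of the calculator's internals.

```python
def compute_cost_per_token(input_price_per_1k, output_price_per_1k, labor):
    """Standalone copy of the conversion: prices per 1K tokens -> prices per token."""
    cost_per_input_token = input_price_per_1k / 1000
    cost_per_output_token = output_price_per_1k / 1000
    return cost_per_input_token, cost_per_output_token, labor


# GPT3.5 Turbo 4K prices from the example, with a labor cost of 0 (SaaS)
cost_in, cost_out, labor = compute_cost_per_token(0.0015, 0.002, 0)

# Assumed usage: 500 input tokens and 200 output tokens, as in the summarization example
request_cost = cost_in * 500 + cost_out * 200
print(f"Cost per request: ${request_cost:.5f}")
```

Rounded to five decimals, this gives the same $0.00115 per request as the formula in the presentation.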
Then, we added this model class to the models.py file in the TCO Calculator repository, and added its name OpenAIModelGPT3_5 in the list of models in the app.py file.