Are AI Agents Sustainable? It depends

Building upon the interest garnered by our recent blog post about AI agents and ethics, this post follows up with an exploration of another important aspect of agentic development: sustainability. How can we get a better idea of agents’ environmental impacts?
TL;DR
As with any new technology, there are many ways to build AI agents, and design choices matter. Choosing a brute-force approach that relies on scale and commercial black-box models, versus a more modular approach that uses open-source frameworks like HF's smolagents to connect task-specific models, can have far-reaching effects in terms of cost and energy (see a recent paper on this topic). To give a sense of scale, the cost difference between these two options can span several orders of magnitude, especially now that we see small models (e.g. OlympicCoder) matching the performance of the very largest LLMs (e.g. Claude Sonnet 3.5) on increasingly complex tasks. Finding the right balance through informed choices is therefore necessary to make the technology more reliable, cost-efficient, and sustainable.
Introduction
Lately, AI Agents have emerged as the hot new paradigm for developing and interacting with AI tools – from writing code from scratch to summarizing complex topics based on different sources, with the ability to make decisions and plan without explicit human feedback. Under the hood, an AI agent can use all sorts of AI models – be it any of the over a million public open-source models on the HF Hub, a proprietary model queried via an API, or a combination of the two.
But are AI Agents more sustainable than other kinds of AI systems? As with most environmental questions, it depends on a variety of factors. This blog post covers the three main ones: 1) the type of model used, 2) the modality, and 3) how we are making decisions about which systems to use. At the end of the day, what matters is not whether an AI system is an agent or not, but the type of approach that it leverages under the hood.
The right model type for the right (agent) task
Training and deploying AI models requires computational resources, either from specialized hardware like GPUs and TPUs, or from more generic hardware like CPUs. Powering that hardware – as well as the underlying infrastructure like networking, storage, and data transfer – requires substantial energy. For instance, an NVIDIA A100 GPU draws 300W of power when running at full capacity, roughly 40 times more than an LED light bulb. Tools like Code Carbon – which is integrated into the Transformers library – enable users to estimate in real time how much energy a given AI model is using, both at training and at inference time.
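To make this concrete, here is a minimal sketch of measuring a single inference call with Code Carbon. The model and question are illustrative (they mirror the question-answering example discussed below), and the exact figures will depend on your hardware and electricity grid:

```python
from codecarbon import EmissionsTracker
from transformers import pipeline

# Track the energy and emissions of everything executed between start() and stop().
tracker = EmissionsTracker(project_name="agent-inference-demo")
tracker.start()

# Illustrative workload: an extractive question-answering model.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(
    question="What is the capital of Canada?",
    context="Ottawa is the capital city of Canada.",
)

emissions = tracker.stop()  # estimated emissions in kg CO2-equivalent
print(result["answer"], f"-- estimated {emissions:.2e} kg CO2eq")
```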
In “Power Hungry Processing”, a study we carried out last year, we used Code Carbon to measure the energy consumption of AI models across a variety of tasks, comparing models of different sizes, architectures, and paradigms. Comparing task-specific models (i.e. those trained to carry out only a single task, such as question answering or sentiment analysis) with multi-purpose models (able to carry out multiple tasks), we found that for a task such as question answering – for instance, finding the capital of Canada in a Wikipedia article about the country – generative multi-purpose models consumed over 30 times more energy than extractive, task-specific models.
Figure taken from Luccioni et al. (2024)
For tasks such as summarization, the gap was smaller, with a difference of 5-7 times between task-specific and multi-purpose models. The exact amount of energy used varied depending on the size of the model, the architecture, and the length of the model's input and output, but the pattern remained: generative, multi-purpose models (also called ‘foundation models’ or ‘general-purpose models’) use more energy and more compute than task-specific models.
AI Agents, unlike LLMs writ large, are built to carry out specific tasks – be it helping a customer exchange an item purchased online, or analyzing years of financial data to glean insights and generate a report. The types of models these agents leverage to accomplish these tasks can have a huge impact on their sustainability. For instance, the customer service agent can, under the hood, leverage a smaller model that is only able to convert natural language to SQL code (e.g. SQLCoder-7b-2), or a generic LLM that is able to carry out a variety of text-based tasks including code generation (e.g. QwQ-32B). Depending on the size and architecture of the model, this can translate into a major difference in energy efficiency.
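As a rough sketch of what these two options look like in code, the snippet below contrasts calling the small task-specific model directly with running the generic LLM inside an agent framework. The model IDs are the ones mentioned above; the smolagents class names (CodeAgent, HfApiModel) reflect one version of the library and may differ in yours:

```python
from transformers import pipeline
from smolagents import CodeAgent, HfApiModel

# Option 1: a small model that only maps natural language to SQL --
# narrow in scope, but cheap to run.
sql_generator = pipeline("text-generation", model="defog/sqlcoder-7b-2")
sql = sql_generator(
    "-- Return all orders placed in the last 30 days\nSELECT",
    max_new_tokens=64,
)[0]["generated_text"]

# Option 2: a generic LLM driving a full agent loop -- far more flexible,
# but every reasoning step goes through a much larger model.
agent = CodeAgent(tools=[], model=HfApiModel(model_id="Qwen/QwQ-32B"))
answer = agent.run("Write a SQL query that returns all orders from the last 30 days.")
```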
The AI Energy Score project, a Hugging Face-led initiative, aims to benchmark and compare the energy efficiency of AI models for a variety of different tasks. We have already evaluated hundreds of open-source models and presented the results on our leaderboard, which can help guide developers to pick models with efficiency in mind when developing agentic systems for a variety of tasks.
Modality matters
While the term ‘large language model’ has persisted, many current AI models that we refer to as LLMs are actually multi-modal, taking anything from images to video and audio as input and generating textual transcriptions, descriptions, and answers as output. Agents are liable to do so as well, given the diversity of information sources that they can exploit to carry out their tasks. Both input and output modalities have an impact on the energy usage of AI models, and are liable to have a similar impact on agents.
Tasks that map image and text inputs to categorical outputs (e.g. classifying an image based on a set of categories, or a tweet as having positive or negative sentiment) are less energy-intensive than those that generate text or images from scratch. We also discovered that in both categories, image-based tasks are more energy-intensive than text-based ones: by a factor of 3 for categorical tasks and by a factor of 60 (!) for generative ones. In an agentic setting, this means that if agents are manipulating images and generating new content, the energy used – multiplied by the millions of queries that agents can handle on a daily basis – can quickly add up.
Figure taken from Luccioni et al. (2024)
We also found that the length of the generated text impacts energy usage: for sequence-to-sequence tasks like summarization, the more tokens are generated, the more energy is used, all other things being equal. Since many ‘agentic’ tasks are actually sequence-to-sequence – like translating input queries into programming code, or condensing large numbers of reports into shorter summaries – it’s important to think about how all of this generated text will add up over the coming months and years. And as more visual agents are developed to interact directly with user interfaces, it will be important to add new tasks and setups to reflect these new use cases.
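One practical lever follows directly from this: bounding the number of generated tokens bounds the compute, and hence the energy, spent per query. A minimal sketch, with an illustrative model ID and limits:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

long_report = "Lorem ipsum ..."  # stands in for a long input document

# Capping max_length keeps the output (and the energy used) bounded per
# query; tune the limits to the shortest output that still serves the task.
summary = summarizer(long_report, max_length=128, min_length=32)[0]["summary_text"]
```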
Choosing a more sustainable way forward
As with any field, the world of AI is dominated by trends – technologies that were considered state-of-the-art a few years ago (who remembers word2vec?) are replaced by newer generations of technologies (BERT and its derivatives), which, in due course, also give way to the next wave of popular models (the Transformers of today). However, in terms of energy, the vector-based models of 10 years ago were vastly more efficient than current generations of LLaMas and GPTs, which is why it is important to track the energy demands of the field over time. In a recent paper co-authored with collaborators from both industry and academia, we discussed the pursuit of ‘bigger-is-better’ approaches in AI and explored the potential environmental and societal impacts that this can have, from a ballooning environmental footprint to a concentration of power that favors bigger corporate players as opposed to researchers from academia and startups.
In the aforementioned paper, we reflect upon the trade-offs between task performance and efficiency (or resource usage), and call upon the AI community to consider both when developing and deploying models, pursuing a ‘Pareto frontier’ to guide sustainable progress:
Figure taken from Varoquaux et al. (2024)
This pursuit includes assigning value to research on smaller systems, reporting the size and cost of models, and using AI approaches that correspond to the problem being solved. In the context of agents, that can mean both using smaller, more efficient models and creating streamlined systems that rely upon several interconnected components – e.g. an intent classifier coupled with several task-specific models that are activated based on user queries, as sketched below. And while it may be harder to evaluate the energy requirements of agentic systems compared to previous generations of more monolithic AI models, using tools like Code Carbon to evaluate the different system components and communicating this information to users can help guide informed decision-making and deployment of AI.
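As an illustration, here is a hedged sketch of such a streamlined system: a small zero-shot classifier routes each query to a task-specific model, so a large generative LLM is never invoked for work a smaller model can do. All model choices and intent labels here are illustrative assumptions, not recommendations from the papers above:

```python
from transformers import pipeline

# A small model acting as the intent classifier / router.
router = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Task-specific models, loaded once and activated per query.
handlers = {
    "question answering": pipeline(
        "question-answering", model="distilbert-base-cased-distilled-squad"
    ),
    "summarization": pipeline(
        "summarization", model="sshleifer/distilbart-cnn-12-6"
    ),
}

def handle(query: str, context: str) -> str:
    # Pick the most likely intent, then dispatch to the matching model.
    intent = router(query, candidate_labels=list(handlers))["labels"][0]
    if intent == "question answering":
        return handlers[intent](question=query, context=context)["answer"]
    return handlers[intent](context, max_length=60)[0]["summary_text"]

print(handle("What is the capital of Canada?",
             "Ottawa is the capital city of Canada."))
```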
We also address the fact that open-source models offer unique environmental transparency: they can be deployed locally on different hardware, allowing precise tracking of their energy consumption. These measurements can be documented and shared in model cards, providing essential sustainability data to the AI community. Open-sourcing models can also reduce the amount of compute needed to train and deploy custom models by enabling developers to build upon existing base models from the community. For deeper insights into measuring and disclosing the environmental impacts of AI systems, explore our recent primer.
Conclusion
As we deploy agentic AI systems in more and more applications, considering their energy requirements and environmental impacts is crucial to inform and guide sustainable development. While it is impossible to say categorically whether agents are more or less sustainable than previous generations of AI models, by taking into account the modality and specificity of the models we use – and the way in which we use them – we can guide progress in the right direction. This blog post outlines some of the considerations and best practices for developing and deploying agentic AI systems with sustainability in mind, and we hope it will help the AI community move forward together.
Acknowledgements
Thank you Chun Te Lee for the amazing banner for this post!
Summary of linked resources:
- AI Agents Are Here. What Now? - Blog Post
- Hugging Face Model Cards
- The Environmental Impacts of AI -- Primer
- Power Hungry Processing: Watts Driving the Cost of AI Deployment?
- Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI
Citing this blog post:
If you want to cite this blog post, please use the following:
@misc{ai_agents_sustainability_hf_blog,
author = {Sasha Luccioni and Brigitte Tousignant and Yacine Jernite},
title = {Are AI Agents Sustainable? It depends},
booktitle = {Hugging Face Blog},
year = {2025},
url = {https://huggingface.co/blog/ai-agent-sustainability},
}