
Scaling Transformers:
Carbon Footprint as a Performance Metric for Ever-Larger Language Models (LLMs)
Written by Priscila Chaves
January 27, 2025
-
This document provides a summary of key ideas and insights from my research. It is not a comprehensive representation of the full study and may omit detailed analyses, methodologies, and supporting evidence for the sake of brevity. The findings and interpretations presented here have not undergone formal peer review and should be considered preliminary.
Readers are encouraged to approach the content with this context in mind and to seek further clarification or discussion if needed. For inquiries about the complete research or for collaboration opportunities, reach out to me directly.
Scaling Transformers: Weighing the Cost of Progress
The relentless drive to scale Large Language Models (LLMs) has ignited unprecedented innovation in artificial intelligence. Yet beneath this progress lies an ethical dilemma: can we continue to expand these technologies without jeopardizing the planet? In this article, I explore this tension, critiquing the assumptions behind scaling (Kaplan et al., 2020) and advocating for carbon footprint as a core performance metric for AI development (Luccioni et al., 2022). Let’s unpack the findings, reflections, and the call to action from my research.
The Allure of Scale: A Response to Kaplan et al.
In their groundbreaking study on scaling laws for neural language models, Kaplan et al. (2020) introduced an empirically driven framework that defines the relationship between performance, model size, dataset size, and compute power. Their findings are undoubtedly influential. By demonstrating that larger models are markedly more sample-efficient, achieving strong performance on relatively modest datasets, they provided a roadmap for scaling that has shaped the AI industry.
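To make the shape of these scaling laws concrete, here is a minimal sketch of the power-law form Kaplan et al. report for loss as a function of model size. The exponent (α_N ≈ 0.076) and constant (N_c ≈ 8.8 × 10¹³ non-embedding parameters) are my reading of their published fits; treat the exact numbers as illustrative rather than authoritative.

```python
# Sketch of the Kaplan et al. (2020) scaling-law form for model size:
# loss follows a power law, L(N) = (N_c / N) ** alpha_N.
# alpha_N and N_c below are approximations of the paper's fitted values.

def loss_from_model_size(n_params: float,
                         n_c: float = 8.8e13,
                         alpha_n: float = 0.076) -> float:
    """Predicted cross-entropy loss given non-embedding parameter count."""
    return (n_c / n_params) ** alpha_n

# A tenfold increase in parameters buys only a modest, predictable
# reduction in loss -- the smoothness that made scaling so alluring:
small = loss_from_model_size(1e9)   # 1B parameters
large = loss_from_model_size(1e10)  # 10B parameters
```

The flatness of this curve is exactly what the essay critiques: each further gain demands exponentially more compute, a cost the original framework never prices in.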
Yet, this work falls short in critical ways. It frames scale as the ultimate benchmark while sidestepping key ethical and environmental concerns. For instance, Kaplan et al. fail to address the implications of their "optimal compute-efficient training" on energy consumption, infrastructure costs, and carbon emissions. They suggest a “billion-fold increase in compute power” without grappling with what such exponential growth means for the planet.
Moreover, the study is steeped in technological determinism. It neglects to question who benefits from scaling LLMs and who bears the costs. With training bias, resource inequities, and energy demands left unexamined, the framework overlooks the broader impact of scaling on a strained and unequal world.
Counting the Costs: A Response to Luccioni et al.
In contrast, Luccioni et al. (2022) provide a refreshing lens by prioritizing environmental accountability in their study of BLOOM, a 176-billion-parameter multilingual language model. By measuring BLOOM’s carbon footprint across its lifecycle, from equipment manufacturing to deployment, the authors illuminate the hidden costs of scaling. Their findings are sobering: BLOOM’s training emitted approximately 24.7 metric tons of CO₂ from dynamic power consumption alone, roughly doubling to about 50.5 metric tons once embodied and idle emissions are factored in.
What makes their approach compelling is its transparency. Unlike Kaplan et al., Luccioni et al. confront the reality that "the cloud is terrestrial." They detail how factors like energy grid carbon intensity and infrastructure decisions dramatically influence emissions. For example, BLOOM’s emissions were significantly lower than comparable models like GPT-3 because it was trained on a French supercomputer powered by renewable energy.
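The grid-intensity point above reduces to simple arithmetic: operational emissions are energy consumed times the carbon intensity of the electricity that supplied it. The sketch below uses approximate figures — BLOOM’s training energy and the French grid’s intensity as I read them from Luccioni et al. (2022), and a rough US-average intensity — so treat all three constants as assumptions.

```python
# Back-of-the-envelope operational emissions:
# tonnes CO2eq = energy (kWh) * grid intensity (kgCO2eq/kWh) / 1000.
# Constants are illustrative approximations, not authoritative figures.

def operational_emissions_tonnes(energy_kwh: float,
                                 intensity_kg_per_kwh: float) -> float:
    """Operational CO2eq emissions in metric tonnes."""
    return energy_kwh * intensity_kg_per_kwh / 1000.0

TRAINING_ENERGY_KWH = 433_000  # approx. BLOOM training consumption
FRANCE_GRID = 0.057            # kgCO2eq/kWh, nuclear-heavy French grid
US_AVG_GRID = 0.38             # kgCO2eq/kWh, rough US grid average

bloom_like = operational_emissions_tonnes(TRAINING_ENERGY_KWH, FRANCE_GRID)
heavy_grid = operational_emissions_tonnes(TRAINING_ENERGY_KWH, US_AVG_GRID)
# The identical training run emits several times more CO2
# when the electricity behind it is carbon-intensive.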
Yet, even this study leaves room for improvement. While the authors call for standardized reporting and a lifecycle approach to emissions, they stop short of connecting these insights to LLM performance metrics. The question remains: can we optimize both performance and environmental impact?
Rethinking Scaling: A Path Forward
After analyzing these studies, I propose three actionable steps to balance the scaling of LLMs with environmental sustainability:
Reframe Carbon as a Metric of Performance
The AI community must embrace carbon footprint as a key performance indicator alongside accuracy. As Schwartz et al. (2020) argue, Green AI should prioritize efficiency and environmental accountability. Including emissions data in publications and benchmarks can create a culture where sustainability drives innovation.
Design with Limits
The industry’s obsession with unbounded scaling needs to be tempered. By tying compute budgets to carbon constraints, researchers can focus on optimizing energy use rather than defaulting to larger models. This approach may also encourage fine-tuning smaller, existing models rather than pursuing size as the sole path to progress.
Acknowledge the Grounded Cloud
Decisions about where and how LLMs are trained matter. By using data centers powered by renewable energy and investing in energy-efficient infrastructure, organizations can drastically reduce their carbon footprints. The environmental impact of AI must no longer be treated as an externality but as a central design consideration.
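One way the first step could be operationalized: report accuracy alongside lifecycle emissions and rank models by a combined efficiency score. The metric below (accuracy per tonne of CO₂eq) is a hypothetical illustration of my own, not a standard drawn from any of the cited papers, and the two candidate models and their numbers are invented for the example.

```python
# Hypothetical carbon-aware model comparison: rank candidates by
# accuracy earned per tonne of CO2eq, instead of accuracy alone.
from dataclasses import dataclass

@dataclass
class ModelReport:
    name: str
    accuracy: float        # benchmark accuracy, 0..1
    emissions_tco2: float  # lifecycle emissions, tonnes CO2eq

    def carbon_efficiency(self) -> float:
        """Accuracy points per tonne of CO2eq emitted."""
        return self.accuracy / self.emissions_tco2

# Invented candidates: a large model trained from scratch vs. a
# smaller model fine-tuned from an existing checkpoint.
candidates = [
    ModelReport("large-from-scratch", accuracy=0.82, emissions_tco2=100.0),
    ModelReport("small-finetuned", accuracy=0.79, emissions_tco2=5.0),
]
best = max(candidates, key=ModelReport.carbon_efficiency)
# Under this metric, the fine-tuned smaller model wins
# despite its slightly lower accuracy.
```

The point is not this particular formula but the habit it enforces: once emissions sit in the same table as accuracy, "bigger at any cost" stops being the default answer.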
Scaling Smarter, Not Just Bigger
As AI continues to transform our world, we face a stark choice: scale responsibly or let progress come at the expense of our planet. My research highlights a critical gap in current frameworks, urging the industry to adopt carbon-conscious practices and redefine what “optimal” really means. Scaling smarter, not just bigger, isn’t a concession—it’s an ethical imperative.
The question we must ask ourselves isn’t just how far we can push the limits of AI, but whether those limits should include the health of the world we all share. The time to act is now.
-
Detailed citations from the original essay are included to emphasise the academic grounding of my arguments and call for further exploration of ethical AI design.
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? https://doi.org/10.1145/3442188.3445922
Greene, T. (2023, May 3). Google DeepMind CEO Demis Hassabis says we may have AGI “in the next few years.” Cointelegraph. https://cointelegraph.com/news/google-deepmind-ceo-demis-hassabis-says-we-may-have-agi-in-the-next-few-years
Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., Casas, D. de L., Hendricks, L. A., Welbl, J., Clark, A., Hennigan, T., Noland, E., Millican, K., Driessche, G. van den, Damoc, B., Guy, A., Osindero, S., Simonyan, K., Elsen, E., & Rae, J. W. (2022). Training Compute-Optimal Large Language Models. ArXiv:2203.15556 [Cs]. https://arxiv.org/abs/2203.15556
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling Laws for Neural Language Models. ArXiv:2001.08361 [Cs, Stat]. https://arxiv.org/abs/2001.08361
Lee, J.-Y., Marotzke, J., Bala, G., Cao, L., Corti, S., Dunne, J. P., Engelbrecht, F., Fischer, E., Fyfe, J. C., Jones, C., Maycock, A., Mutemi, J., Ndiaye, O., Panickal, S., & Zhou, T. (2023). Future Global Climate: Scenario-based Projections and Near-term Information. In Intergovernmental Panel on Climate Change (IPCC) (Ed.), Climate Change 2021: The Physical Science Basis (pp. 553–672). Cambridge University Press. https://www.cambridge.org/core/books/climate-change-2021-the-physical-science-basis/future-global-climate-scenariobased-projections-and-nearterm-information/309359EDDCFABB031C078AE20CEE04FD
Luccioni, A. S., Viguier, S., & Ligozat, A.-L. (2022). Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model. https://doi.org/10.48550/arxiv.2211.02001
Nando de Freitas [@NandoDF]. (2022, May 14). Someone’s opinion article. My opinion: It’s all about scale now! The Game is Over! It’s about making these models bigger, safer, compute efficient, faster at sampling, smarter memory, more modalities, INNOVATIVE DATA, on/offline, … 1/N [Tweet]. Twitter. https://twitter.com/NandoDF/status/1525397036325019649
Patterson, D., Gonzalez, J., Le, Q., Liang, C., Munguia, L.-M., Rothchild, D., So, D., Texier, M., & Dean, J. (2021). Carbon Emissions and Large Neural Network Training. ArXiv:2104.10350 [Cs]. https://arxiv.org/abs/2104.10350
Schwartz, R., Dodge, J., Smith, N. A., & Etzioni, O. (2020). Green AI. Communications of the ACM, 63(12), 54–63. https://doi.org/10.1145/3381831
Truong, K. (2023, July 10). San Francisco AI Companies Got $11B in Funding So Far This Year. The San Francisco Standard. https://sfstandard.com/2023/07/10/san-francisco-companies-got-half-the-worlds-ai-funding-so-far-this-year/
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. ArXiv.org. https://arxiv.org/abs/1706.03762