EMNLP 2025 Main Conference Oral | Suzhou, China, November 4-9, 2025 | Resource and Theme Paper Award nominations (Top 1%)

ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge

Chaoyue He1 Xin Zhou1,* Yi Wu1 Xinjia Yu1 Yan Zhang1 Lei Zhang1 Di Wang1 Shengfei Lyu1 Hong Xu1 Xiaoqiao Wang2 Wei Liu2 Chunyan Miao1

1 Alibaba-NTU Global e-Sustainability CorpLab (ANGEL), Singapore; 2 Alibaba Group, China

* Corresponding author

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

An expert, source-grounded benchmark for evaluating whether LLMs understand sustainability reporting, climate disclosure, governance, and standards-driven ESG reasoning.

Main ESGenius benchmark results
ESGenius evaluates 50 language models across 1,136 expert ESG and sustainability questions, with source-grounded references and question-level inspection support.

Abstract

Large language models are increasingly used for sustainability reporting, climate disclosure, and ESG analysis, yet their knowledge of specialized standards and source-dependent concepts remains difficult to evaluate systematically. ESGenius provides a 1,136-question multiple-choice benchmark covering environmental, social, governance, and sustainability knowledge across major reporting and climate frameworks. Each question follows an A-D answer protocol with a Z option for uncertainty, and the reference version includes source document metadata and supporting excerpts for audit or retrieval-augmented evaluation. The repository includes dataset files, evaluation scripts, published result figures, and an interactive 50-model heatmap for model-question diagnostics.

Benchmark Overview

1,136 expert ESG questions
50 evaluated models
7 framework families
A-D + Z answer protocol

Standards-aware scope

Covers sustainability reporting, climate disclosure, biodiversity, energy, governance, and ESG reasoning across IPCC, GRI, SASB, ISO, IFRS/ISSB, TCFD, and CDP sources.

Source-grounded references

The reference CSV preserves document names, page references, and supporting text snippets so answers can be audited or used in retrieval-aware experiments.
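As a minimal sketch of how the supporting excerpt might be used, the function below formats one question either closed-book or with its excerpt prepended as context. The prompt wording, the `build_prompt` name, and the example row are illustrative assumptions, not the prompts or data used in the paper.

```python
from typing import Mapping, Optional

# Option labels follow the A-D + Z protocol described on this page.
OPTION_KEYS = ("A", "B", "C", "D", "Z")


def build_prompt(row: Mapping[str, str], excerpt: Optional[str] = None) -> str:
    """Format one question as a multiple-choice prompt.

    `row` is assumed to hold the keys query, A, B, C, D, and Z.
    When `excerpt` is given, it is placed before the question as context.
    """
    lines = []
    if excerpt:
        lines.append("Context: " + excerpt.strip())
        lines.append("")
    lines.append("Question: " + row["query"].strip())
    for key in OPTION_KEYS:
        lines.append(f"{key}. {row[key].strip()}")
    lines.append("Answer with a single letter (A, B, C, D, or Z).")
    return "\n".join(lines)


# Made-up placeholder row, not taken from the dataset.
example = {
    "query": "Which option is the placeholder answer?",
    "A": "Placeholder option one",
    "B": "Placeholder option two",
    "C": "Placeholder option three",
    "D": "Placeholder option four",
    "Z": "Not sure",
    "ref_doc": "placeholder.pdf",
    "source_text": "Placeholder supporting excerpt.",
}

closed_book = build_prompt(example)
reference_aware = build_prompt(example, excerpt=example["source_text"])
```

Comparing a model's answers on `closed_book` and `reference_aware` prompts is one simple way to separate missing knowledge from failure to use provided evidence.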

Diagnostic evaluation

Published figures summarize aggregate performance, while the full heatmap exposes per-question outcomes, invalid outputs, uncertainty, and missing responses.
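The five outcome types could be assigned along the following lines. This is a sketch under stated assumptions: the page does not say how raw model output is parsed, so the strict single-letter rule, the treatment of empty output as missing, and the `classify` helper itself are all hypothetical.

```python
import re
from typing import Optional

VALID_LETTERS = {"A", "B", "C", "D", "Z"}


def classify(raw: Optional[str], gold: str) -> str:
    """Map one raw model output to an outcome label.

    Assumption: only a bare option letter, optionally wrapped in
    parentheses or followed by '.' or ':', counts as a valid answer.
    """
    if raw is None or not raw.strip():
        return "missing"  # no response recorded for this question
    match = re.fullmatch(r"\(?([A-Za-z])\)?[.:]?", raw.strip())
    if match is None or match.group(1).upper() not in VALID_LETTERS:
        return "invalid"  # output does not follow the answer protocol
    letter = match.group(1).upper()
    if letter == "Z":
        return "not_sure"  # the model chose the uncertainty option
    return "correct" if letter == gold else "wrong"
```

For example, `classify("(c)", "B")` yields `"wrong"`, while a free-text reply such as `"The answer is probably B"` is counted as `"invalid"` under this strict rule. A more lenient parser would shift some invalid outputs into the correct or wrong categories.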

Dataset

The canonical public dataset release is hosted on Hugging Face, with local mirrors retained in this GitHub repository. It includes plain CSV/JSON files for standard evaluation and a reference-aware CSV for source-grounded inspection or retrieval experiments.

query_id: Stable question identifier
query: Question stem
A-D: Answer options
Z: "Not sure" option
ref_doc: Source document (in reference file)
source_text: Supporting excerpt (in reference file)
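The fields above could be read with the Python standard library as sketched below. It assumes the option columns are literally named `A`, `B`, `C`, `D`, and `Z`; the `Question` class, the `load_questions` helper, and the sample row are made up for illustration and should be checked against the released files.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Question:
    query_id: str
    query: str
    options: Dict[str, str]
    ref_doc: str = ""      # only present in the reference-aware CSV
    source_text: str = ""  # only present in the reference-aware CSV


def load_questions(handle) -> List[Question]:
    """Parse an open CSV file (or file-like object) into Question records."""
    questions = []
    for row in csv.DictReader(handle):
        questions.append(
            Question(
                query_id=row["query_id"],
                query=row["query"],
                options={key: row[key] for key in ("A", "B", "C", "D", "Z")},
                ref_doc=row.get("ref_doc") or "",
                source_text=row.get("source_text") or "",
            )
        )
    return questions


# Made-up placeholder row, not taken from the dataset.
SAMPLE = (
    "query_id,query,A,B,C,D,Z,ref_doc,source_text\n"
    "q-0001,Placeholder question?,First,Second,Third,Fourth,Not sure,"
    "placeholder.pdf,Placeholder excerpt.\n"
)

questions = load_questions(io.StringIO(SAMPLE))
```

Because `ref_doc` and `source_text` default to empty strings, the same loader works for the plain CSV, which lacks the reference columns.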

Results

A compact ranking view summarizes the 50-model evaluation. Detailed model-question behavior is available in the full interactive Plotly report.

Interactive Heatmap

Inspect every model-question outcome.

The full report covers 50 evaluated models across 1,136 ESGenius questions, sorted by model rank and question difficulty for fast error-pattern analysis.

Legend: Correct, Wrong, Invalid, Not sure, Missing.
Axes: 50 models ranked top to bottom; questions ordered from hardest to easiest.
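The ordering of the heatmap axes can be sketched as follows. The page does not define question difficulty, so this example assumes it is the fraction of models that answer a question correctly, with ties broken by name; `order_axes` and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

# model name -> question id -> outcome label (e.g. "correct", "wrong")
Outcomes = Dict[str, Dict[str, str]]


def order_axes(outcomes: Outcomes) -> Tuple[List[str], List[str]]:
    """Return (models best-first, questions hardest-first)."""
    models = list(outcomes)
    question_ids = sorted({q for per_model in outcomes.values() for q in per_model})
    if not models or not question_ids:
        return models, question_ids

    def model_accuracy(model: str) -> float:
        hits = sum(1 for q in question_ids if outcomes[model].get(q) == "correct")
        return hits / len(question_ids)

    def solve_rate(question_id: str) -> float:
        hits = sum(1 for m in models if outcomes[m].get(question_id) == "correct")
        return hits / len(models)

    # Higher accuracy first; lower solve rate (harder question) first.
    ranked_models = sorted(models, key=lambda m: (-model_accuracy(m), m))
    ranked_questions = sorted(question_ids, key=lambda q: (solve_rate(q), q))
    return ranked_models, ranked_questions


# Toy outcomes for three made-up models and three made-up questions.
toy = {
    "model_a": {"q1": "correct", "q2": "correct", "q3": "wrong"},
    "model_b": {"q1": "correct", "q2": "not_sure", "q3": "wrong"},
    "model_c": {"q1": "correct", "q2": "correct", "q3": "correct"},
}

rows, cols = order_axes(toy)
```

Anything other than `"correct"`, including a missing entry, counts against both the model and the question here, which is one possible convention among several.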

Citation

If you use ESGenius, please cite the EMNLP 2025 paper and repository metadata.

BibTeX

@inproceedings{he-etal-2025-esgenius,
  title = "{ESG}enius: Benchmarking {LLM}s on Environmental, Social, and Governance ({ESG}) and Sustainability Knowledge",
  author = "He, Chaoyue and Zhou, Xin and Wu, Yi and Yu, Xinjia and Zhang, Yan and Zhang, Lei and Wang, Di and Lyu, Shengfei and Xu, Hong and Wang, Xiaoqiao and Liu, Wei and Miao, Chunyan",
  editor = "Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet",
  booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
  month = nov,
  year = "2025",
  address = "Suzhou, China",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.emnlp-main.739/",
  doi = "10.18653/v1/2025.emnlp-main.739",
  pages = "14612--14653",
  ISBN = "979-8-89176-332-6"
}