Standards-aware scope
Covers sustainability reporting, climate disclosure, biodiversity, energy, governance, and ESG reasoning across IPCC, GRI, SASB, ISO, IFRS/ISSB, TCFD, and CDP sources.
EMNLP 2025 Main Conference Oral | Suzhou, China, November 4-9, 2025 | Resource and Theme Paper Award nominations (Top 1%)
An expert, source-grounded benchmark for evaluating whether LLMs understand sustainability reporting, climate disclosure, governance, and standards-driven ESG reasoning.
Large language models are increasingly used for sustainability reporting, climate disclosure, and ESG analysis, yet their knowledge of specialized standards and source-dependent concepts remains difficult to evaluate systematically. ESGenius provides a 1,136-question multiple-choice benchmark covering environmental, social, governance, and sustainability knowledge across major reporting and climate frameworks. Each question follows an A-D answer protocol with a Z option for uncertainty, and the reference version includes source document metadata and supporting excerpts for audit or retrieval-augmented evaluation. The repository includes dataset files, evaluation scripts, published result figures, and an interactive 50-model heatmap for model-question diagnostics.
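The A-D protocol with a Z option can be scored along the following lines. This is a minimal sketch, not the repository's evaluation script: the category names, the `classify` and `summarize` helpers, and the choice to count Z as "not correct" when computing accuracy are all assumptions made for illustration.

```python
from collections import Counter

VALID_CHOICES = {"A", "B", "C", "D"}
UNCERTAIN = "Z"  # the benchmark's explicit "not sure" option


def classify(prediction, gold):
    """Label one model output against the gold answer letter (hypothetical helper)."""
    if prediction is None or not prediction.strip():
        return "missing"
    choice = prediction.strip().upper()
    if choice == UNCERTAIN:
        return "uncertain"
    if choice not in VALID_CHOICES:
        return "invalid"
    return "correct" if choice == gold.strip().upper() else "incorrect"


def summarize(predictions, golds):
    """Return per-category counts and accuracy over all questions."""
    counts = Counter(classify(p, g) for p, g in zip(predictions, golds))
    total = len(golds)
    # Assumption: only "correct" counts toward accuracy; Z, invalid and
    # missing outputs are treated as not correct.
    accuracy = counts["correct"] / total if total else 0.0
    return counts, accuracy


# Toy inputs, not real model outputs.
counts, accuracy = summarize(["A", "z", "E", None, "B"], ["A", "B", "C", "D", "C"])
```

Keeping "uncertain", "invalid", and "missing" as separate categories preserves the distinctions the full heatmap reports, rather than folding them into a single error count.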
The reference CSV preserves document names, page references, and supporting text snippets so answers can be audited or used in retrieval-aware experiments.
Published figures summarize aggregate performance, while the full heatmap exposes per-question outcomes, invalid outputs, uncertainty, and missing responses.
The canonical public dataset release is hosted on Hugging Face, with local mirrors retained in this GitHub repository. It includes plain CSV/JSON files for standard evaluation and a reference-aware CSV for source-grounded inspection or retrieval experiments.
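For source-grounded inspection, the reference-aware CSV could be grouped by source document roughly as follows. This is a sketch under stated assumptions: the column names `source_document`, `page`, and `snippet` are hypothetical, as is the `group_by_document` helper, and the demo rows are placeholders rather than ESGenius content. Check the header of the released file before adapting it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; replace with the real header of the reference CSV.
DOC_COL, PAGE_COL, SNIPPET_COL = "source_document", "page", "snippet"


def group_by_document(handle):
    """Map each source document to its (page, snippet) pairs."""
    reader = csv.DictReader(handle)
    missing = {DOC_COL, PAGE_COL, SNIPPET_COL} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"reference CSV lacks columns: {sorted(missing)}")
    grouped = defaultdict(list)
    for row in reader:
        grouped[row[DOC_COL]].append((row[PAGE_COL], row[SNIPPET_COL]))
    return dict(grouped)


# Placeholder rows for illustration only.
demo = io.StringIO(
    "question,answer,source_document,page,snippet\n"
    "Placeholder question 1,A,doc_one.pdf,12,placeholder excerpt one\n"
    "Placeholder question 2,C,doc_one.pdf,30,placeholder excerpt two\n"
    "Placeholder question 3,B,doc_two.pdf,4,placeholder excerpt three\n"
)
index = group_by_document(demo)
```

With a real file, `open(path, newline="", encoding="utf-8")` would take the place of the in-memory `demo` handle.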
A compact ranking view summarizes the 50-model evaluation. Detailed model-question behavior is available in the full interactive Plotly report.
The full report covers 50 evaluated models across 1,136 ESGenius questions, sorted by model rank and question difficulty for fast error-pattern analysis.
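The ordering described above can be sketched as a sort over a model-by-question correctness table. The definitions used here are assumptions, since the report's exact criteria are not stated in this text: model rank is taken as the number of correct answers, and question difficulty as the number of models that answered correctly. The `order_heatmap` helper and the toy results are hypothetical.

```python
def order_heatmap(results):
    """results: {model: {question_id: bool}} -> (models, questions, matrix).

    Rows are ordered best model first, columns easiest question first.
    """
    # Rank models by number of correct answers; break ties by name.
    models = sorted(results, key=lambda m: (-sum(results[m].values()), m))
    question_ids = {q for answers in results.values() for q in answers}

    def solved_by(question):
        # Number of models that answered this question correctly.
        return sum(results[m].get(question, False) for m in models)

    questions = sorted(question_ids, key=lambda q: (-solved_by(q), q))
    matrix = [[int(results[m].get(q, False)) for q in questions] for m in models]
    return models, questions, matrix


# Toy data, not real evaluation results.
toy_results = {
    "model_a": {"q1": True, "q2": False, "q3": True},
    "model_b": {"q1": True, "q2": False, "q3": False},
    "model_c": {"q1": True, "q2": True, "q3": True},
}
models, questions, matrix = order_heatmap(toy_results)
```

Sorting both axes this way pushes errors toward one corner of the grid, so a strong model failing an easy question stands out as an isolated cell.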
If you use ESGenius, please cite the EMNLP 2025 paper and repository metadata.
@inproceedings{he-etal-2025-esgenius,
    title = "{ESG}enius: Benchmarking {LLM}s on Environmental, Social, and Governance ({ESG}) and Sustainability Knowledge",
    author = "He, Chaoyue and Zhou, Xin and Wu, Yi and Yu, Xinjia and Zhang, Yan and Zhang, Lei and Wang, Di and Lyu, Shengfei and Xu, Hong and Xiaoqiao, Wang and Liu, Wei and Miao, Chunyan",
    editor = "Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.739/",
    doi = "10.18653/v1/2025.emnlp-main.739",
    pages = "14612--14653",
    ISBN = "979-8-89176-332-6"
}
Everything needed to inspect, reproduce, and cite the benchmark.