Using Large Language Models for Evaluation: Opportunities and Limitations
Speaker: Prof. Emine Yilmaz
Time: 15:00 (BST), May 27, 2026.
Abstract
Large Language Models (LLMs) have shown significant promise as tools for automated evaluation across diverse domains. While the use of LLMs for evaluation offers substantial advantages—potentially reducing reliance on costly and subjective human assessments—the adoption of LLM-based evaluation is not without challenges. In this talk, we discuss both the transformative potential and the inherent limitations of using LLMs for evaluation tasks. In particular, we highlight challenges such as bias and variability in judgments. We also explore how LLMs can augment traditional evaluation practices while emphasizing the need for a cautious and informed approach to their use.
Our Speaker
Emine Yilmaz is a Professor and ELLIS Fellow in the Department of Computer Science at University College London. At UCL, she is a faculty member affiliated with the UCL Centre for Artificial Intelligence, where she leads the Web Intelligence Group. She also works as an Amazon Scholar for Amazon Alexa. Professor Yilmaz’s research interests lie in the fields of information retrieval and natural language processing. She has received several awards for her research, including the Karen Spärck Jones Award, a Google Faculty Research Award and a Bloomberg Data Science Research Award. Her research has been funded by several bodies, including EU Horizon 2020, EPSRC, the Alan Turing Institute, Google, Bloomberg and Elsevier.