FOCI GenAI/LLM Users Group: "Bias in Bias Evaluation: The Need for Multifactor Bias Benchmarking of LLMs" (21 Feb)

Posted February 16, 2024
FOCI GenAI/LLM Users Group
6p Weds, 21 Feb (pizza at 5:30)
Amos Eaton 214

WHAT: "The Need for Multifactor Bias Benchmarking of LLMs"
LEADER: Hannah Powers
WHEN: 6p, 21 Feb 
CONTACT: Aaron Green

DESCRIPTION: LLMs have shown a capacity for producing toxic and biased responses to even innocuous prompts. Bias benchmarks exist to evaluate models for trustworthiness and to identify at-risk subgroups from model responses. However, these benchmarks have gaps that themselves bias the analysis. Furthermore, existing analyses lack the statistical foundations needed to draw definitive conclusions about a model's biases. We propose a method for identifying gaps in existing benchmarks and a multifactor bias analysis of LLMs to identify the key factors behind model behavior.

BIO: Hannah Powers is a second-year PhD student in the computer science department at RPI. Her current research involves evaluating the trustworthiness of large language models. She is interested in ethical machine learning and artificial intelligence.

Recordings of previous FOCI GenAI Users Group sessions:
