TWed Talk: Danielle Villa on "Explanation Cross-Examination: Testing Faithfulness of Language Model-Generated Explanations" (25 Sep 2024)

Posted September 20, 2024
TWed Talk (25 Sep 2024)
4p, Winslow 1140 (Pizza arrives at 3:30p)

WHAT: Danielle Villa on "Explanation Cross-Examination: Testing Faithfulness of Language Model-Generated Explanations"
WHEN: 4p, Weds, 25 Sep (NEW TIME!)
VIDEO: https://youtu.be/U7qwedwKvAo
EVENT PAGE: https://bit.ly/3ZsPI0R

Please join us as TWC's Danielle Villa leads us in a discussion of her fascinating work developing a framework for evaluating the faithfulness of explanations generated by language models.

DESCRIPTION: Language models (LMs) are often prompted to explain their outputs for increased accuracy and transparency. However, evidence shows that important factors that influence LM outputs are not always included in LM-generated explanations. For this reason, measuring the faithfulness of LM-generated explanations has emerged as an important problem. Existing solutions tend to focus on global faithfulness, i.e., the general tendency of a model to produce unfaithful explanations. In contrast, this talk discusses a follow-up-question-generating framework for measuring local faithfulness, i.e., the faithfulness of individual explanations. Our framework uses a cross-examiner model, which probes the target model's explanations via targeted follow-up questions.
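For readers curious what such a loop might look like, here is a minimal, hypothetical sketch of a cross-examination cycle. It is not the framework's implementation: the `ask` helper, the prompts, the round count, and the consistency check are all illustrative assumptions standing in for whatever LM client and prompting strategy one actually uses.

```python
def ask(model, prompt: str) -> str:
    """Placeholder for a language-model call; swap in a real API client."""
    raise NotImplementedError

def cross_examine(target, examiner, question: str, n_rounds: int = 3) -> bool:
    """Probe a single explanation from the target model for consistency.

    Returns True if no inconsistency is flagged across the follow-up
    rounds (a rough proxy for local faithfulness), False otherwise.
    """
    answer = ask(target, question)
    explanation = ask(target, f"Explain why you answered: {answer}")
    transcript = [f"Q: {question}", f"A: {answer}", f"Explanation: {explanation}"]

    for _ in range(n_rounds):
        # The cross-examiner crafts a targeted follow-up question aimed
        # at the reasoning the target model claims to have used.
        follow_up = ask(
            examiner,
            "Given this transcript, ask one follow-up question that tests "
            "whether the explanation reflects the model's actual reasoning:\n"
            + "\n".join(transcript),
        )
        reply = ask(target, follow_up)
        transcript += [f"Follow-up: {follow_up}", f"Reply: {reply}"]

        # The cross-examiner then checks the reply against the
        # original explanation for contradictions.
        verdict = ask(
            examiner,
            "Does the reply contradict the explanation? "
            "Answer CONSISTENT or INCONSISTENT:\n" + "\n".join(transcript),
        )
        if "INCONSISTENT" in verdict.upper():
            return False
    return True
```

The key design point the sketch tries to convey is that faithfulness is judged per explanation (local), by interrogating the target model about its own stated reasoning, rather than by aggregating statistics over many examples (global).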

BIO: Danielle is a 3rd-year PhD student advised by Deborah McGuinness, studying how semantic technologies can be used to improve LLMs, with a focus on improving evaluation metrics using knowledge graphs. Her current work uses knowledge graphs to generate counterfactuals for question-answering datasets in order to evaluate the faithfulness of LLM-generated explanations. The explanation cross-examiner was developed this past summer while she was at IBM, and a paper discussing it is currently under review at AAAI.
