Will Shockey
Multi-hop question answering (QA) is one of the hardest problems in natural language understanding. Unlike single-hop QA, it requires systems to combine information from multiple sources across several reasoning steps: answering "Where was the director of Inception born?", for example, requires first identifying the director and then retrieving that person's birthplace. This makes multi-hop QA a strong testbed for whether models can perform structured inference rather than simply matching patterns. Reasoning accuracy is critical here, because an error in any single step can break the entire reasoning chain and produce a wrong final answer even when supporting evidence is available. As knowledge-intensive applications grow in areas like biomedical research, enterprise knowledge management, and commonsense reasoning, robust multi-hop QA is becoming a benchmark for genuine machine reasoning.
Despite their power, large language models (LLMs) still face problems that limit their reliability for multi-hop QA. The first is hallucination: when LLMs lack clear evidence to support their answers, they often produce fluent but incorrect ones. Knowledge bases (KBs) offer structured collections of facts, but their coverage is incomplete and their quality uneven, so only a limited fraction of questions can be answered correctly from KBs alone. LLMs also struggle with long-context reasoning: as the number of hops grows, models must track long chains of entities and relations, which often exceeds the effective context window of current architectures. These problems motivate methods that combine the generative flexibility of LLMs with mechanisms for factual grounding and structured inference.
Knowledge graphs (KGs) offer a promising way to address these problems. By representing information as entities linked by typed relations, KGs provide a structured foundation for reasoning that is both interpretable and verifiable. Paired with LLMs, KGs can guide retrieval, constrain decoding, and supply explicit evidence paths that make hallucination less likely. They also enable compositional reasoning: a multi-hop query can be decomposed into a sequence of graph traversals, as sketched below. In this way, KGs serve as grounding mechanisms, anchoring generative models in factual structure while still allowing them to produce natural language outputs. Combining KGs and LLMs is therefore a key step toward robust, knowledge-intensive QA, uniting symbolic precision with neural flexibility.
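To make the decomposition idea concrete, here is a minimal sketch of answering the two-hop question from above by chaining relation lookups over a knowledge graph. The toy triple store, relation names, and hard-coded relation chain are all hypothetical; in a real pipeline the KG would be a full backend (e.g., Wikidata) and an LLM would produce the relation chain from the question.

```python
# Toy KG: (head entity, relation) -> tail entity.
# This triple store is a hypothetical illustration, not a real dataset.
KG = {
    ("Inception", "directed_by"): "Christopher Nolan",
    ("Christopher Nolan", "born_in"): "London",
}

def traverse(entity: str, relations: list[str]) -> list[str]:
    """Follow a chain of typed relations from a start entity,
    recording each intermediate entity as an explicit evidence path."""
    path = [entity]
    for rel in relations:
        nxt = KG.get((entity, rel))
        if nxt is None:  # missing edge: the KG is incomplete here
            raise KeyError(f"no fact ({entity}, {rel}, ?)")
        path.append(nxt)
        entity = nxt
    return path

# "Where was the director of Inception born?"
# decomposes into two hops: directed_by -> born_in.
path = traverse("Inception", ["directed_by", "born_in"])
print(" -> ".join(path))
# Inception -> Christopher Nolan -> London
```

The returned path doubles as the evidence trail mentioned above: every intermediate entity is explicit and checkable, which is what makes KG-grounded answers verifiable in a way free-form generation is not.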
Links:
- Final paper: https://drive.google.com/file/d/1noTnfxJJhe-HZ-B7POwXvwlXR9OAL9dZ/
- Final presentation (slides): https://docs.google.com/presentation/d/1gCNxDXi2_41-ka1ppfoRnCeM03THKRh9/
- Final presentation (video): https://youtu.be/Ci0gX6BmpiE