Lara Verheyen

Vrije Universiteit Brussel

Lara Verheyen completed a Master of Linguistics at KU Leuven (Belgium). After graduating in 2018, she completed an Advanced Master in Artificial Intelligence with a specialization in Speech and Language Technology. Currently, Lara is a PhD student at the Artificial Intelligence Laboratory at the Vrije Universiteit Brussel (VUB) under the supervision of Prof. Dr. Katrien Beuls. Her PhD research focuses on building truly intelligent systems that interact with humans about their shared environment. Specifically, the goal of these systems is to hold coherent and meaningful conversations with humans. To achieve this, these systems build up knowledge during the conversation and ground the conversation in the shared environment and the acquired knowledge. Lara is particularly interested in operationalizing these systems through a hybrid approach that combines symbolic and subsymbolic techniques.

A hybrid visual dialog agent

Holding a coherent, meaningful and multi-turn conversation with a human interlocutor is one of the main challenges for current intelligent agents. Especially when conversations span multiple turns, agents lack the capabilities to remember what has been said and to ground their answers in the conversational context.

If we want truly intelligent agents that can communicate with humans about their environment, these agents need to possess certain cognitive capabilities. They must be able to perceive and categorize the world, to understand and produce utterances, and to reason well enough to integrate these sources of information.

In this talk, I present a novel methodology that allows an intelligent agent to hold multi-turn, coherent conversations with humans. Concretely, the agent maps utterances to a representation of their meaning. This semantic representation consists of the reasoning operations that are required to understand the utterance in terms of the environment and the discourse context. These reasoning operations are executed in a hybrid way: those related to discourse understanding are executed symbolically, whereas those that interact with the environment are executed subsymbolically. To keep track of what has been said, the agent possesses a conversation memory, which is a representation of the conversational context. After each turn in the conversation, the conversation memory is updated with the necessary information.

The proposed intelligent agent is validated through the task of visual dialog. The visual dialog task consists of modelling an agent that can answer a series of questions about an image. The agent requires both the image and the conversational context to answer these questions correctly. Applied to two benchmark datasets, namely MNIST dialog and CLEVR dialog, the agent achieves an accuracy of 97.18% and 95.94%, respectively.

The methodology proposed in this talk paves the way for intelligent agents that hold coherent and multi-turn conversations with humans. Moreover, the applied technologies ensure that the system is explainable and interpretable by design.
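
As a rough illustration of the idea described in the abstract, the following minimal Python sketch runs a two-turn dialog in which each question is represented as a sequence of reasoning operations and a conversation memory is updated after every turn. It is not the system presented in the talk. All names (`ConversationMemory`, `filter_by`, `resolve_previous`, `execute`) are hypothetical, the programs are written by hand rather than derived from the utterances, and the "subsymbolic" operation is replaced by a simple lookup on a toy scene description instead of a neural module operating on an image.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Toy stand-in for an image: a list of objects with attributes (assumption).
Scene = List[Dict[str, str]]


@dataclass
class ConversationMemory:
    """Symbolic record of the dialog so far (hypothetical structure)."""
    turns: List[Dict[str, object]] = field(default_factory=list)

    def update(self, question: str, answer: str, referents: List[int]) -> None:
        # Store what was asked, what was answered, and which objects were involved.
        self.turns.append(
            {"question": question, "answer": answer, "referents": referents}
        )

    def last_referents(self) -> List[int]:
        return list(self.turns[-1]["referents"]) if self.turns else []


def filter_by(scene: Scene, objects: List[int], attribute: str, value: str) -> List[int]:
    """Placeholder for a subsymbolic operation that inspects the environment.

    In this sketch it is a plain attribute lookup, not a neural network.
    """
    return [i for i in objects if scene[i][attribute] == value]


def resolve_previous(memory: ConversationMemory) -> List[int]:
    """Symbolic operation: retrieve the objects referred to in the previous turn."""
    return memory.last_referents()


def execute(program: List[Tuple[str, ...]], scene: Scene,
            memory: ConversationMemory) -> Tuple[Optional[str], List[int]]:
    """Run the reasoning operations in order; return the answer and its referents."""
    objects = list(range(len(scene)))
    answer = None
    for op, *args in program:
        if op == "filter":
            objects = filter_by(scene, objects, args[0], args[1])
        elif op == "previous":
            objects = resolve_previous(memory)
        elif op == "count":
            answer = str(len(objects))
        elif op == "query":
            answer = scene[objects[0]][args[0]] if objects else "none"
        else:
            raise ValueError(f"unknown operation: {op}")
    return answer, objects


scene = [
    {"shape": "cube", "color": "red"},
    {"shape": "sphere", "color": "blue"},
    {"shape": "cube", "color": "blue"},
]
memory = ConversationMemory()

# Turn 1: "How many spheres are there?" (program written by hand for this sketch)
answer1, referents1 = execute([("filter", "shape", "sphere"), ("count",)], scene, memory)
memory.update("How many spheres are there?", answer1, referents1)

# Turn 2: "What color is it?" can only be answered via the conversation memory.
answer2, referents2 = execute([("previous",), ("query", "color")], scene, memory)
memory.update("What color is it?", answer2, referents2)
```

The second question contains no description of the object it asks about, so the answer depends on the referents stored in the memory after the first turn. Because each answer is produced by an explicit sequence of operations, the steps leading to it can be inspected, which is the sense in which such a design can be interpretable.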