MPG at DisCoTeX: Predicting Text Coherence by Tree-based Modelling of Linguistic Features

Abstract

Automatic text coherence modelling plays a crucial role in natural language processing tasks such as machine translation, summarisation, and question answering. Moreover, text coherence is fundamental to reading comprehension and reader engagement, both of which are essential in a number of application domains. In this report, we present our contribution to the Assessing Discourse Coherence in Italian Texts (DisCoTeX) task at EVALITA-23, whose goal is automatic coherence detection. We approached the task by extracting linguistic features and using them to train a tree-based machine learning classifier, which yielded a minor improvement over the baseline. A feature importance analysis revealed the relevance of semantic features, providing indications for future feature engineering and modelling efforts.