Thibault Clérice

INRIA

I hold a PhD in Classical Studies from the University of Lyon 3. I worked as an engineer at King’s College London and directed a master’s program at the École nationale des chartes before joining the ALMAnaCH team at the Inria center in Paris in 2023, where I am now a permanent researcher. I am a founding member and co-editor of HTR United, an international catalog of data for training AI in handwriting recognition.

CATMuS Medieval: The Importance of Automatic Transcription of Medieval Manuscripts for Computer Vision

The performance of text recognition models, whether for printed text (OCR) or handwriting (HTR), improves each year, and has recently accelerated thanks to hybrid models that combine vision and language models. These systems typically rely on well-controlled data: the languages covered show little variation, so language models can be used for efficient post-correction. With medieval manuscripts, however, the situation becomes significantly more complex. These documents are far more heterogeneous, which challenges current models that depend heavily on the language being standardized and regular.