At a time when archives are being lost and hard disks corrupted, it is essential to preserve period documentation and to let the public navigate it easily using modern technologies. The Good Old Manuals project is designed to research and document 19th-century market gardening practices.
In May 2023, the CENTRINNO team in Paris organized a workshop dedicated to the French Method, a set of market gardening practices developed by Parisian professionals in the 19th century. The event focused on making the extensive knowledge contained in old market gardening manuals accessible to contemporary audiences. This effort began in 2017, when our laboratory, Sony CSL – Paris, started collecting and digitizing these historical texts. Using computer vision, text mining, and natural language processing (NLP), we have been able to extract and clean the data from these documents.
Our digitization process involves several steps. We detect image and text zones on each page. The text zones are then converted into strings of characters using optical character recognition (OCR). We further refine these strings by using a Bayesian spell checker to correct OCR errors. So far, we have successfully processed a third of our “Good Old Manuals” corpus, resulting in approximately 1 million words of valuable content.
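To make the spell-checking step more concrete, here is a minimal sketch of a Bayesian (noisy-channel) corrector in the style of Peter Norvig's well-known example. It is an illustration under stated assumptions, not the project's actual implementation: the word counts, the `p_error` value, and the function names `edits1` and `correct` are all hypothetical. For an OCR output `w`, it picks the candidate `c` that maximizes P(c) × P(w | c), considering only candidates at most one edit away.

```python
from collections import Counter
import string

# Hypothetical word counts standing in for a language model built from clean text.
WORD_COUNTS = Counter({"melon": 50, "salad": 30, "manure": 40, "frame": 25, "melons": 10})
ALPHABET = string.ascii_lowercase


def edits1(word):
    """Return every string one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts=WORD_COUNTS, p_error=0.05):
    """Return the candidate c maximizing P(c) * P(word | c).

    Assumed error model: the OCR output is correct with probability 1 - p_error,
    and any known word one edit away is scored with weight p_error.
    """
    total = sum(counts.values())

    def prior(candidate):
        # Add-one smoothing so an unseen word still gets a small prior.
        return (counts[candidate] + 1) / (total + len(counts) + 1)

    scores = {word: prior(word) * (1 - p_error)}
    for candidate in edits1(word):
        if candidate in counts:
            scores[candidate] = prior(candidate) * p_error
    return max(scores, key=scores.get)


print(correct("me1on"))   # the digit 1 misread for the letter l
print(correct("manurc"))  # the letter c misread for e
print(correct("salad"))   # already a known word, left unchanged
```

With these toy counts, the frequent word "melon" outweighs the unseen string "me1on" even after the error penalty, which is the core idea of the Bayesian approach: a rare observation is replaced only when a much more probable word lies close to it.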
Our research also involves several tasks, including Named Entity Recognition (NER) to identify locations and individuals mentioned in the texts. NER helps us link mentions of locations to actual geographical places and individuals to their biographies. We manually annotate locations to distinguish place names from varieties of fruits or vegetables, for example “Brussels” from “Brussels sprout.” We also perform semantic mapping of verbs to understand the various actions associated with gardening practices.
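The distinction between a place name and a variety name can be illustrated with a small rule-based sketch. This is only an approximation of the idea, since in the project this annotation is done manually: the `PLACES` and `PRODUCE` lists and the `tag_locations` function are hypothetical, and the rule simply checks whether a known place name is immediately followed by a produce noun.

```python
import re

# Hypothetical gazetteer and produce vocabulary, for illustration only.
PLACES = {"Brussels", "Paris"}
PRODUCE = {"sprout", "sprouts", "carrot", "carrots"}


def tag_locations(sentence):
    """Label each known place name as LOCATION, or VARIETY if a produce noun follows it."""
    tokens = re.findall(r"\w+", sentence)
    tagged = []
    for i, token in enumerate(tokens):
        if token in PLACES:
            next_token = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
            label = "VARIETY" if next_token in PRODUCE else "LOCATION"
            tagged.append((token, label))
    return tagged


print(tag_locations("He sent Brussels sprouts from Paris to Brussels."))
# [('Brussels', 'VARIETY'), ('Paris', 'LOCATION'), ('Brussels', 'LOCATION')]
```

A rule like this fails as soon as the variety name is not directly followed by the produce noun, which is one reason manual annotation remains useful.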
A significant achievement in our project is detecting causality frames within the texts. By identifying elements that trigger causal relationships (e.g., words like “causes” or “because”), we can link observations to biological phenomena. This allows us to uncover historical insights that align with modern agricultural knowledge.
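As a rough illustration of trigger-based causality detection, the sketch below looks for a trigger word and splits the sentence into a cause and an effect around it. It is a simplification under stated assumptions: the `TRIGGERS` table contains only the two example words mentioned above, the `causal_frame` function is hypothetical, and real causality frames require far more linguistic analysis than a split on a keyword.

```python
import re

# Trigger word -> True if the cause precedes the trigger ("X causes Y"),
# False if the cause follows it ("Y because X"). Illustrative list only.
TRIGGERS = {"causes": True, "because": False}
PATTERN = re.compile(r"\b(" + "|".join(TRIGGERS) + r")\b", re.IGNORECASE)


def causal_frame(sentence):
    """Return a dict with trigger, cause, and effect, or None if no trigger is found."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    left = sentence[:match.start()].strip(" ,.;")
    right = sentence[match.end():].strip(" ,.;")
    trigger = match.group(1).lower()
    cause, effect = (left, right) if TRIGGERS[trigger] else (right, left)
    return {"trigger": trigger, "cause": cause, "effect": effect}


print(causal_frame("Excess moisture causes rot."))
# {'trigger': 'causes', 'cause': 'Excess moisture', 'effect': 'rot'}
print(causal_frame("The leaves turn yellow because the soil is cold."))
# {'trigger': 'because', 'cause': 'the soil is cold', 'effect': 'The leaves turn yellow'}
```

The sketch does not handle sentences that open with the trigger ("Because the soil is cold, ...") or multi-word triggers; it only shows how a trigger word anchors the two halves of a causal relation.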
We have also designed visualizations such as maps showing the frequency of location mentions and social networks based on the people mentioned in the corpus. These tools help us understand the geographic and social contexts of historical gardening practices.
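The data behind such visualizations can be sketched in a few lines. The example below assumes, hypothetically, that each passage has already been annotated with the locations and people it mentions; the passage structure, the placeholder names, and the functions `location_frequencies` and `co_mention_edges` are illustrative and not taken from the project's code. Location counts would feed a frequency map, and pairs of people mentioned in the same passage would form the edges of a social network.

```python
from collections import Counter
from itertools import combinations


def location_frequencies(passages):
    """Count how often each location is mentioned across all passages."""
    return Counter(loc for passage in passages for loc in passage["locations"])


def co_mention_edges(passages):
    """Count, for each pair of people, the number of passages mentioning both."""
    edges = Counter()
    for passage in passages:
        people = sorted(set(passage["people"]))
        for pair in combinations(people, 2):
            edges[pair] += 1
    return edges


# Placeholder annotations, not real corpus data.
passages = [
    {"locations": ["Paris", "Brussels"], "people": ["Person A", "Person B"]},
    {"locations": ["Paris"], "people": ["Person B", "Person A", "Person C"]},
]

print(location_frequencies(passages).most_common())
# [('Paris', 2), ('Brussels', 1)]
print(co_mention_edges(passages)[("Person A", "Person B")])
# 2
```

Sorting the names before pairing them keeps each edge undirected, so ("Person A", "Person B") and ("Person B", "Person A") are counted as the same link.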
Our interactive “Good Old Manuals” website is now live. This platform lets you navigate through the texts, maps, and graphs we have developed, providing a comprehensive and accessible way to explore this rich heritage. This project aims to preserve historical knowledge while making it available to a broad audience, demonstrating how old texts can be combined with modern computational techniques to create new learning experiences.
Visit our website to discover the world of 19th-century Parisian market gardening: https://sonycslparis.github.io/gom-webapp/
About – Centrinno
CENTRINNO is a project operated by teams in nine cities (Amsterdam, Barcelona, Blönduos, Copenhagen, Geneva, Milan, Paris, Tallinn, and Zagreb) as part of the European Commission’s Horizon 2020 program. The Paris team is coordinated by Fab City Grand Paris and co-piloted by Vergers Urbains, WOMA, and Ars Longa, in partnership with Volumes and Sony Computer Science Laboratories – Paris.
Centrinno website: https://centrinno.eu/
Links and articles:
- GOM website: https://sonycslparis.github.io/gom-webapp/
- Computational Agroecology: Should we bet the microfarm on it?: https://limits.pubpub.org/pub/comput/release/1
- The discourse of the French Method on Medium: https://medium.com/@koddavid/the-discourse-of-the-french-method-how-to-make-accessible-the-knowledge-encapsulated-in-old-e566456106c5