Annophis – Developing a multilingual, multimodal, and machine learning-driven annotation infrastructure for the analysis, creation, enrichment and cross-cultural comparison of historical formulaic text corpora (ANNOPHIS)

This project, funded by the Flemish Research Foundation, is a collaboration between Ghent University and the University of Florida that aims to develop an annotation platform for historical languages, specifically Ancient Greek and Latin. The platform will support both manual annotation and the validation of automatic annotations, with machine-learning models integrated through APIs. It will also support multimodal annotation, covering both texts and images.

The primary corpora we will work with are formulaic texts, which are short texts with a fixed generic structure and a stereotypical visual format for highlighting key components. Ghent University has already developed two corpora of primarily Ancient Greek formulaic texts: the Database of Byzantine Book Epigrams (DBBE – www.dbbe.ugent.be), a collection of medieval Greek paratexts, and the Database of Everyday Writing in Antiquity (EVWRIT – www.evwrit.ugent.be), a database of everyday non-literary papyri and inscriptions.

The University of Florida brings expertise in 3D digitization, digital preservation, dissemination, and advanced automatic analysis of inscriptions and historical artifacts through its Digital Epigraphy and Archaeology Project (www.digitalepigraphy.org). The project also collaborates with the Data-Driven Humanities Research Group at the University of Florida (https://classics.ufl.edu/data-driven-humanities-research-group/), which specializes in applying Natural Language Processing and machine learning to the analysis of Ancient Greek and Latin language, literature, and culture.