Collatinus

A swiss-knife for Latin

Authors

  • Philippe Verkerk Université de Lille Bât

DOI:

https://doi.org/10.60923/issn.2532-8816/23866

Keywords:

Latin, Morphological Analysis, Lemmatization, Artificial Intelligence

Abstract

Created by Yves Ouvrard, Collatinus is a lemmatizer and a morphological analyzer for Latin. Its initial aim was to help teachers prepare texts together with their vocabulary lists, and to assist beginners in reading authentic Latin texts independently. In addition to the short translation it proposes, Collatinus allows users to consult dictionaries, both digital and image-based, in order to access full lexical information. Collatinus relies on a lexical base of more than 80,000 lemmas and on tables of word endings. The lexical base includes a short translation of each lemma, mainly in English and in French. Because Collatinus can split an inflected form into its stem and ending, it can provide the full morphological analysis of that form. As the quantity of each syllable is recorded in its databases, Collatinus is also able to scan texts metrically and to indicate accentuation. Several external tools have been developed to extend these core capabilities. More recently, Collatinus has been coupled with AI techniques (Latin-BERT) to choose the most appropriate lemmatization and analysis for each word in context. The AI model was trained on the annotated LASLA corpus, and the resulting tagger produces output files in the same APN format used by LASLA.
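The stem-and-ending approach described in the abstract can be illustrated with a minimal sketch. This is not Collatinus' actual code or data format: the two-entry lexicon, the ending tables, and the names `STEMS`, `ENDINGS`, `Analysis`, and `analyze` are hypothetical and chosen only for illustration. Vowel quantities are ignored, so forms that differ only in length are conflated.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    lemma: str
    gloss: str
    morphology: str


# Hypothetical toy lexicon: stem -> (lemma, paradigm, short translation).
STEMS = {
    "ros": ("rosa", "first_declension", "rose"),
    "domin": ("dominus", "second_declension", "master"),
}

# Hypothetical ending tables: paradigm -> ending -> possible analyses.
ENDINGS = {
    "first_declension": {
        "a": ["nominative singular", "vocative singular", "ablative singular"],
        "am": ["accusative singular"],
        "ae": ["genitive singular", "dative singular",
               "nominative plural", "vocative plural"],
        "arum": ["genitive plural"],
        "is": ["dative plural", "ablative plural"],
        "as": ["accusative plural"],
    },
    "second_declension": {
        "us": ["nominative singular"],
        "e": ["vocative singular"],
        "um": ["accusative singular"],
        "i": ["genitive singular", "nominative plural", "vocative plural"],
        "o": ["dative singular", "ablative singular"],
        "orum": ["genitive plural"],
        "os": ["accusative plural"],
        "is": ["dative plural", "ablative plural"],
    },
}


def analyze(form: str) -> list[Analysis]:
    """Return every analysis obtained by splitting `form` into stem + ending."""
    form = form.lower()
    results = []
    # Try every split point; keep those where the stem is in the lexicon
    # and the ending belongs to that stem's paradigm.
    for cut in range(1, len(form)):
        stem, ending = form[:cut], form[cut:]
        entry = STEMS.get(stem)
        if entry is None:
            continue
        lemma, paradigm, gloss = entry
        for morphology in ENDINGS[paradigm].get(ending, []):
            results.append(Analysis(lemma, gloss, morphology))
    return results
```

In this sketch, `analyze("rosam")` yields a single analysis, whereas `analyze("rosae")` yields four. That kind of ambiguity is what the contextual disambiguation step mentioned in the abstract is meant to resolve.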

References

[1] Bamman, David, and Patrick J. Burns. 2020. "Latin-BERT: A contextual language model for classical philology". arXiv preprint arXiv:2009.10053. https://doi.org/10.48550/arXiv.2009.10053.

[2] Ghiringhelli, Elena. 2024. "La continuation des Fastes d’Ovide par le Dijonnais Claude-Barthélemy Morisot (1649): introduction et traduction annotée du mois de juillet du calendrier romain". PhD thesis. Université Bourgogne Franche-Comté. https://theses.hal.science/tel-05019439

[3] Jansson, Tore. 1975. Prose Rhythm in Medieval Latin from the 9th to the 13th Century. Almqvist & Wiksell International.

[4] Longrée, Dominique, and Margherita Fantoli. 2023. "LASLAfiles_Latin_APNformat", V1. ULiège Open Data Repository. https://doi.org/10.58119/ULG/QJJ0SA.

[5] Ouvrard, Yves, and Philippe Verkerk. 2014. "Collatinus, un outil polymorphe pour l’étude du latin". Archivum Latinitatis Medii Aevi 72: 305-311. https://www.persee.fr/doc/alma_0994-8090_2014_num_72_1_1156.

[6] Roelli, Philipp, and Jan Ctibor. 2022-2023. "A new version of Corpus Corporum, the Latin full-text database and tool". ALMA 80: 251-266.

[7] Thon, Valérie. forthcoming. Quand écrire, c'est agir. Étude linguistique des lettres de Pierre Damien (XIe siècle). PhD thesis (Liège).

[8] Turcan-Verkerk, Anne-Marie, and Philippe Verkerk. 1996. "Un programme informatique pour l’étude de la prose rimée et rythmée". Le médiéviste et l’ordinateur 33: 41–48. https://hal.science/hal-04394988v1.

[9] Verkerk, Philippe, Yves Ouvrard, Margherita Fantoli, and Dominique Longrée. 2020. "L.A.S.L.A. and Collatinus: A convergence in Lexica". Studi e Saggi Linguistici 58 (1): 95-120. https://hal.science/hal-02399878v1.

[10] Verkerk, Philippe. 2022-2023. "Elaboration of a practical lemmatizer for Latin using Artificial Intelligence". ALMA 80: 267-294. https://hal.science/hal-04721577v1.

Published

2026-05-21

How to Cite

Verkerk, P. (2026). Collatinus: A swiss-knife for Latin. Umanistica Digitale, 10(23), 121–132. https://doi.org/10.60923/issn.2532-8816/23866