The LiLa Knowledge Base

Querying Interoperable Latin Resources

Authors

  • Marco Passarotti, Università Cattolica del Sacro Cuore
  • Eleonora Litta, Università Cattolica del Sacro Cuore

DOI:

https://doi.org/10.60923/issn.2532-8816/23506

Keywords:

Latin, Linked Open Data, Interoperability, Linguistic Resources, SPARQL

Abstract

The proliferation of digital linguistic resources for Latin has created significant research opportunities, yet their potential is often constrained by a lack of interoperability. Developed in isolation, many corpora, dictionaries, and lexicons use heterogeneous formats, query languages, annotation criteria, and tag sets, which hinders integrated data analysis. The LiLa (Linking Latin) project addresses this challenge by creating a Knowledge Base of interconnected resources built on Linked Open Data principles. At its core is a lemma-based architecture: a central Lemma Bank harmonizes the divergent lemmatization practices of the different sources, so that data from any linked resource can be integrated through shared lemmas. This paper introduces the fundamental structure of the LiLa Knowledge Base, which models data in RDF and employs standard ontologies such as OntoLex-Lemon. We demonstrate the practical value of this interoperable ecosystem through a series of ready-to-use SPARQL queries. These use cases show how researchers can perform complex, cross-resource analyses, such as comparing lexical inventories between Classical and Medieval texts, examining word formation, or tracing semantic concepts across multiple corpora and dictionaries. By linking previously fragmented data, LiLa not only streamlines scholarly inquiry but also establishes a new paradigm for creating comprehensive, interconnected digital ecosystems for historical languages and beyond, making Latin a model for linguistic resource interoperability.
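The lemma-based architecture described in the abstract can be illustrated with a minimal in-memory sketch. It is not the LiLa implementation, which models data in RDF and is queried with SPARQL. The lemma IDs, resource names, and helper functions below are all hypothetical and serve only to show how a shared lemma lets two independently lemmatized resources be joined.

```python
from collections import defaultdict

# Hypothetical Lemma Bank: one ID per lemma, plus the written forms
# under which different resources may cite that lemma.
LEMMA_BANK = {
    "lemma:1": {"written_reps": {"sulfur", "sulphur"}},
    "lemma:2": {"written_reps": {"amo"}},
}


def build_index(lemma_bank):
    """Map every written form to the ID of the lemma it belongs to."""
    index = {}
    for lemma_id, entry in lemma_bank.items():
        for rep in entry["written_reps"]:
            index[rep] = lemma_id
    return index


def link_resource(items, index):
    """Group a resource's items by lemma ID.

    `items` is an iterable of (local_lemma_string, payload) pairs;
    strings with no match in the index are left unlinked.
    """
    linked = defaultdict(list)
    for lemma_string, payload in items:
        lemma_id = index.get(lemma_string.lower())
        if lemma_id is not None:
            linked[lemma_id].append(payload)
    return linked


# Two toy resources that spell the same lemma differently (assumed data).
corpus_tokens = [
    ("sulphur", "corpus A, token 17"),
    ("amo", "corpus A, token 42"),
]
dictionary_entries = [
    ("sulfur", "dictionary B, entry 'sulfur'"),
]

index = build_index(LEMMA_BANK)
corpus_links = link_resource(corpus_tokens, index)
dictionary_links = link_resource(dictionary_entries, index)

# Cross-resource join: lemmas attested in both the corpus and the dictionary.
shared = sorted(set(corpus_links) & set(dictionary_links))
print(shared)
```

Because both resources resolve to the same lemma ID, the join succeeds even though the corpus writes "sulphur" and the dictionary writes "sulfur". In the Knowledge Base itself, the corresponding join would be expressed as a SPARQL query over the linked RDF data.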

Published

2026-05-21

How to Cite

Passarotti, M., & Litta, E. (2026). The LiLa Knowledge Base: Querying Interoperable Latin Resources. Umanistica Digitale, 10(23), 151–169. https://doi.org/10.60923/issn.2532-8816/23506