Prompting the Muse
Generating Prosodically Accurate Audio of Latin Poetry with Text-to-Speech Large Language Models: A Computational Workflow
DOI:
https://doi.org/10.60923/issn.2532-8816/23570
Keywords:
Digital Humanities, Latin Poetry, AI, Computational tools
Abstract
While the field of Digital Humanities has established robust infrastructures for the textual analysis of Latin, the auditory dimension of the language remains largely undeveloped. Specifically, Text-to-Speech models cannot accurately replicate the complex quantitative rhythm and intonation of classical poetry. This paper presents a computational workflow designed to bridge this gap, leveraging verified metrical data to produce high-fidelity and prosodically accurate audio recordings. Starting from the structured XML scansions of the Pedecerto project, the proposed pipeline employs a rule-based pre-processing routine to convert standard orthography into a phonetic script optimized for acoustic modelling. These adapted texts are then fed into a multimodal Large Language Model, which is steered via in-context prompt engineering to observe syllable quantity, ictus placement, pauses, and elision where it occurs. The paper details the technical architecture of this system and analyzes the specific orthographic interventions and prompts required to overcome the stress-timed bias of contemporary AI models. Finally, it discusses the implications of this tool for the wider Digital Humanities ecosystem, with particular attention to its potential to democratize access to Latin learning, support accessibility, and add new audio layers to existing digital projects and infrastructures.
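The rule-based pre-processing step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Syllable` record, the function names, and the output notation (uppercase for ictus, a trailing colon for a long syllable, hyphens between syllables) are hypothetical and do not reproduce the Pedecerto XML schema or the phonetic script actually used in the paper. The sketch only shows the general idea of turning verified scansion data into a respelled line that makes quantity, ictus, and elision explicit before the text reaches a speech model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    """One scanned syllable (hypothetical record, not the Pedecerto schema)."""
    text: str             # syllable in standard orthography
    long: bool            # True if the syllable is metrically long
    ictus: bool = False   # True if the syllable carries the metrical beat
    elided: bool = False  # True if the syllable is lost to elision


def respell_syllable(syl: Syllable) -> str:
    """Render one syllable in the illustrative notation."""
    if syl.elided:
        return ""  # elided syllables are dropped from the spoken text
    text = syl.text.upper() if syl.ictus else syl.text.lower()
    return text + ":" if syl.long else text


def respell_word(syllables: List[Syllable]) -> str:
    """Join the surviving syllables of a word with hyphens."""
    parts = [respell_syllable(s) for s in syllables]
    return "-".join(p for p in parts if p)


def respell_line(words: List[List[Syllable]]) -> str:
    """Join the respelled words of a verse with single spaces."""
    rendered = [respell_word(w) for w in words]
    return " ".join(r for r in rendered if r)


# Opening of Aeneid 1.1, scanned by hand for this example.
ARMA_VIRUMQUE_CANO = [
    [Syllable("ar", long=True, ictus=True), Syllable("ma", long=False)],
    [
        Syllable("vi", long=False),
        Syllable("rum", long=True, ictus=True),
        Syllable("que", long=False),
    ],
    [Syllable("ca", long=False), Syllable("no", long=True, ictus=True)],
]

print(respell_line(ARMA_VIRUMQUE_CANO))
```

In a workflow of this kind, a respelled line would then be embedded in a prompt that explains the notation to the model; the prompt wording itself is outside the scope of this sketch.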
License
Copyright (c) 2026 Michele Ciletti

This work is licensed under a Creative Commons Attribution 4.0 International License.