A Systematic Literature Review on the Representation of Texts as Linguistic Linked Open Data

Authors

  • Michela Bandini, CNR-Istituto di Linguistica Computazionale “A. Zampolli”
  • Valeria Quochi, CNR-Istituto di Linguistica Computazionale “A. Zampolli”

DOI:

https://doi.org/10.6092/issn.2532-8816/21195

Keywords:

Linguistic Linked Open Data, Semantic Web, Systematic literature review, Ancient texts, DigItAnt, Ancient languages

Abstract

Despite growing interest in publishing linguistic data as Linked Open Data (LOD), representing ancient language corpora on the Semantic Web remains challenging. While LOD principles have been successfully applied to linguistic resources such as dictionaries, lexicons, and terminologies, their use for textual corpora, particularly those of ancient languages, is still limited. Through a systematic literature review, we investigate how textual data have been represented as Linguistic Linked Open Data (LLOD), evaluating the potential and limitations of existing approaches and methodologies for enhancing data integration and interoperability in the Digital Humanities. The review follows a rigorous methodology encompassing literature identification, screening for inclusion, and quality assessment. By classifying and analysing relevant studies, we provide a comprehensive overview of current practices and offer insights into the benefits and challenges of publishing ancient corpora as LLOD.

Published

2025-07-10

How to Cite

Bandini, M., & Quochi, V. (2025). A Systematic Literature Review on the Representation of Texts as Linguistic Linked Open Data. Umanistica Digitale, 9(20), 289–315. https://doi.org/10.6092/issn.2532-8816/21195