HERITRACE: A User-Friendly Semantic Data Editor with Change Tracking and Provenance Management for Cultural Heritage Institutions
DOI:
https://doi.org/10.6092/issn.2532-8816/21218
Keywords:
AIUCD2024, Data Curation, Provenance, Change-tracking, Semantic Web Technologies
Abstract
HERITRACE is a data editor designed for galleries, libraries, archives and museums, aimed at simplifying data curation while enabling non-technical domain experts to manage data intuitively without losing its semantic integrity. While the semantic nature of RDF can pose a barrier to data curation due to its complexity, HERITRACE conceals this intricacy while preserving the advantages of semantic representation. The system natively supports provenance management and change tracking, ensuring transparency and accountability throughout the curation process. Although HERITRACE functions effectively out of the box, it offers a straightforward customization interface for technical staff, enabling adaptation to the specific data model required by a given collection. Current applications include the ParaText project, and its adoption is already planned for OpenCitations. Future developments will focus on integrating the RDF Mapping Language (RML) to enhance compatibility with non-RDF data formats, further expanding its applicability in digital heritage management.
License
Copyright (c) 2025 Arcangelo Massari, Silvio Peroni

This work is licensed under a Creative Commons Attribution 4.0 International License.