Automatic Annotation of Legal References (Allegationes) in the Liber Extra’s Ordinary Gloss

Authors

  • Andrea Esuli, Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo”, Pisa, Italy
  • Vincenzo Roberto Imperia, University of Palermo, Department of Law, Palermo, Italy
  • Giovanni Puccetti, Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo”, Pisa, Italy

DOI:

https://doi.org/10.60923/issn.2532-8816/22163

Keywords:

Legal references, Information Extraction, Conditional Random Fields, Dataset, IRCDL2025

Abstract

The study of historical normative corpora is a key activity in Religious Studies and Legal History. Developing intelligent software tools for this activity is essential to the digital transformation of these scholarly communities. We present an interdisciplinary effort that produces an accurate automatic annotation of legal references (allegationes) in the Liber Extra’s Ordinary Gloss. An index of legal references has been derived from the annotations, enabling the creation of novel navigation and data analysis tools. The contribution of this work is twofold: the index is by itself a valuable resource for the discipline, and we detail the process that leads to its production, showing that a small team with limited resources can deliver an effective result. Both the index and the code are publicly available.

Published

2026-01-29

How to Cite

Esuli, A., Imperia, V. R., & Puccetti, G. (2026). Automatic Annotation of Legal References (Allegationes) in the Liber Extra’s Ordinary Gloss. Umanistica Digitale, 10(22), 139–156. https://doi.org/10.60923/issn.2532-8816/22163