Applying Text Mining Methods to Construct a Domain Ontology from Definitions
terminology, definition, domain-ontology, knowledge-rich information, domain corpus, terminological dictionary
Abstract
This paper describes a text-mining approach applied to a domain corpus (cork), within the theoretical framework of the dual dimension of terminology, to create a terminological dictionary and correlate it with an ontology. We discuss (i) domain specificities; (ii) lexical markers; (iii) automatic corpus processing using Sketch Engine; (iv) the representation of lexical networks using CmapTools; and (v) the representation of the concept system using Protégé. The ontology is intended to logically support the coherence and quality of the natural-language definitions contained in the terminological resource.
Copyright (c) 2024 Margarida Ramos, Rute Costa

This work is licensed under a Creative Commons Attribution 4.0 International License.