This is an outdated version published on 2022-10-21.

A Benchmark Corpus for Topic Modeling on the Origins of Modern Antisemitism

Authors

Giorgia Minello Ca' Foscari University of Venice
Deborah Paci Università di Modena e Reggio Emilia

DOI:

https://doi.org/10.6092/issn.2532-8816/14767

Keywords:

Corpus, Benchmark dataset, NLP, Topic Model, Anti-Semitism, Drumont

Abstract

The pace at which digitized collective knowledge accumulates has increased rapidly in the last few years. As a result, tremendous amounts of information must be organized, searched, and understood, which is feasible only with automatic methods. For textual data analysis, topic modeling, a machine learning method, is one of the best-known frameworks for uncovering latent topics in text documents. Topic modeling is a well-established practice for studying textual sources in many fields of the sciences and humanities, including historical research. In this paper, we present a benchmark corpus for topic models: a dataset containing an annotated real-world collection of texts on the theme of antisemitism in 19th-century France. The benchmark corpus was developed to address a specific machine learning task, but it can also support other natural language processing-based studies, in particular those concerning history.

Published

2022-10-21

Versions

2022-10-24 (2)
2022-10-21 (1)

How to Cite

Minello, G., & Paci, D. (2022). A Benchmark Corpus for Topic Modeling on the Origins of Modern Antisemitism. Umanistica Digitale, 6(13), 117–151. https://doi.org/10.6092/issn.2532-8816/14767

Issue

No. 13 (2022)

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
