1. SEMANTIC AUTHOR RECOMMENDATIONS BASED ON THEIR BIOGRAPHY FROM THE GENERAL ROMANIAN DICTIONARY OF LITERATURE
Authors: Neagu Laurentiu-Marian, Trausan-Matu Stefan, Dascalu Mihai, Cotet Teodor-Mihai, Badescu Laura, Simion Eugen
2019 | Volume 1 | DOI: 10.12753/2066-026X-19-022 | Pages: 165-172

Abstract
The Romanian Language Dictionary is a centralized text repository which contains detailed biographies of all Romanian authors and can be used to perform various subsequent analyses. The aim of this paper is to introduce a novel method to recommend authors based on their biography from the Romanian Language Dictionary. Starting from multiple PDF input files made available by the "G. Călinescu" Institute of Literary History and Theory, we extracted relevant information on Romanian authors and indexed it into Elasticsearch, a non-relational database optimized for full-text indexing and search. The relevant information covers each author's full name, pseudonym (if any), year of birth and of death (if applicable), a brief description (including studies, cities they lived in, important people they met, brief history), writings, critical references by others, etc. The indexed information is easily accessible through a RESTful API and provides a powerful starting point which may contribute to future Romanian cultural findings. Based on this consistent database, our aim is to create an interactive map showing all contributors to Romanian literature, enabling the identification of similarities and differences between them based on specific features (e.g., similar writing styles, time periods, or similar text descriptions in terms of semantic models). To obtain a clearer picture of how authors relate to one another, we employed the k-Means and agglomerative clustering algorithms from the Scikit-learn machine learning library. The results depict the distribution of Romanian authors throughout history and enable the identification of correlations between them based on the emerging clusters. This paper is a proof of concept that makes use of only the first volume of the Romanian Language Dictionary and represents the first step for follow-up analyses performed using the indexed dictionary.

Keywords
Clustering; Text Categorization; Text Mining; Analysis of General Romanian Dictionary of Literature; Author Recommendations; Adaptive Technologies.
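
The abstract describes grouping authors by the similarity of their biography texts using k-Means and agglomerative clustering from Scikit-learn. The sketch below is not the authors' pipeline. It is a dependency-free stand-in that illustrates only the agglomerative idea, using plain term counts, cosine distance and average linkage. The author names and biography snippets are invented placeholders, and the helper names (`tf_vector`, `cosine`, `agglomerative`, `recommend`) are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical placeholder biographies (NOT taken from the dictionary).
biographies = {
    "Author A": "poet born in iasi wrote romantic poetry and lyric verse",
    "Author B": "poet from iasi known for romantic lyric poetry",
    "Author C": "novelist in bucharest wrote realist novels and prose",
    "Author D": "bucharest novelist known for realist prose and short novels",
}


def tf_vector(text):
    """Lower-cased bag-of-words term counts for one biography."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering on cosine distance.

    Starts with one cluster per item and repeatedly merges the two
    closest clusters until only n_clusters remain.
    """
    def distance(i, j):
        return 1.0 - cosine(vectors[i], vectors[j])

    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None  # (average distance, index x, index y)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                total = sum(distance(i, j) for i in clusters[x] for j in clusters[y])
                avg = total / (len(clusters[x]) * len(clusters[y]))
                if best is None or avg < best[0]:
                    best = (avg, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


def recommend(name, names, clusters):
    """Return the other authors that share a cluster with `name`."""
    idx = names.index(name)
    for members in clusters:
        if idx in members:
            return [names[i] for i in members if i != idx]
    return []


names = list(biographies)
vectors = [tf_vector(biographies[n]) for n in names]
clusters = agglomerative(vectors, n_clusters=2)
print(recommend("Author A", names, clusters))
```

With these placeholder texts, the two poets and the two novelists end up in separate groups, so a recommendation for one author is the other member of its cluster. A realistic setup would replace the raw term counts with the semantic models the abstract mentions and would read biographies from the Elasticsearch index rather than from a hard-coded dictionary.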