Lamirel, J.C.; Lareau, F. and Malaterre, C. (2023). "The CFMf Topic-Modeling Method Based on Neural Clustering with Feature Maximization: Comparison with LDA", in 19th International Conference on Scientometrics & Informetrics (International Society for Scientometrics and Informetrics (I.S.S.I.), Bloomington, United States, 2023-07), pp. 253-259.
Abstract
Mining the content of scientific publications is increasingly used to investigate the practice of science and the evolution of research domains. Topic models, LDA among them, have notably been shown to provide rich insights into the thematic content of disciplinary fields, their structure, and their evolution through time. However, improving topic-modeling methods remains a major concern. Here we propose an alternative topic-modeling approach based on neural clustering and feature maximization with F1-measure (in short: CFMf). We compare the performance of this approach to that of LDA by applying both methods to a reference corpus of full-text philosophy of science articles (N=16,917). The results show significant improvements on key quantitative performance measures such as coherence, independently of the number of topics. Qualitative comparisons also show improvements in the consistency of topics and their interpretability in light of expert knowledge. We discuss these promising results and highlight upcoming research work.