Browsing by Subject "text classification"

Access status: Open Access
Analysis of data pre-processing methods for sentiment analysis of reviews
(Wydawnictwa AGH, 2019) Parlar, Tuba; Özel, Selma Ayşe; Song, Fei
The goals of this study are to analyze the effects of data pre-processing methods for sentiment analysis and determine which of these pre-processing methods (and their combinations) are effective for English as well as for an agglutinative language like Turkish. We also try to answer the research question of whether there are any differences between agglutinative and non-agglutinative languages in terms of pre-processing methods for sentiment analysis. We find that the performance results for the English reviews are generally higher than those for the Turkish reviews due to the differences between the two languages in terms of vocabularies, writing styles, and the agglutinative property of the Turkish language.
Access status: Open Access
Impact of n-stage latent Dirichlet allocation on analysis of headline classification
(Wydawnictwa AGH, 2022) Güven, Zekeriya Anil; Diri, Banu; Çakaloğlu, Tolgahan
Data analysis becomes difficult when the amount of the data increases. More specifically, extracting meaningful insights from this vast amount of data and grouping it based on its shared features without human intervention requires advanced methodologies. There are topic-modeling methods that help overcome this problem in text analyses for downstream tasks (such as sentiment analysis, spam detection, and news classification). In this research, we benchmark several classifiers (namely, random forest, AdaBoost, naive Bayes, and logistic regression) using the classical latent Dirichlet allocation (LDA) and n-stage LDA topic-modeling methods for feature extraction in headline classification. We ran our experiments on three and five classes of publicly available Turkish and English datasets. We have demonstrated that, as a feature extractor, n-stage LDA obtains state-of-the-art performance for any downstream classifier. It should also be noted that random forest was the most successful algorithm for both datasets.
Access status: Restricted
Klasyfikacja i klasteryzacja dokumentów (Document classification and clustering)
(Defense date: 2010-09-15) Bozowski, Przemysław; Cała, Konrad
Faculty of Electrical Engineering, Automatics, Computer Science and Electronics
Access status: Restricted
Klasyfikacja i klasteryzacja dokumentów (Document classification and clustering)
(Defense date: 2010-09-15) Cała, Konrad; Bozowski, Przemysław
Faculty of Electrical Engineering, Automatics, Computer Science and Electronics