Article

Building semantic user profile for Polish web news portal

creativeworkseries.issn: 1508-2806
dc.contributor.author: Misztal-Radecka, Joanna
dc.date.available: 2025-06-17T06:01:24Z
dc.date.issued: 2018
dc.description: Bibliography: pp. 329-332.
dc.description.abstract: The aim of this research is to construct meaningful user profiles that best describe user interests in the context of the media content they browse. We use two distinct state-of-the-art numerical text-representation techniques: LDA topic modeling and Word2Vec word embeddings. We train our models on a collection of Polish-language news articles and compare them with a model built on a general language corpus. We compare the performance of these algorithms on two practical tasks. First, we perform a qualitative analysis of the semantic relationships for similar-article retrieval; then we evaluate the predictive performance of distinct feature combinations for user gender classification. We apply the algorithms to a real-world dataset from the Polish news service Onet. Our results show that the choice of text representation depends on the task: Word2Vec is more suitable for text comparison, especially for short texts such as titles. In the gender classification task, the best performance is obtained with a combination of features: topics from the article text and word embeddings from the title. [en]
dc.description.placeOfPublication: Kraków
dc.description.version: publisher's version
dc.identifier.doi: https://doi.org/10.7494/csci.2018.19.3.2753
dc.identifier.eissn: 2300-7036
dc.identifier.issn: 1508-2806
dc.identifier.uri: https://repo.agh.edu.pl/handle/AGH/113210
dc.language.iso: eng
dc.publisher: Wydawnictwa AGH
dc.relation.ispartof: Computer Science
dc.rights: Attribution 4.0 International
dc.rights.access: open access
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/legalcode
dc.subject: user profiling [en]
dc.subject: word embeddings [en]
dc.subject: topic modeling [en]
dc.subject: natural language processing [en]
dc.subject: gender prediction [en]
dc.title: Building semantic user profile for Polish web news portal [en]
dc.title.related: Computer Science [en]
dc.type: article
dspace.entity.type: Publication
publicationissue.issueNumber: No. 3
publicationissue.pagination: pp. 307-332
publicationvolume.volumeNumber: Vol. 19
relation.isJournalIssueOfPublication: 303f1a63-c66d-4b91-9144-8e6b77fd8016
relation.isJournalIssueOfPublication.latestForDiscovery: 303f1a63-c66d-4b91-9144-8e6b77fd8016
relation.isJournalOfPublication: 020291ee-249b-4dcf-98a3-276a2f7981aa

Files

Original bundle

Name: csci.2018.19.3.307.pdf
Size: 697.54 KB
Format: Adobe Portable Document Format