Computer Science

Journal Issue

Computer Science

Files

csci.2023.24.2.tytul.pdf (22.57 KB)

csci.2023.24.2.red.pdf (39.32 KB)

csci.2023.24.2.contents.pdf (94.2 KB)

ISSN 1508-2806

e-ISSN: 2300-7036

Issue Date

2023

Volume

Vol. 24

Number

No. 2

Access rights

Access: open access

Rights: CC BY 4.0

Attribution 4.0 International (CC BY 4.0)

Journal Volume

Computer Science

Vol. 24 (2023)

Articles

Article

Open Access

Hybrid end-to-end approach integrating online learning with face-identification system

(Wydawnictwa AGH, 2023) Nguyen, Dat Van; Nguyen Son Trung; Pham, Hong-Anh Thi; Pham Van, Toan; Hoang, Thu Thao; Tạ, Minh Thanh

Facial recognition has been one of the most intriguing and exciting research topics over the last few years. It involves multiple face-based algorithms such as facial detection, facial alignment, facial representation, and facial recognition. However, all of these algorithms are derived from large deep-learning architectures, leading to limitations in development, scalability, accuracy, and deployment for public use on CPU-only servers. Also, large data sets that contain hundreds of thousands of records are often required for training purposes. In this paper, we propose a complete pipeline for an effective face-recognition application that requires only a small data set of Vietnamese celebrities and a CPU for training, thereby addressing both the problem of data leakage and the need for GPU devices. The pipeline is based on the combination of a conversion algorithm from face vectors to string tokens and the indexing and retrieval process of Elasticsearch, thereby tackling the problem of online learning in facial recognition. Compared with other popular algorithms on the same data set, our proposed pipeline not only outperforms its counterparts in terms of accuracy but also delivers faster inference, which is essential for real-time applications.
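The abstract above does not spell out how face vectors are converted to string tokens, so the following is only a minimal sketch of one plausible scheme: each dimension of an embedding is quantized into a bucket and emitted as a token, so that a text index (such as Elasticsearch) could match faces by shared tokens. The function names, the bucket count, and the value range are assumptions made for illustration and are not taken from the paper; no Elasticsearch calls are made here.

```python
from typing import List, Sequence


def vector_to_tokens(vec: Sequence[float], bins: int = 8,
                     lo: float = -1.0, hi: float = 1.0) -> List[str]:
    """Quantize each dimension into one of `bins` buckets (hypothetical scheme).

    Dimension i with bucket b becomes the token "d{i}_b{b}".
    Values outside [lo, hi] are clipped to the range first.
    """
    if bins < 1 or hi <= lo:
        raise ValueError("bins must be >= 1 and hi must exceed lo")
    width = (hi - lo) / bins
    tokens = []
    for i, x in enumerate(vec):
        clipped = min(max(x, lo), hi)
        # The top edge (x == hi) would land in bucket `bins`, so cap it.
        bucket = min(int((clipped - lo) / width), bins - 1)
        tokens.append(f"d{i}_b{bucket}")
    return tokens


def token_overlap(a: Sequence[str], b: Sequence[str]) -> int:
    """Count shared tokens, a crude stand-in for a text-search relevance score."""
    return len(set(a) & set(b))


if __name__ == "__main__":
    known = vector_to_tokens([0.10, 0.90], bins=4)
    query = vector_to_tokens([0.12, -0.90], bins=4)
    print(known, query, token_overlap(known, query))
```

Under a scheme like this, enrolling a new face only requires indexing one more token document rather than retraining a classifier, which is one way to read the "online learning" claim in the abstract.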

Article

Open Access

Adaptation of domain-specific transformer models with text oversampling for sentiment analysis of social media posts on Covid-19 vaccine

(Wydawnictwa AGH, 2023) Bansal, Anmol; Choudhry, Arjun; Sharma, Anubhav; Susan, Seba

Covid-19 has spread across the world, and many different vaccines have been developed to counter its surge. To identify the correct sentiments associated with the vaccines from social media posts, we fine-tune various state-of-the-art pre-trained transformer models on tweets associated with Covid-19 vaccines. Specifically, we use the recently introduced state-of-the-art RoBERTa, XLNet, and BERT pre-trained transformer models, and the domain-specific CT-BERT and BERTweet transformer models that have been pre-trained on Covid-19 tweets. We further explore the option of text augmentation by oversampling using the language model-based oversampling technique (LMOTE) to improve the accuracies of these models, specifically for small-sample data sets with an imbalanced class distribution among the positive, negative, and neutral sentiment classes. Our results summarize our findings on the suitability of text oversampling for imbalanced, small-sample data sets that are used to fine-tune state-of-the-art pre-trained transformer models, as well as the utility of domain-specific transformer models for the classification task.
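To illustrate what oversampling does to an imbalanced sentiment data set, the sketch below uses plain random duplication of minority-class texts. This is deliberately not LMOTE, which the abstract describes as language-model based and which would generate new text rather than repeat existing text. The function name and the toy labels are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple


def random_oversample(texts: Sequence[str], labels: Sequence[str],
                      seed: int = 0) -> Tuple[List[str], List[str]]:
    """Duplicate minority-class samples until every class matches the largest one."""
    if not texts:
        return [], []
    rng = random.Random(seed)  # seeded so the result is reproducible
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Draw extra samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


if __name__ == "__main__":
    texts = ["great vaccine", "felt fine", "no side effects", "arm hurts"]
    labels = ["positive", "positive", "positive", "negative"]
    _, balanced_labels = random_oversample(texts, labels)
    print(Counter(balanced_labels))
```

Oversampling of any kind should be applied to the training split only, so that duplicated or generated samples do not leak into the evaluation data.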

Article

Open Access

ArNLI: Arabic Natural Language Inference entailment and contradiction detection

(Wydawnictwa AGH, 2023) Al Jallad, Khloud; Ghneim, Nada

Natural Language Inference (NLI) is a hot research topic in natural language processing; contradiction detection between sentences is a special case of NLI. It is considered a difficult NLP task that has a significant influence when added as a component in many NLP applications (such as question-answering systems and text summarization). The Arabic language is one of the most challenging low-resource languages for detecting contradictions due to its rich lexical and semantic ambiguity. We have created a data set of more than 12k sentences and named it ArNLI; it will be made publicly available. Moreover, we have applied a new model that was inspired by Stanford's proposed contradiction-detection solutions for the English language. We proposed an approach for detecting contradictions between pairs of sentences in the Arabic language using a contradiction vector combined with a language model vector as an input to a machine-learning model. We analyzed the results of different traditional machine-learning classifiers and compared their results on our created data set (ArNLI) and on the automatic translation of both the PHEME and SICK English data sets. The best results were achieved by using the random forest classifier, with accuracies of 0.99, 0.60, and 0.75 on PHEME, SICK, and ArNLI, respectively.

Article

Open Access

Transformation and classification of ordinal survey data

(Wydawnictwa AGH, 2023) Sadh, Roopam; Kumar, Rajeev

Machine learning is now widely used in almost all research domains; however, its applicability in survey research is still in its infancy. In this paper, we attempt to highlight the applicability of machine learning in survey research while working on two different aspects in parallel. First, we introduce a pattern-based transformation method for ordinal survey data. Our purpose in developing such a transformation method is two-fold: it facilitates the easy interpretation of ordinal survey data, and it makes standard machine-learning approaches more convenient to apply. Second, we demonstrate the application of various classification techniques over real and transformed ordinal survey data and interpret their results in terms of their suitability in survey research. Our experimental results suggest that machine learning coupled with a pattern-recognition paradigm has tremendous scope in survey research.

Article

Open Access

Deep convolutional neural network using a new data set for Berber language

(Wydawnictwa AGH, 2023) Mokrane, Kemiche; Sadou, Malika

Handwritten character recognition (HCR) has become an interesting and immensely useful technology, and it has been explored with impressive performance in many languages. However, few HCR systems have been proposed for the Amazigh (Berber) language. Furthermore, the validation of any Amazigh handwritten character-recognition system remains a major challenge due to the lack of a robust Amazigh database. To address this problem, we first created two new data sets for Tifinagh and Amazigh Latin characters by extending the well-known EMNIST database with the Amazigh alphabet. Then, we proposed a handwritten character-recognition system that is based on a deep convolutional neural network to validate the created data sets. The proposed convolutional neural network (CNN) has been trained and tested on our created data sets; the experimental tests showed that it achieves satisfactory results in terms of accuracy and recognition efficiency.