Browsing by Subject "natural language processing"

Now showing 1 - 20 of 23

Access status: Open Access
A blackboard system for generating poetry
(Wydawnictwa AGH, 2016) Misztal-Radecka, Joanna; Indurkhya, Bipin
We present a system to generate poems based on the information extracted from input text such as blog posts. Our design uses the blackboard architecture, in which independent specialized modules cooperate during the generation process by sharing a common workspace known as the blackboard. Each module is responsible for a particular task while generating poetry. Our implementation incorporates modules that retrieve information from the input text, generate new ideas, or select the best partial solutions. These distinct modules (experts) are implemented as diverse computational units that make use of lexical resources, grammar models, sentiment-analyzing tools, and language-processing algorithms. A control module is responsible for scheduling actions on the blackboard. We argue that the blackboard architecture is a promising way of simulating creative processes because of its flexibility and compliance with the Global Workspace Theory of mind. The main contribution of this work is the design and prototype implementation of an extensible platform for a poetry-generating system that may be further extended by incorporating new experts as well as some existing poetry-generating systems as parts of the blackboard architecture. We claim that this design provides a powerful tool for combining many of the existing efforts in the domain of automatic poetry generation.
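A minimal sketch of the blackboard pattern described above, assuming a much simpler setup than the paper's: the three experts (retrieve, generate, select) are hypothetical toy functions, and the control module simply runs the first expert whose precondition holds. It illustrates the architecture only, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Blackboard:
    """Shared workspace that every expert reads from and writes to."""
    input_text: str
    keywords: List[str] = field(default_factory=list)
    candidate_lines: List[str] = field(default_factory=list)
    poem: List[str] = field(default_factory=list)


@dataclass
class Expert:
    """An expert pairs a precondition on the blackboard with an action on it."""
    name: str
    can_run: Callable[[Blackboard], bool]
    run: Callable[[Blackboard], None]


def extract_keywords(bb: Blackboard) -> None:
    # Hypothetical retrieval expert: keep words longer than four letters.
    words = [w.strip(".,!?").lower() for w in bb.input_text.split()]
    bb.keywords = [w for w in words if len(w) > 4]


def generate_lines(bb: Blackboard) -> None:
    # Hypothetical generation expert: one templated line per keyword.
    bb.candidate_lines = [f"the {k} drifts away" for k in bb.keywords]


def select_best(bb: Blackboard) -> None:
    # Hypothetical selection expert: keep the two shortest candidate lines.
    bb.poem = sorted(bb.candidate_lines, key=len)[:2]


EXPERTS = [
    Expert("retrieve", lambda bb: not bb.keywords, extract_keywords),
    Expert("generate", lambda bb: bool(bb.keywords) and not bb.candidate_lines, generate_lines),
    Expert("select", lambda bb: bool(bb.candidate_lines) and not bb.poem, select_best),
]


def control(bb: Blackboard, experts: List[Expert], max_steps: int = 10) -> Blackboard:
    """Control module: at each step, run the first expert whose precondition holds."""
    for _ in range(max_steps):
        ready = [e for e in experts if e.can_run(bb)]
        if not ready:
            break
        ready[0].run(bb)
    return bb


poem = control(Blackboard("Autumn evening, quiet river."), EXPERTS).poem
```

New experts are added by appending to `EXPERTS` without touching the others, which is the extensibility argument the abstract makes.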
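Access status: Open Access
Analiza możliwości i ograniczeń systemów translatacji automatycznej wspomaganej przez człowieka na przykładzie systemu tłumaczącego z języka włoskiego na polski [Analysis of the capabilities and limitations of human-assisted machine translation systems, illustrated by a system translating from Italian into Polish]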
(2006) Gajer, Mirosław
Machine translation is a scientific discipline that provides knowledge of how to program computers so that they can automatically translate between selected natural languages. Machine translation was also one of the first applications proposed for computers. Unfortunately, it soon turned out that machine translation is much more difficult than originally thought, but also much more interesting from a scientific point of view. The article discusses the fundamental reasons why machine translation is such an extraordinarily difficult task. The most promising directions in the development of machine translation systems are also discussed. The article then presents the basic concepts of a new approach to machine translation proposed by the author. The considerations are illustrated with an experimental machine translation system that translates sentences written in Italian into Polish.
Access status: Open Access
Application of linguistic cues in the analysis of language of hate groups
(Wydawnictwa AGH, 2015) Balcerzak, Bartłomiej; Jaworski, Wojciech
Hate speech and fringe ideologies are social phenomena that thrive online. Members of the political and religious fringe can propagate their ideas via the Internet with less effort than in traditional media. In this article, we attempt to use linguistic cues, such as the occurrence of certain parts of speech, to distinguish the language of fringe groups from strictly informative sources. The aim of this research is to provide a preliminary model for identifying deceptive materials online, such as aggressive marketing and hate speech. In this paper, we focus on the political aspect. Our results show that sentence length and the occurrence of adjectives and adverbs can help identify differences between the language of fringe political groups and that of mainstream media.
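A minimal sketch of the two cue families the abstract names (sentence length and adjective/adverb occurrence), assuming the text has already been POS-tagged with Universal Dependencies tags (`ADJ`, `ADV`). The function name and the toy sentences are hypothetical; the paper's exact feature set and classifier are not given here.

```python
from statistics import mean
from typing import Dict, List, Tuple

# A sentence is a list of (token, universal POS tag) pairs produced by some tagger.
Sentence = List[Tuple[str, str]]


def cue_features(sentences: List[Sentence]) -> Dict[str, float]:
    """Compute sentence-length and adjective/adverb cues for one document."""
    tags = [tag for sentence in sentences for _, tag in sentence]
    n_tokens = len(tags)
    return {
        "mean_sentence_length": mean(len(s) for s in sentences),
        "adj_ratio": tags.count("ADJ") / n_tokens,
        "adv_ratio": tags.count("ADV") / n_tokens,
    }


doc = [
    [("The", "DET"), ("corrupt", "ADJ"), ("elite", "NOUN"), ("lies", "VERB"), ("constantly", "ADV")],
    [("Wake", "VERB"), ("up", "ADP")],
]
features = cue_features(doc)
```

Feature dictionaries computed this way for fringe and mainstream documents could then be compared statistically or fed to any off-the-shelf classifier.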
Access status: Open Access
Automatyczna ekstrakcja powiązań semantycznych z tekstu polskiego [Automatic extraction of semantic associations from Polish text]
(Wydawnictwa AGH, 2002) Lubaszewski, Wiesław; Gajęcki, Marek
In this paper we present a method for automatically constructing an association list for a particular word. An association list is a set of words, each of which is semantically related to the word being defined. To construct the association list we use a statistical reasoning algorithm based on the Polish inflection dictionary, a text corpus, and the quantitative dictionary created for that corpus. The experimental results are encouraging enough to suggest that an association list can serve as the basis for an entry in a semantic dictionary.
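The abstract does not name the statistic used, so the sketch below swaps in pointwise mutual information (PMI) over sentence-level co-occurrence as a stand-in, assuming a tiny hypothetical inflection dictionary (`lemma_of`) that maps Polish word forms to lemmas. It shows the general idea of ranking associated words, not the authors' algorithm.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List


def association_list(sentences: List[List[str]], lemma_of: Dict[str, str],
                     target: str, top_n: int = 3) -> List[str]:
    """Rank lemmas by PMI with `target`, counting co-occurrence within a sentence."""
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for sentence in sentences:
        # Map inflected forms to lemmas via the (hypothetical) inflection dictionary.
        lemmas = {lemma_of.get(w.lower(), w.lower()) for w in sentence}
        word_counts.update(lemmas)
        pair_counts.update(frozenset(p) for p in combinations(sorted(lemmas), 2))
    n = len(sentences)
    scores = {}
    for other, count in word_counts.items():
        joint = pair_counts[frozenset((target, other))]
        if other == target or joint == 0:
            continue
        # PMI = log( P(target, other) / (P(target) * P(other)) )
        scores[other] = math.log((joint / n) / ((word_counts[target] / n) * (count / n)))
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


inflections = {"psa": "pies", "szczeka": "szczekać", "szczekał": "szczekać"}
corpus = [["pies", "szczeka"], ["psa", "szczekał", "głośno"],
          ["kot", "miauczy"], ["kot", "śpi", "głośno"]]
assoc = association_list(corpus, inflections, "pies")
```

Lemmatizing first matters for Polish: without it, "pies" and "psa" would be counted as unrelated words.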
Access status: Open Access
Automatyczna klasyfikacja rzeczowników do grup semantycznych na podstawie korpusu tekstów [Automatic classification of nouns into semantic groups based on a text corpus]
(Wydawnictwa AGH, 2003) Gajęcki, Marek; Krężołek, Marek
This article presents a method for classifying nouns into semantic groups using statistical inference. The algorithm uses the inflectional dictionary of the Polish language and a corpus of texts to analyse adjective-noun relationships. The semantic groups are consistent with the categorization used in the WordNet dictionary. Classifying nouns into semantic groups is a small step towards constructing a semantic dictionary for the Polish language.
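One way to use adjective-noun relationships for this task is sketched below, under loudly stated assumptions: each noun is represented by the counts of adjectives that modify it, a few hypothetical seed nouns carry WordNet-style lexicographer labels (`noun.artifact`, `noun.person`), and an unseen noun goes to the group with the most similar (cosine) adjective profile. The adjective-noun pairs are assumed to be lemmatized already; this is an illustration, not the paper's inference procedure.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def adjective_profiles(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count which adjectives modify each noun in (adjective, noun) pairs."""
    profiles: Dict[str, Counter] = defaultdict(Counter)
    for adj, noun in pairs:
        profiles[noun][adj] += 1
    return profiles


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[adj] for adj, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(noun: str, profiles: Dict[str, Counter], seeds: Dict[str, str]) -> str:
    """Assign `noun` to the seed group whose summed adjective profile is most similar."""
    centroids: Dict[str, Counter] = defaultdict(Counter)
    for seed_noun, group in seeds.items():
        centroids[group].update(profiles[seed_noun])
    return max(centroids, key=lambda group: cosine(profiles[noun], centroids[group]))


pairs = [("czerwony", "samochód"), ("szybki", "samochód"), ("szybki", "pociąg"),
         ("czerwony", "pociąg"), ("mądry", "człowiek"), ("wysoki", "człowiek"),
         ("mądry", "lekarz"), ("wysoki", "lekarz")]
seeds = {"samochód": "noun.artifact", "człowiek": "noun.person"}
profiles = adjective_profiles(pairs)
```

Usage: `classify("pociąg", profiles, seeds)` groups "train" with "car", because both are modified by the same adjectives.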
Access status: Open Access
Benchmarking high performance architectures with natural language processing algorithms
(Wydawnictwa AGH, 2011) Kuta, Marcin; Kitowski, Jacek
Natural Language Processing algorithms are resource-demanding, especially when they must be tuned to an inflectional language such as Polish. The paper presents the time and memory requirements of part-of-speech tagging and clustering algorithms applied to two corpora of the Polish language. The algorithms are benchmarked on three high-performance platforms with different architectures. Additionally, sequential and OpenMP implementations of the clustering algorithms were compared.
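To make "time and memory requirements" concrete, here is a minimal Python measurement harness using the standard `time.perf_counter` and `tracemalloc` modules. The dictionary-lookup tagger is a hypothetical stand-in for a real POS tagger, and this says nothing about the paper's platforms or its OpenMP comparison, which operate at the native-code level.

```python
import time
import tracemalloc
from typing import Any, Callable, Dict, List, Tuple


def benchmark(fn: Callable[..., Any], *args: Any, repeats: int = 3) -> Tuple[Any, float, int]:
    """Return (result, best wall-clock seconds, peak traced bytes) over `repeats` runs."""
    best, peak = float("inf"), 0
    result = None
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
        _, run_peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        peak = max(peak, run_peak)
    return result, best, peak


def lookup_tagger(tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    # Hypothetical stand-in for a POS tagger: dictionary lookup with an "X" fallback.
    return [lexicon.get(t, "X") for t in tokens]


tags, seconds, peak_bytes = benchmark(
    lookup_tagger, ["kot", "śpi", "xyz"], {"kot": "NOUN", "śpi": "VERB"})
```

Taking the minimum time over several runs reduces noise from other processes. The timings include `tracemalloc` overhead, and only Python-level allocations are traced.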
Access status: Open Access
Bielik 7B v0.1: Polish language model – development, insights, and evaluation
(Wydawnictwa AGH, 2025) Ociepa, Krzysztof; Flis, Łukasz; Wróbel, Krzysztof; Gwoździej, Adrian; Kinas, Remigiusz
We introduce Bielik 7B v0.1 – a seven-billion-parameter generative text model for Polish language processing. Trained on curated Polish corpora, this model addresses key challenges in language model development through innovative techniques; these include Weighted Instruction Cross-Entropy Loss (which balances the learning of different instruction types) and Adaptive Learning Rate (which dynamically adjusts the learning rate based on training progress). To evaluate performance, we created the Open PL LLM Leaderboard and Polish MT-Bench – novel frameworks assessing various NLP tasks and conversational abilities. Bielik 7B v0.1 demonstrates significant improvements, achieving a nine-percentage-point increase in its average score compared to Mistral-7B-v0.1 on the RAG Reader task. It also excels on the Polish MT-Bench – particularly in the Reasoning (6.15/10) and Role-playing (7.83/10) categories. This model represents a substantial advancement in Polish language AI, offering a powerful tool for diverse linguistic applications and setting new benchmarks in the field.
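The abstract describes the weighted loss only at a high level. The following sketch shows one plausible reading, which is an assumption rather than the paper's formula: each training example's mean token cross-entropy is scaled by a weight attached to its instruction type, and the result is normalized by the total weight. The type names and weights are hypothetical, and plain floats stand in for model outputs.

```python
import math
from typing import Dict, List, Sequence


def weighted_instruction_ce(
    token_probs: List[Sequence[float]],  # per example: model probability of each gold token
    instruction_types: List[str],        # per example: its instruction type
    type_weights: Dict[str, float],      # hypothetical per-type weights (default 1.0)
) -> float:
    """Weighted average of per-example mean cross-entropy, weighted by instruction type."""
    total, norm = 0.0, 0.0
    for probs, itype in zip(token_probs, instruction_types):
        weight = type_weights.get(itype, 1.0)
        ce = -sum(math.log(p) for p in probs) / len(probs)  # mean token cross-entropy
        total += weight * ce
        norm += weight
    return total / norm


loss = weighted_instruction_ce([[0.5, 0.5], [0.25]], ["math", "chat"],
                               {"math": 2.0, "chat": 1.0})
```

With equal weights this reduces to an ordinary per-example average of cross-entropy, so the weights are the only lever for balancing instruction types.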
Access status: Open Access
Building semantic user profile for Polish web news portal
(Wydawnictwa AGH, 2018) Misztal-Radecka, Joanna
The aim of this research is to construct meaningful user profiles that are the most descriptive of user interests in the context of the media content that they browse. We use two distinct state-of-the-art numerical text-representation techniques: LDA topic modeling and Word2Vec word embeddings. We train our models on a collection of news articles in Polish and compare them with a model built on a general language corpus. We compare the performance of these algorithms on two practical tasks. First, we perform a qualitative analysis of the semantic relationships for similar-article retrieval, and then we evaluate the predictive performance of distinct feature combinations for user gender classification. We apply the algorithms to a real-world dataset from the Polish news service Onet. Our results show that the choice of text representation depends on the task: Word2Vec is more suitable for text comparison, especially for short texts such as titles. In the gender classification task, the best performance is obtained with a combination of features: topics from the article text and word embeddings from the title.
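The best-performing feature combination mentioned above (topics from article text plus embeddings from titles) can be sketched as follows. Everything concrete here is an assumption: the two-dimensional word vectors and topic distributions are made up, a title is embedded by averaging its in-vocabulary word vectors, and a user's features are the mean topic distribution concatenated with the mean title embedding over the articles they read. The gender classifier itself is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors (empty input gives an empty list)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def title_embedding(title: str, word_vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of in-vocabulary title words (Word2Vec-style lookup)."""
    vectors = [word_vectors[w] for w in title.lower().split() if w in word_vectors]
    return mean_vector(vectors)


def user_features(history: List[Tuple[str, List[float]]],
                  word_vectors: Dict[str, List[float]]) -> List[float]:
    """Concatenate the mean article-topic distribution with the mean title embedding."""
    topics = mean_vector([topic_dist for _, topic_dist in history])
    embeddings = [title_embedding(title, word_vectors) for title, _ in history]
    titles = mean_vector([e for e in embeddings if e])  # skip titles with no known words
    return topics + titles


word_vectors = {"wybory": [1.0, 0.0], "sejm": [0.8, 0.2], "mecz": [0.0, 1.0]}
history = [("Wybory sejm", [0.9, 0.1]), ("Mecz", [0.2, 0.8])]  # (title, LDA topics of body)
features = user_features(history, word_vectors)
```

In practice the vectors would come from trained LDA and Word2Vec models; the averaging and concatenation steps stay the same.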
Access status: Open Access
Compressing sentiment analysis CNN models for efficient hardware processing
(Wydawnictwa AGH, 2020) Wróbel, Krzysztof; Karwatowski, Michał; Wielgosz, Maciej; Pietroń, Marcin; Wiatr, Kazimierz
Convolutional neural networks (CNNs) were originally developed for image classification tasks. Shortly afterwards, they were applied to other domains, including natural language processing (NLP). Today, solutions based on artificial intelligence increasingly run on mobile devices and embedded systems, which impose constraints on memory and power consumption, among others. Because of their memory and computing requirements, CNNs must be compressed before they can be mapped onto such hardware. This paper presents the results of compressing efficient CNNs for sentiment analysis. The main steps are pruning and quantization. The process of mapping the compressed network onto an FPGA and the results of this implementation are described. The simulations showed that a 5-bit width is sufficient to ensure no drop in accuracy compared with the floating-point version of the network. Additionally, the memory footprint was significantly reduced (by 85 to 93% compared with the original model).
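A minimal sketch of the two compression steps, assuming magnitude-based pruning and symmetric uniform quantization to 5-bit signed codes. The abstract does not specify either scheme (nor the fixed-point format used on the FPGA), so both are stand-ins, and a plain list of floats stands in for a layer's weight tensor.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out (at least) the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]  # ties may prune extra


def quantize(weights: List[float], bits: int = 5) -> Tuple[List[int], float]:
    """Symmetric uniform quantization to integer codes in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    levels = 2 ** (bits - 1) - 1  # 15 for 5 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    return [c * scale for c in codes]


weights = [0.9, -0.05, 0.3, -0.6, 0.01, 0.45]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize(pruned, bits=5)
restored = dequantize(codes, scale)
```

Pruned weights stay exactly zero after quantization (code 0), and each surviving weight is reproduced to within half a quantization step, which is why a small bit width can preserve accuracy.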
Access status: Open Access
Finite-state methodology in natural language processing
(Wydawnictwa AGH, 2001) Korzycki, Michał
Recent mathematical and algorithmic results in the field of finite-state technology, together with increased computing power, have laid the foundation for a new approach to natural language processing. However, an appropriate model describing the phenomena of natural language has yet to be created. In this paper I present some notions related to the finite-state modelling of syntax and morphology.
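As a small illustration of finite-state morphology, the sketch below builds an acyclic finite-state lexicon that maps Polish surface forms to morphological analyses. It is an assumption-laden toy: states and transitions are stored in a trie (no minimization, no transducer composition as in full finite-state toolkits), and the analysis tags such as `kot+Noun+Sg+Gen` are a hypothetical notation.

```python
from typing import Dict, List


class FiniteStateLexicon:
    """Acyclic automaton over characters; final states carry morphological analyses."""

    def __init__(self) -> None:
        self.transitions: List[Dict[str, int]] = [{}]  # state 0 is the start state
        self.finals: Dict[int, List[str]] = {}

    def add(self, surface: str, analysis: str) -> None:
        state = 0
        for ch in surface:
            nxt = self.transitions[state].get(ch)
            if nxt is None:  # create a new state for an unseen transition
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.finals.setdefault(state, []).append(analysis)

    def lookup(self, surface: str) -> List[str]:
        state = 0
        for ch in surface:
            state = self.transitions[state].get(ch)
            if state is None:  # no transition: the form is not in the lexicon
                return []
        return list(self.finals.get(state, []))  # non-final state: no analyses


lexicon = FiniteStateLexicon()
lexicon.add("kot", "kot+Noun+Sg+Nom")
lexicon.add("kota", "kot+Noun+Sg+Gen")
lexicon.add("kota", "kot+Noun+Sg+Acc")
```

Lookup time depends only on the length of the word, not on the size of the lexicon, and shared prefixes share states; this is part of what makes finite-state methods attractive for highly inflected languages.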
Access status: Open Access
Geolocalization of 19th-century villages and cities mentioned in geographical dictionary of the Kingdom of Poland
(Wydawnictwa AGH, 2013) Jaśkiewicz, Grzegorz
This article presents a method for roughly estimating the geographical coordinates of the villages and cities described in the 19th-century geographical encyclopedia »The Geographical Dictionary of the Polish Kingdom and Other Slavic Countries« [18]. It describes the location-estimation algorithm, the tools used to acquire and process the necessary information, and the context of this research.
Access status: Open Access
Knowledge graphs effectiveness in Neural Machine Translation improvement
(Wydawnictwa AGH, 2020) Ahmadnia, Benyamin; Dorr, Bonnie J.; Kordjamshidi, Parisa
Maintaining semantic relations between words during the translation process yields more accurate target-language output from Neural Machine Translation (NMT). Although this is difficult to achieve from training data alone, it is possible to leverage Knowledge Graphs (KGs) to retain source-language semantic relations in the corresponding target-language translation. The core idea is to use KG entity relations as embedding constraints to improve the mapping from source to target. This paper describes two embedding constraints, both of which employ Entity Linking (EL), i.e. assigning a unique identity to entities, to associate words in training sentences with those in the KG: (1) a monolingual embedding constraint that supports an enhanced semantic representation of the source words through access to relations between entities in a KG, and (2) a bilingual embedding constraint that forces entity relations in the source language to be carried over to the corresponding entities in the target-language translation. The method is evaluated for English-Spanish translation, using Freebase as the source of knowledge. Our experimental results demonstrate that exploiting KG information not only decreases the number of unknown words in the translation but also improves translation quality.
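The two constraints can be illustrated as squared-distance penalties on embeddings, under several assumptions that are not taken from the paper: entity linking has already produced word-to-entity maps (the entity IDs `E1` and `E2` are hypothetical, not real Freebase IDs), source and target embeddings live in a shared space, and the penalties would be added to the NMT loss with tuning weights. This sketch only computes the penalty values.

```python
from typing import Dict, List, Tuple

Vec = List[float]


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def monolingual_penalty(src_emb: Dict[str, Vec], src_links: Dict[str, str],
                        kg_edges: List[Tuple[str, str]]) -> float:
    """Pull together source words whose linked entities are related in the KG."""
    words_of: Dict[str, List[str]] = {}
    for word, entity in src_links.items():
        words_of.setdefault(entity, []).append(word)
    total = 0.0
    for head, tail in kg_edges:
        for w1 in words_of.get(head, []):
            for w2 in words_of.get(tail, []):
                total += sq_dist(src_emb[w1], src_emb[w2])
    return total


def bilingual_penalty(src_emb: Dict[str, Vec], tgt_emb: Dict[str, Vec],
                      src_links: Dict[str, str], tgt_links: Dict[str, str]) -> float:
    """Pull together source and target words linked to the same KG entity."""
    return sum(sq_dist(src_emb[s], tgt_emb[t])
               for s, e_src in src_links.items()
               for t, e_tgt in tgt_links.items() if e_src == e_tgt)


src_emb = {"Madrid": [1.0, 0.0], "Spain": [0.0, 1.0]}
tgt_emb = {"Madrid": [1.0, 0.5], "España": [0.0, 1.0]}
src_links = {"Madrid": "E1", "Spain": "E2"}
tgt_links = {"Madrid": "E1", "España": "E2"}
mono = monolingual_penalty(src_emb, src_links, [("E1", "E2")])  # e.g. "capital of"
bi = bilingual_penalty(src_emb, tgt_emb, src_links, tgt_links)
```

A training objective could then take the form `nmt_loss + lambda_mono * mono + lambda_bi * bi`, with the two lambdas as hypothetical hyperparameters.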
Access status: Open Access
Kontekstowe rozstrzyganie wieloznaczności w tekście polskim [Context-based disambiguation of ambiguous words in Polish text]
(Wydawnictwa AGH, 2002) Krężołek, Marek
This article describes a computer programme designed to assign the appropriate meaning to homonyms based on the context in which they appear. Homonyms can be defined as words with several different, often unrelated, meanings (for example, bank: a business that lends or keeps money, the land along the side of a river, or a large pile of earth, sand, or snow). The programme described in the paper uses a monolingual dictionary which, supported by a declension dictionary, assigns the appropriate meaning to any given word in the text. Two methods are used to assign the correct meaning for a given context: 1) recognizing the meaning of a particular homonym from the occurrence of its collocations in the immediate context; 2) selecting the meaning of the word based on the overall theme or topic of the text in which it appears.
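The two methods above can be combined as sketched here, assuming a hypothetical sense inventory in which each sense of a homonym lists collocates and topic words. Method 1 counts collocates in a small window around the word. If none are found, method 2 falls back to the overlap between the whole text and each sense's topic words. The `lemma_of` map stands in for the declension dictionary. The data structures are illustrative, not the programme's actual dictionary format.

```python
from typing import Dict, List, Set


def disambiguate(tokens: List[str], index: int, senses: Dict[str, Dict[str, Set[str]]],
                 lemma_of: Dict[str, str], window: int = 2) -> str:
    """Pick a sense for tokens[index]: collocations in a window first, text topic second."""
    lemmas = [lemma_of.get(t.lower(), t.lower()) for t in tokens]
    near = set(lemmas[max(0, index - window):index] + lemmas[index + 1:index + 1 + window])
    # Method 1: count collocates of each sense in the immediate context window.
    colloc = {s: len(near & info["collocations"]) for s, info in senses.items()}
    best = max(colloc, key=colloc.get)
    if colloc[best] > 0:
        return best
    # Method 2: fall back to overlap between the whole text and each sense's topic words.
    whole = set(lemmas)
    return max(senses, key=lambda s: len(whole & senses[s]["topic"]))


SENSES = {
    "zamek/castle": {"collocations": {"średniowieczny", "królewski"},
                     "topic": {"król", "wieża", "rycerz"}},
    "zamek/lock": {"collocations": {"drzwi", "klucz"},
                   "topic": {"złodziej", "mieszkanie", "włamanie"}},
}
s1 = disambiguate(["Zwiedziliśmy", "średniowieczny", "zamek", "nad", "rzeką"], 2, SENSES, {})
s2 = disambiguate(["Złodziej", "wyłamał", "zamek", "i", "okradł", "mieszkanie"], 2, SENSES, {})
```

In the first sentence the collocate "średniowieczny" (medieval) decides immediately. In the second, no collocate is found, so the topic words "złodziej" (thief) and "mieszkanie" (flat) select the "lock" sense.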
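Access status: Restricted
Korektor z transliteracją dla tekstów w języku łemkowskim [A spell checker with transliteration for texts in the Lemko language]
(Defense date: 2017-01-19) Klinkowski, Aleksander
Wydział Elektrotechniki, Automatyki, Informatyki i Inżynierii Biomedycznej [Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering]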
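Access status: Open Access
Metody i narzędzia automatycznego przetwarzania informacji tekstowej i ich wykorzystanie w procesie zarządzania wiedzą [Methods and tools for the automatic processing of textual information and their use in knowledge management]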
(Wydawnictwa AGH, 2011) Potiopa, Piotr
This article reviews methods and tools for representing and processing information, which is currently one of the fundamental means of building and managing any organization. The efficient functioning of every institution depends on access to the knowledge stored within it, as well as on the ability to search it efficiently, systematize it, and make new decisions based on it.
Access status: Restricted
Natural language processing for Electronic Health Record
(Defense date: 2019-01-29) Wojtaś, Julia
Wydział Elektrotechniki, Automatyki, Informatyki i Inżynierii Biomedycznej [Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering]
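Access status: Restricted
Normalizacja tekstu w syntezie mowy [Text normalization in speech synthesis]
(Defense date: 2018-09-26) Gargas, Tomasz
Wydział Elektrotechniki, Automatyki, Informatyki i Inżynierii Biomedycznej [Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering]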
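Access status: Restricted
Platforma do nauki języka angielskiego automatycznie budująca bazę słów na podstawie przeglądanych stron internetowych [An English-learning platform that automatically builds a vocabulary base from browsed web pages]
(Defense date: 2018-06-29) Świetlik, Joanna
Wydział Elektrotechniki, Automatyki, Informatyki i Inżynierii Biomedycznej [Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering]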
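Access status: Restricted
Realizacja internetowego systemu translacji automatycznej [Implementation of an online machine translation system]
(Defense date: 2010-07-16) Stadnik, Grzegorz
Wydział Elektrotechniki, Automatyki, Informatyki i Elektroniki [Faculty of Electrical Engineering, Automatics, Computer Science and Electronics]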
Access status: Open Access
Retrieval and interpretation of textual geolocalized information based on semantic geolocalized relations
(Wydawnictwa AGH, 2015) Korczyński, Wojciech
This paper describes a method for retrieving geolocalized information from natural language text and interpreting it by assigning geographic coordinates. A proof-of-concept implementation is discussed, along with a geolocalized dictionary stored in a PostGIS/PostgreSQL spatial relational database. The research focuses on Polish, a strongly inflectional language, so additional complexity had to be taken into account. The method has been evaluated using a variety of metrics.
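A minimal sketch of the core lookup step, assuming an in-memory dictionary in place of the paper's PostGIS/PostgreSQL gazetteer and a hypothetical inflection map that undoes Polish case endings (for example, "Krakowa" becomes "Kraków"). Only single-token place names are handled, and the coordinates are approximate values used for illustration.

```python
import re
from typing import Dict, List, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def geolocate(text: str, lemma_of: Dict[str, str],
              gazetteer: Dict[str, Coordinates]) -> List[Tuple[str, str, Coordinates]]:
    """Return (surface form, base name, coordinates) for each recognised place name."""
    found = []
    for token in re.findall(r"\w+", text):  # \w matches Polish letters in Python 3
        name = lemma_of.get(token, token)   # undo inflection, e.g. "Krakowa" -> "Kraków"
        if name in gazetteer:
            found.append((token, name, gazetteer[name]))
    return found


inflections = {"Krakowa": "Kraków", "Krakowie": "Kraków", "Tarnowa": "Tarnów"}
gazetteer = {"Kraków": (50.06, 19.94), "Tarnów": (50.01, 20.99)}
hits = geolocate("Pociąg z Krakowa do Tarnowa odjeżdża rano.", inflections, gazetteer)
```

Without the inflection step, neither "Krakowa" nor "Tarnowa" would match a gazetteer entry. This is the extra complexity of Polish that the abstract refers to.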