Browsing by Subject "information extraction"

Access status: Open Access
Distributed web-scale infrastructure for crawling, indexing and search with semantic support
(Wydawnictwa AGH, 2012) Dlugolinský, Štefan; Šeleng, Martin; Laclavík, Michal; Hluchý, Ladislav
In this paper, we describe our work in progress on web-scale information extraction and information retrieval using distributed computing. We present a distributed architecture built on top of the MapReduce paradigm for information retrieval, information processing and intelligent search supported by spatial capabilities. The proposed architecture focuses on crawling documents in several different formats, information extraction, lightweight semantic annotation of the extracted information, indexing of the extracted information and, finally, indexing of documents based on the geo-spatial information found in them. We demonstrate the architecture on two use cases: search in job offers retrieved from the LinkedIn portal and search in BBC news feeds. We discuss several problems we had to face during the implementation. We also discuss spatial search applications for both cases, because both LinkedIn job offer pages and BBC news feeds contain a lot of spatial information to extract and process.
Access status: Open Access
Geolocalization of 19th-century villages and cities mentioned in geographical dictionary of the Kingdom of Poland
(Wydawnictwa AGH, 2013) Jaśkiewicz, Grzegorz
This article presents a method for roughly estimating the geographical coordinates of villages and cities described in the 19th-century geographical encyclopedia entitled »The Geographical Dictionary of the Polish Kingdom and Other Slavic Countries« [18]. It describes the algorithm for estimating location, the tools used to acquire and process the necessary information, and the context of this research.
Access status: Open Access
Information extraction from chemical patents
(Wydawnictwa AGH, 2012) Romberg, Mathilde; Klenner, Alexander; Zimmermann, Marc; Bergmann, Sandra
The development of new chemicals or pharmaceuticals is preceded by an in-depth analysis of published patents in the field. This information retrieval is a costly and time-inefficient step when done by a human reader, yet it is mandatory for the potential success of an investment. The goal of the research project UIMA-HPC is to automate, and hence speed up, the process of mining knowledge from patents. Multi-threaded analysis engines, developed according to UIMA (Unstructured Information Management Architecture) standards, process texts and images in thousands of documents in parallel. UNICORE (UNiform Interface to COmputing Resources) workflow control structures make it possible to dynamically allocate resources for every given task to gain the best CPU-time/real-time ratios in an HPC environment.
Access status: Open Access
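Metody i narzędzia automatycznego przetwarzania informacji tekstowej i ich wykorzystanie w procesie zarządzania wiedzą [Methods and tools for automatic processing of textual information and their use in the knowledge management process]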
(Wydawnictwa AGH, 2011) Potiopa, Piotr
This article reviews methods and tools for representing and processing information, which is currently one of the basic means of building and managing any organization. The efficient functioning of every institution depends on access to the knowledge stored within it, as well as on the ability to search that knowledge efficiently, systematize it, and make new decisions on its basis.