Article

Diacritic-aware Yorùbá spell checker



Access rights

Access: open access
Rights: CC BY 4.0
Attribution 4.0 International



Version

publisher's version
Item type: Journal Issue
Computer Science, 2023, Vol. 24, No. 1

Pagination/Pages:

pp. 31-51


Description

Bibliography: pp. 49-50.

Abstract

Spell checking and correction are still in their infancy for the Yorùbá language. Existing tools cannot be applied directly, because Yorùbá uses diacritics extensively to distinguish phonemes and to mark tone. A model was formulated as a parallel combination of a unigram language model and a diacritic model, forming a dictionary sub-model that the error-detection and candidate-generation modules can use. The candidate-generation module was implemented as a reverse Levenshtein edit-distance algorithm. The system was evaluated using detection accuracy (calculated from precision and recall) and suggestion accuracy (SA) as metrics. Our experimental setups compared the performance of the component subsystems when used alone and when combined into a unified model. Detection accuracies for the different models range from 93.23% to 95.01%, and suggestion accuracies range from 26.94% to 72.10%. The results indicate that each of the sub-models in the dictionary plays a different role.
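
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the "diacritic model" can be approximated by a map from diacritic-stripped forms to their attested diacritized forms, and that "reverse" edit distance means generating every string one edit away from the input and keeping those found in the dictionary. All function names (`strip_diacritics`, `build_dictionary`, `edits1`, `is_error`, `suggest`) are hypothetical, and the toy word list is only for illustration.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(word):
    """Return the word with every combining mark (tone marks, under-dots) removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_dictionary(tokens):
    """Build the two assumed sub-models from a tokenized corpus.

    unigrams: NFC-normalized word -> corpus count (unigram language model)
    variants: diacritic-free form -> set of diacritized words (diacritic model)
    """
    unigrams = Counter(unicodedata.normalize("NFC", t) for t in tokens)
    variants = defaultdict(set)
    for word in unigrams:
        variants[strip_diacritics(word)].add(word)
    return unigrams, variants


def edits1(word, alphabet):
    """All strings one Levenshtein edit (delete, substitute, insert) from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return deletes | replaces | inserts


def is_error(word, unigrams):
    """Error detection: a word is flagged if its exact diacritized form is unknown."""
    return unicodedata.normalize("NFC", word) not in unigrams


def suggest(word, unigrams, variants):
    """Candidate generation: diacritic variants first, then one-edit neighbours."""
    bare = strip_diacritics(word)
    alphabet = set("".join(variants))  # base letters seen in the dictionary
    candidates = set(variants.get(bare, ()))
    for edit in edits1(bare, alphabet):
        candidates |= variants.get(edit, set())
    # Rank by unigram frequency, breaking ties alphabetically.
    return sorted(candidates, key=lambda w: (-unigrams[w], w))


# Toy corpus, for illustration only.
unigrams, variants = build_dictionary(["ilé", "ilé", "ilè", "omi"])
print(is_error("ile", unigrams))
print(suggest("ile", unigrams, variants))
print(suggest("omu", unigrams, variants))
```

In this sketch the undiacritized input `ile` is flagged as an error even though its base letters match dictionary entries, and the suggestions are the diacritized variants ranked by frequency. Working on diacritic-free forms during edit generation keeps the candidate set small, at the cost of treating a wrong tone mark and a missing one identically; a fuller system would likely weight those cases differently.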
