Diacritic-aware Yorùbá spell checker

Spell checking and correction are still in their infancy for the Yorùbá language. Existing tools cannot be applied directly, because Yorùbá uses diacritics extensively to distinguish phonemes and to mark tone. A model was formulated as a parallel combination of a unigram language model and a diacritic model, forming a dictionary sub-model that is used by the error-detection and candidate-generation modules. The candidate-generation module was implemented as a reverse Levenshtein edit-distance algorithm. The system was evaluated using detection accuracy (calculated from precision and recall) and suggestion accuracy (SA) as metrics. The experimental setups compared the performance of the component sub-models when used alone and when combined into a unified model. The detection accuracies of the different models range from 93.23% to 95.01%, and the suggestion accuracies range from 26.94% to 72.10%. The results indicate that each sub-model in the dictionary plays a different role.
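To make the architecture described above more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the unigram model is a plain word-frequency table, that the diacritic model maps each diacritic-stripped form to its attested diacritized variants, and that "reverse" edit distance means generating all strings one Levenshtein edit away from the input and looking them up in the dictionary. The toy corpus and the names `strip_diacritics`, `is_known`, and `suggest` are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real system would be trained on a large Yorùbá text collection.
CORPUS = ["ọkọ", "ọkọ", "ọkọ̀", "oko", "ilé", "ilé", "owó"]


def nfc(word):
    """Normalize to NFC so that equivalent diacritic encodings compare equal."""
    return unicodedata.normalize("NFC", word)


def strip_diacritics(word):
    """Remove all combining marks (tone marks and under-dots)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Assumed unigram model: frequency of each fully diacritized word.
UNIGRAMS = Counter(nfc(w) for w in CORPUS)

# Assumed diacritic model: stripped form -> set of diacritized variants.
VARIANTS = defaultdict(set)
for w in UNIGRAMS:
    VARIANTS[strip_diacritics(w)].add(w)

# Base letters observed in the corpus, used to generate edits.
ALPHABET = sorted({ch for base in VARIANTS for ch in base})


def edits1(word):
    """All strings one Levenshtein edit (delete, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | replaces | inserts


def is_known(word):
    """Error detection: a word is accepted only with its exact diacritics."""
    return nfc(word) in UNIGRAMS


def suggest(word):
    """Candidate generation: edit the base form, restore diacritics, rank by frequency."""
    base = strip_diacritics(word)
    found = set()
    for candidate_base in {base} | edits1(base):
        found |= VARIANTS.get(candidate_base, set())
    return sorted(found, key=lambda w: (-UNIGRAMS[w], w))
```

In this sketch, an input such as `ile` is flagged as an error because only `ilé` is attested, and `suggest("ile")` recovers the diacritized form through the diacritic map. Ranking by unigram frequency is one simple choice; the source does not state how candidates are ordered.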

Access rights

Access: open access

Rights: CC BY 4.0

Attribution 4.0 International (CC BY 4.0)

URI

https://repo.agh.edu.pl/handle/AGH/113321

Collections

Articles (CN-csci)
