Towards textual data augmentation for neural networks: synonyms and maximum loss

Data augmentation is one way to deal with labeled data scarcity and overfitting. Both problems are crucial for modern deep-learning algorithms, which require massive amounts of data. The problem has been explored more thoroughly in image analysis than in text; this work is a step toward closing that gap. We propose a method for augmenting textual data when training convolutional neural networks for sentence classification. The augmentation is based on substituting words using a thesaurus as well as Princeton University's WordNet. Our method improves upon the baseline in most cases. In terms of accuracy, the best variant is 1.2 percentage points better than the baseline.
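A minimal sketch of thesaurus-based word substitution, for illustration only. It assumes a small hypothetical dictionary (`TOY_THESAURUS`) in place of a real thesaurus or WordNet lookup, and a per-word substitution probability `p`; these names and parameters are not taken from the paper. The `max_loss_variant` helper is likewise an assumption: it reads "maximum loss" as "among several augmented candidates, keep the one on which the model's loss is highest", with `loss_fn` standing in for the network's loss. The actual method may differ.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical toy thesaurus standing in for a real thesaurus or WordNet lookup.
TOY_THESAURUS: Dict[str, List[str]] = {
    "good": ["fine", "great"],
    "movie": ["film"],
    "boring": ["dull", "tedious"],
}


def synonym_substitute(tokens: Sequence[str],
                       thesaurus: Dict[str, List[str]],
                       p: float,
                       rng: random.Random) -> List[str]:
    """Replace each token that has synonyms with a random synonym, with probability p."""
    out: List[str] = []
    for tok in tokens:
        synonyms = thesaurus.get(tok.lower())
        if synonyms and rng.random() < p:
            out.append(rng.choice(synonyms))
        else:
            out.append(tok)  # no synonyms known, or not selected: keep the original word
    return out


def max_loss_variant(tokens: Sequence[str],
                     thesaurus: Dict[str, List[str]],
                     loss_fn: Callable[[List[str]], float],
                     n_candidates: int,
                     p: float,
                     rng: random.Random) -> List[str]:
    """Assumed reading of 'maximum loss': generate candidates, keep the highest-loss one."""
    candidates = [list(tokens)]  # the unmodified sentence is also a candidate
    candidates += [synonym_substitute(tokens, thesaurus, p, rng)
                   for _ in range(n_candidates)]
    return max(candidates, key=loss_fn)


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "a good movie but boring".split()
    print(synonym_substitute(sentence, TOY_THESAURUS, p=0.5, rng=rng))
```

Words without an entry in the thesaurus are always kept, so the sentence length and word order never change; only the lexical choices vary between augmented copies.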

Access rights

Access: open access

Rights: CC BY 4.0 (Attribution 4.0 International)

URI

https://repo.agh.edu.pl/handle/AGH/113221

Collections

Articles (CN-csci)
