Article

Towards textual data augmentation for neural networks: synonyms and maximum loss

Access rights

Access: open access
Rights: CC BY 4.0
Attribution 4.0 International (CC BY 4.0)

Version

publisher's version
Item type: Journal Issue
Computer Science
2019 - Vol. 20 - No. 1

Pagination/Pages:

pp. 57-83

Description

Bibliography: pp. 79-83.

Abstract

Data augmentation is one way to deal with labeled data scarcity and overfitting. Both problems are crucial for modern deep-learning algorithms, which require massive amounts of data. Augmentation has been explored more thoroughly for image analysis than for text; this work is a step toward closing that gap. We propose a method for augmenting textual data when training convolutional neural networks for sentence classification. The augmentation is based on substituting words using a thesaurus as well as Princeton University's WordNet. Our method improves upon the baseline in most cases. In terms of accuracy, the best variant is 1.2 percentage points better than the baseline.
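The word-substitution idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `TOY_THESAURUS`, `augment`, `hardest_candidate`, and the `rate` parameter are hypothetical names introduced here, and the tiny in-memory dictionary stands in for the thesaurus and WordNet that the paper uses. The second helper reflects one possible reading of "maximum loss" in the title (keep the augmented variant on which a model's loss is highest); the abstract does not confirm that this is how the paper applies it.

```python
import random

# Hypothetical toy thesaurus; the paper draws synonyms from a thesaurus
# and from Princeton WordNet instead.
TOY_THESAURUS = {
    "good": ["great", "fine"],
    "bad": ["poor", "awful"],
    "movie": ["film"],
}


def augment(sentence, thesaurus, rate=0.5, rng=None):
    """Return a copy of `sentence` with some words replaced by synonyms.

    Each word that has an entry in `thesaurus` is replaced with
    probability `rate` (an assumed knob, not taken from the paper).
    """
    if rng is None:
        rng = random.Random()
    out = []
    for word in sentence.split():
        synonyms = thesaurus.get(word.lower())
        if synonyms and rng.random() < rate:
            out.append(rng.choice(synonyms))
        else:
            out.append(word)
    return " ".join(out)


def hardest_candidate(sentence, thesaurus, loss_fn, n_candidates=8, rng=None):
    """Generate several augmented variants and keep the one with the
    highest loss according to `loss_fn` (a caller-supplied function that
    maps a sentence to a number). This is an assumed interpretation of
    "maximum loss", not a confirmed description of the paper's method.
    """
    if rng is None:
        rng = random.Random()
    candidates = [
        augment(sentence, thesaurus, rate=0.5, rng=rng)
        for _ in range(n_candidates)
    ]
    return max(candidates, key=loss_fn)
```

In a training loop, `loss_fn` would wrap the current classifier's loss on the sentence and its label; any function from a string to a number works for experimentation.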
