Unsupervised labeling of data for supervised learning and its application to medical claims prediction

The task of identifying changes and irregularities in medical insurance claim payments is difficult. Traditional practice involves querying historical claims databases and flagging potential claims as normal or abnormal. Because what counts as a normal payment is usually unknown and may change over time, abnormal payments often go undetected and are discovered only after the payment period has passed. This paper addresses the problem of on-line unsupervised learning from data streams when the distribution that generates the data changes or drifts over time. Automated algorithms for detecting concept drift in the probability distribution of the data are presented. The idea behind the presented drift detection methods is to transform the distribution of the data within a sliding window into a more convenient distribution. Then, the p-value of a test statistic at a given significance level can be used to infer the drift rate, adjust the window size, and decide on the status of the drift. The detected concept drifts are used to label the data for subsequent learning of classification models by a supervised learner. The algorithms were tested on several synthetic and real medical claims data sets.
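The sliding-window idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes, for illustration only, that the "more convenient distribution" is the approximately normal distribution of window means, that the test is a two-sample z-test, and that the window size is fixed rather than adapted. The names `two_sided_p_value`, `label_by_drift`, `window`, and `alpha` are hypothetical.

```python
import math
from statistics import mean, variance


def two_sided_p_value(reference, recent):
    """Two-sample z-test on window means (normal approximation, an assumption)."""
    se = math.sqrt(variance(reference) / len(reference)
                   + variance(recent) / len(recent))
    if se == 0.0:
        # Both windows are constant: identical means give no evidence of drift.
        return 1.0 if mean(reference) == mean(recent) else 0.0
    z = (mean(recent) - mean(reference)) / se
    # P(|Z| >= |z|) for a standard normal variable Z.
    return math.erfc(abs(z) / math.sqrt(2))


def label_by_drift(stream, window=50, alpha=0.001):
    """Label each point with a concept id that increases after each detected drift."""
    labels = []
    concept = 0
    reference = []  # first `window` points seen under the current concept
    recent = []     # sliding window over the newest points
    for x in stream:
        labels.append(concept)
        if len(reference) < window:
            reference.append(x)
            continue
        recent.append(x)
        if len(recent) > window:
            recent.pop(0)
        if len(recent) == window and two_sided_p_value(reference, recent) < alpha:
            # Drift detected: start a new concept and rebuild both windows
            # from the points that follow.
            concept += 1
            reference = []
            recent = []
    return labels


if __name__ == "__main__":
    # Synthetic payments: the level jumps from about 100 to about 150 at index 200.
    payments = [100.0 + (i % 5) for i in range(200)]
    payments += [150.0 + (i % 5) for i in range(200)]
    labels = label_by_drift(payments)
    print("first point labelled as the new concept:", labels.index(1))
```

The resulting labels could then serve as class targets for a supervised learner. Because the test needs several post-change points before the p-value drops below `alpha`, the first few points after a change keep the old label in this sketch.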

Access rights

Access: open access

Rights: CC BY 4.0

Attribution 4.0 International (CC BY 4.0)

URI

https://repo.agh.edu.pl/handle/AGH/49106

Collections

Articles (CN-csci)
