Presentation + Paper
Simple linear regression model based data clustering
14 May 2019
Abstract
KMeans is one of the most popular algorithms in data mining (ranked number 2) and has been widely used in many fields. KMeans uses Euclidean distance to compare two data points. However, Euclidean distance is sensitive to linear transforms introduced during the data collection process. Because of these linear transforms, the distance between two data points of the same class (intra-class distance) may be larger than the distance between points of different classes (inter-class distance), which can lower the clustering performance of the KMeans algorithm. In this paper, we propose a simple linear regression approach for data clustering. Instead of using Euclidean distance to measure the difference, we recommend using the goodness of fit (or normalized cross correlation) to measure the similarity between two data points. Using this new data comparison technique, we introduce a linear regression approach for data clustering and demonstrate that the proposed method has higher performance and lower computational cost than KMeans methods.
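The contrast between the two comparison measures can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that "goodness of fit" means the R² of a simple linear regression of one data point on another, which equals the squared normalized cross correlation of the mean-centered vectors. The function name `goodness_of_fit`, the gain 3.0, the offset 5.0, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def goodness_of_fit(x, y):
    """R^2 of the simple linear regression y ~ a*x + b.

    For a one-predictor regression this equals the squared Pearson
    (normalized cross) correlation between x and y.
    """
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc @ xc) * (yc @ yc))
    if denom == 0.0:
        return 0.0  # a constant vector has no defined correlation; treat as no fit
    r = (xc @ yc) / denom
    return float(r * r)


rng = np.random.default_rng(0)
x = rng.normal(size=64)

# Assumed scenario: the same underlying signal recorded with a different
# gain and offset (a linear transform), versus an unrelated signal.
same_class = 3.0 * x + 5.0
other_class = rng.normal(size=64)

d_same = np.linalg.norm(x - same_class)    # Euclidean distance, same class
d_other = np.linalg.norm(x - other_class)  # Euclidean distance, other class
r2_same = goodness_of_fit(x, same_class)
r2_other = goodness_of_fit(x, other_class)

print("Euclidean:", d_same, d_other)
print("R^2:      ", r2_same, r2_other)
```

In this synthetic setting the Euclidean distance to the linearly transformed copy exceeds the distance to the unrelated point, while the regression-based similarity stays close to 1 for the transformed copy only. That is the intra-class versus inter-class inversion the abstract describes.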
© (2019) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Bingcheng Li "Simple linear regression model based data clustering", Proc. SPIE 10988, Automatic Target Recognition XXIX, 109880A (14 May 2019); https://doi.org/10.1117/12.2518037
CITATIONS
Cited by 1 patent.
KEYWORDS
Data modeling
Data processing
Distance measurement
Algorithm development
Computer security
Data compression
Data mining