
Adaptive Subspace Clustering Algorithm for High-Dimensional Data Streams

*Supported by the Natural Science Foundation of Hebei Province of China under Grant No. F2010001298. Received 2010-05, Accepted 2010-07.

ISSN 1673-9418 CODEN JKYTA8
E-mail: fcst@https://www.sodocs.net/doc/8312438796.html
Journal of Frontiers of Computer Science and Technology
https://www.sodocs.net/doc/8312438796.html
1673-9418/2010/04(09)-0859-06
Tel: +86-10-51616056
DOI: 10.3778/j.issn.1673-9418.2010.09.009

Adaptive Clustering Algorithm for Mining Subspace Clusters in High-Dimensional Data Stream*

REN Jiadong 1,2, ZHOU Weiwei 1+, HE Haitao 1

1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China

2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China

+ Corresponding author: E-mail: dingxiangyu_100200@https://www.sodocs.net/doc/8312438796.html,

REN Jiadong, ZHOU Weiwei, HE Haitao. Adaptive clustering algorithm for mining subspace clusters in high-dimensional data stream. Journal of Frontiers of Computer Science and Technology, 2010, 4(9): 859-864.

Abstract: Clustering high-dimensional data streams is a research focus in the area of data mining. Because data streams are large in volume, fast-arriving, and high-dimensional, many clustering algorithms cannot achieve good clustering quality on them. This paper proposes a new adaptive clustering algorithm for mining subspace clusters in high-dimensional data streams, called SAStream. It improves the cluster structure used in HPStream and defines candidate clusters. The algorithm computes the distance between each newly arriving data point and the centroids of the candidate clusters only, rather than the centroids of all clusters, so fewer clusters are examined during the clustering process. The created clusters are stored in a pyramidal time frame, and a time fading function is used to discount the history of past behavior. When the data rate is high, the LimitingRadius and the cluster selection factor adjust automatically, and the clustering granularity adjusts accordingly. The experimental results show that the algorithm clusters well at high speed.

Key words: high-dimensional data stream; subspace clustering; data rate; adaptive
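
The assignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `fade` and `assign`, the dictionary layout of a cluster, the decay parameter `lam`, and the exponential form 2^(-lam * dt) of the fading function are all assumptions made for illustration. The sketch also uses plain full-dimensional Euclidean distance, takes the candidate set as a given input, and leaves out the pyramidal time frame and the automatic adjustment of LimitingRadius.

```python
import math


def fade(weight, dt, lam=0.5):
    """Discount a weight with an assumed exponential fading function 2 ** (-lam * dt)."""
    return weight * 2.0 ** (-lam * dt)


def assign(point, t, clusters, candidate_ids, limiting_radius, lam=0.5):
    """Assign `point` (arriving at time `t`) to the nearest candidate cluster.

    Only clusters whose index is in `candidate_ids` are examined. If none lies
    within `limiting_radius`, a new cluster is started. Each cluster is a dict
    with keys 'centroid', 'weight' and 'last_update' (hypothetical layout).
    Returns the index of the cluster that received the point.
    """
    best_id, best_dist = None, float("inf")
    for cid in candidate_ids:
        d = math.dist(point, clusters[cid]["centroid"])
        if d < best_dist:
            best_id, best_dist = cid, d

    if best_id is None or best_dist > limiting_radius:
        # No candidate is close enough: open a new cluster at this point.
        clusters.append({"centroid": list(point), "weight": 1.0, "last_update": t})
        return len(clusters) - 1

    c = clusters[best_id]
    # Fade the old weight before absorbing the new point (which has weight 1).
    w = fade(c["weight"], t - c["last_update"], lam)
    c["centroid"] = [(w * m + x) / (w + 1.0) for m, x in zip(c["centroid"], point)]
    c["weight"] = w + 1.0
    c["last_update"] = t
    return best_id


clusters = [
    {"centroid": [0.0, 0.0], "weight": 1.0, "last_update": 0},
    {"centroid": [10.0, 10.0], "weight": 1.0, "last_update": 0},
]
first = assign([1.0, 0.0], 2, clusters, [0], 2.0)     # absorbed by cluster 0
second = assign([10.0, 10.5], 3, clusters, [0], 2.0)  # cluster 1 is never examined
```

In the second call the point lies close to cluster 1, but cluster 1 is not in the candidate set, so a new cluster is opened. How the candidate set should be chosen so that such misses are rare is exactly what the paper's cluster selection factor addresses, and the sketch does not model it.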
