
Author: 蕭詠文 (Hsiao, Yung-Wen)
Thesis Title: Clustering analysis of trajectory data: Comparison of mixture of regression models and hierarchical clustering with dynamic time warping
Advisor: 蔡碧紋 (Tsai, Pi-Wen)
Degree: Master
Department: Department of Mathematics
Thesis Publication Year: 2019
Academic Year: 107 (ROC calendar, i.e., 2018-2019)
Language: English
Number of pages: 36
Keywords (in Chinese): 混合回歸模型, 階層式分群法, 動態時間扭曲法
Keywords (in English): Mixture of regression models, Hierarchical clustering, Dynamic time warping
DOI URL: http://doi.org/10.6345/NTNU201900857
Thesis Type: Academic thesis/dissertation
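    Trajectory data are curves indexed by time and are common in many fields, such as climate studies and time series. Clustering trajectory data is an important part of statistical analysis: by grouping similar data together, we can analyze the properties of each cluster and even predict which cluster a new observation belongs to. In this thesis we use two clustering methods, mixture of regression models and hierarchical clustering with dynamic time warping, and compare them through simulations and an analysis of real data.
    In the simulations, we compare the performance of the two methods in different scenarios using the correct clustering rate, and we discuss the parameter estimates of the mixture of regression models in different scenarios. According to the simulation results, neither method is uniformly better; each has its own advantages in different situations. Finally, both methods are applied to the analysis of real data.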
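
    As a rough illustration of the second method, here is a minimal pure-Python sketch: the classic dynamic-programming DTW distance, fed into a naive agglomerative (bottom-up) clustering that repeatedly merges the closest pair of clusters until k remain. The absolute-difference local cost, the absence of a warping window, the average-linkage default, and the function names dtw_distance and hierarchical_clustering are assumptions made for illustration, not necessarily the choices made in the thesis. The merging loop is deliberately simple and becomes slow for many curves.

```python
import math


def dtw_distance(x, y):
    """Classic DTW distance between two 1-D sequences (lengths may differ),
    using |x_i - y_j| as the local cost and no warping window."""
    n, m = len(x), len(y)
    # D[i][j] = smallest cumulative cost of aligning x[:i] with y[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # advance in x only, in y only, or in both sequences
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def hierarchical_clustering(dist, k, linkage="average"):
    """Agglomerative clustering on a precomputed distance matrix, merging
    the closest pair of clusters until k remain. Returns one label per item."""
    clusters = [[i] for i in range(len(dist))]

    def cluster_dist(a, b):
        pair = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pair)
        if linkage == "complete":
            return max(pair)
        return sum(pair) / len(pair)  # average linkage

    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = cluster_dist(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge q into p
        del clusters[q]

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


curves = [
    [0, 1, 2, 3, 4],
    [0, 0, 1, 2, 3, 4],  # same rising shape, delayed by one step
    [4, 3, 2, 1, 0],
    [4, 4, 3, 2, 1, 0],  # same falling shape, delayed by one step
]
dist = [[dtw_distance(a, b) for b in curves] for a in curves]
print(hierarchical_clustering(dist, k=2))  # [0, 0, 1, 1]
```

    Because DTW lets each delayed copy align with its original at zero cost, the two rising curves and the two falling curves end up in separate clusters even though the curves have different lengths.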

    Trajectory data are curves observed over time, and clustering them is an important part of statistical analysis. Clustering divides similar curves into groups so that the properties of each group can be analyzed. Two methods are studied: a model-based approach, the mixture of regression models, and hierarchical clustering with dynamic time warping. The two methods are compared through a simulation study.
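
    To make the model-based method concrete, the following is a minimal sketch under simplifying assumptions: each trajectory is a list of (time, value) pairs, every curve belongs as a whole to one of K components, and within component k the values follow a straight line in time, y = b0_k + b1_k t, with Gaussian noise of variance sigma2_k. EM then alternates weighted least squares (M-step) with posterior membership probabilities for each whole curve (E-step). The linear-in-time mean, the hypothetical init_labels hard starting assignment, and the variance floor are illustrative choices; the thesis's model may use other regressors, initialization, or stopping rules.

```python
import math


def fit_regression_mixture(curves, k, init_labels, n_iter=100, tol=1e-8):
    """EM for a k-component mixture of straight-line regressions in time.

    curves: list of (times, values) pairs; each whole curve belongs to a
    single latent component. init_labels: hypothetical hard starting
    assignment using every label 0..k-1 (assumes no component empties).
    Returns (mixing weights, [(b0, b1)], variances, responsibilities).
    """
    n = len(curves)
    resp = [[1.0 if init_labels[i] == c else 0.0 for c in range(k)]
            for i in range(n)]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # M-step: mixing weights, weighted least squares, residual variances
        pis, betas, sig2 = [], [], []
        for c in range(k):
            sw = st = stt = sy = sty = 0.0
            for i, (ts, ys) in enumerate(curves):
                w = resp[i][c]
                for t, y in zip(ts, ys):
                    sw += w
                    st += w * t
                    stt += w * t * t
                    sy += w * y
                    sty += w * t * y
            b1 = (sw * sty - st * sy) / (sw * stt - st * st)
            b0 = (sy - b1 * st) / sw
            rss = sum(resp[i][c] * (y - b0 - b1 * t) ** 2
                      for i, (ts, ys) in enumerate(curves)
                      for t, y in zip(ts, ys))
            pis.append(sum(r[c] for r in resp) / n)
            betas.append((b0, b1))
            sig2.append(max(rss / sw, 1e-6))  # floor guards against zero variance
        # E-step: posterior probability that each whole curve came from component c
        ll = 0.0
        for i, (ts, ys) in enumerate(curves):
            logp = []
            for c in range(k):
                b0, b1 = betas[c]
                lp = math.log(pis[c])
                for t, y in zip(ts, ys):
                    lp -= 0.5 * math.log(2 * math.pi * sig2[c])
                    lp -= (y - b0 - b1 * t) ** 2 / (2 * sig2[c])
                logp.append(lp)
            top = max(logp)  # log-sum-exp for numerical stability
            total = sum(math.exp(v - top) for v in logp)
            resp[i] = [math.exp(v - top) / total for v in logp]
            ll += top + math.log(total)
        if abs(ll - prev_ll) < tol:  # stop when the log-likelihood settles
            break
        prev_ll = ll
    return pis, betas, sig2, resp
```

    For example, with three noisy copies each of y = 1 + 2t and y = 5 - t, and a deliberately imperfect start that places one rising curve with the falling ones, the loop should move that curve back and recover slopes near 2 and -1.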
    In the simulation study, we examine the parameter estimates of the mixture of regression models and compare the performance of the two methods in different scenarios using the correct clustering rate. The simulation results show that each method has its own advantages in different situations. In addition, both clustering methods are applied to a real dataset.
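
    Cluster labels are arbitrary, so a correct clustering rate has to match predicted clusters to true groups before counting agreements. The sketch below assumes one common definition: the largest proportion of correctly grouped curves over all relabelings of the predicted clusters. This record does not state whether the thesis uses exactly this definition, and enumerating permutations is only practical for a small number of clusters.

```python
from itertools import permutations


def correct_clustering_rate(true_labels, pred_labels):
    """Share of items placed in the right group, after relabeling the
    predicted clusters in the way that agrees best with the truth.
    Labels are assumed to be integers 0..K-1; the cost grows as K!."""
    k = max(max(true_labels), max(pred_labels)) + 1
    best = 0
    for perm in permutations(range(k)):
        # perm[p] is the true group that predicted cluster p is mapped to
        hits = sum(1 for t, p in zip(true_labels, pred_labels) if perm[p] == t)
        best = max(best, hits)
    return best / len(true_labels)


truth = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 0, 0, 0, 0]  # labels swapped, and one curve misgrouped
print(correct_clustering_rate(truth, pred))  # 0.8333333333333334
```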

    1 Introduction
    2 Mixture of Regression Models
      2.1 Model
      2.2 EM Algorithm
      2.3 Example
    3 Hierarchical Clustering with Dynamic Time Warping
      3.1 Hierarchical Clustering
      3.2 Dynamic Time Warping
      3.3 Example
    4 Simulation Study
      4.1 Clustering Results of mixture of regression models
      4.2 Comparing clustering results of two methods
    5 Practical Data Analysis
    6 Conclusions
    References
