Graduate student: Tsai, Yung-Ming (蔡詠名)
Thesis title: An Unsupervised Machine Learning Approach for Multi-Dimensional Network Data Mining (應用非監督式機器學習於多維度路網資料之探勘)
Advisor: Chang, Kuo-Chen (張國楨)
Degree: Master
Department: Department of Geography
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar; 2019-2020)
Language: English
Number of pages: 48
Keywords (Chinese): 路網分析, 多維度, 非監督式機器學習, K-Medoids
Keywords (English): network analysis, multi-dimension, unsupervised machine learning, K-Medoids
DOI URL: http://doi.org/10.6345/NTNU202000822
Thesis type: Academic thesis
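  • In recent years, advances in intelligent transportation systems, Internet of Things technology, and wireless network technology, together with government agencies' support for open data, have made large amounts of traffic data available. These data carry detailed spatial and temporal information, and even more complex data dimensions. Extracting the important information hidden in the data requires multi-dimensional data analysis methods.
      This study proposes an unsupervised machine learning method for analysing multi-dimensional traffic network data. The algorithm uses a multi-dimensional network weight matrix to calculate distances in the network across multiple dimensions and combines this with the K-Medoids algorithm, which is suited to discrete data, to develop a cluster analysis algorithm. To address the sensitivity of K-Medoids clustering to the initial cluster seeds and to the value of K, the algorithm adopts two solutions. First, it generates the initial seeds by systematic interval sampling, which reduces the random factors in the algorithm. Second, cluster splitting and cluster merging are introduced into the clustering algorithm to offset the influence of poorly chosen initial seeds on the result.
      The cluster analysis of freeway traffic volume shows that the algorithm has the following advantages. First, it is consistent and reliable. Because systematic interval sampling reduces the random elements of the algorithm, it can be expected to produce the same clustering result when given the same input data and parameters. Different values of K have a relatively small effect on the result, although choosing an appropriate K benefits the algorithm's performance. The clustering results show that the algorithm is faithful to the topological relationships of the road network: data that are close in straight-line distance but far apart in network distance are not assigned to the same cluster. The algorithm also successfully identifies traffic patterns that span road networks. The results further show that the algorithm can distinguish differences in the time and traffic volume dimensions, grouping data with distinctive temporal or traffic volume patterns into one class.
      The results of this study offer fields such as transportation management, logistics, and transportation geography a systematic approach to analysing spatio-temporal or multi-dimensional network data. The cluster centres reveal the rules underlying the data patterns, and the clusters can also serve as operational units for further decision making.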

    In recent years, advances in intelligent transportation systems (ITS), the Internet of Things (IoT), and wireless communication technology, together with government support for open data, have made large amounts of traffic data available. These data contain detailed spatial and temporal information, and sometimes even more complex data dimensions. Extracting the useful information hidden in the data therefore requires a multi-dimensional data analysis technique.
    This study designs an unsupervised machine learning approach for multi-dimensional network data. The algorithm adopts the concepts of the network weight matrix and the space-time matrix to calculate multi-dimensional distances in network space. Combining these distances with the K-Medoids algorithm, which can handle discrete data, yields a clustering algorithm. To address the sensitivity of K-Medoids to the initial seeds and to the value of K, two methods are adopted. First, seeds are generated by systematic sampling to reduce the randomness of the algorithm. Second, cluster splitting and merging are introduced to compensate for poor seed selection in the initial phase.
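The seeding and clustering steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a precomputed distance matrix, uses a plain alternating K-Medoids loop, and omits the cluster splitting and merging step. The function names `systematic_seeds` and `k_medoids` and the toy data are hypothetical, and the toy "network distance" is simply the distance along a single straight road.

```python
import numpy as np

def systematic_seeds(n_items, k):
    """Pick k seed indices at a fixed interval (assumed scheme, no randomness)."""
    step = n_items / k
    return [int(step * i + step / 2) for i in range(k)]

def k_medoids(dist, k, max_iter=100):
    """Alternating K-Medoids on a precomputed (n x n) distance matrix."""
    dist = np.asarray(dist, dtype=float)
    medoids = np.array(systematic_seeds(dist.shape[0], k))
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size == 0:
                continue  # keep the previous medoid if a cluster is empty
            # Update step: the member with the smallest total distance
            # to the other members becomes the new medoid.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Hypothetical toy data: six detectors along one road, positions in km.
positions = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = np.abs(positions[:, None] - positions[None, :])

medoids, labels = k_medoids(dist, k=2)
print(medoids.tolist())  # [1, 4]
print(labels.tolist())   # [0, 0, 0, 1, 1, 1]
```

Because the seeds depend only on the number of items and on K, repeated runs on the same matrix return the same medoids and labels, which is the kind of repeatability the abstract attributes to systematic sampling.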
    In a case study of highway traffic clustering, the algorithm demonstrates several advantages. First, it is consistent and robust: because systematic sampling reduces the randomness of seed generation, the same results can be expected across repeated experiments given the same inputs and parameters. Second, the algorithm respects the topology of the highway network: features that are close in space but distant in network space are not assigned to the same cluster. The algorithm can also recognize cross-system traffic patterns. Finally, the clustering results show that the algorithm can distinguish differences in the temporal dimension and in the traffic data dimension: features with unique temporal and traffic patterns are grouped together.
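The difference between closeness in space and closeness in network space can be made concrete with a small sketch. The abstract does not say how network distance is computed, so the Dijkstra shortest-path routine below is an assumption, and the toy network (two parallel roads joined only at one end), the node names, and the coordinates are all hypothetical.

```python
import heapq
import math

def network_distances(graph, source):
    """Dijkstra shortest-path lengths from source over a weighted adjacency dict."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry, a shorter path was already found
        for neighbour, length in graph[node].items():
            nd = d + length
            if nd < dist.get(neighbour, math.inf):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist

# Hypothetical toy network: roads A1-A2-A3 and B1-B2-B3 run in parallel,
# 1 km apart, and are connected only by a link between A1 and B1.
coords = {
    "A1": (0, 0), "A2": (10, 0), "A3": (20, 0),
    "B1": (0, 1), "B2": (10, 1), "B3": (20, 1),
}
graph = {
    "A1": {"A2": 10, "B1": 1},
    "A2": {"A1": 10, "A3": 10},
    "A3": {"A2": 10},
    "B1": {"A1": 1, "B2": 10},
    "B2": {"B1": 10, "B3": 10},
    "B3": {"B2": 10},
}

straight = math.dist(coords["A3"], coords["B3"])  # straight-line distance
network = network_distances(graph, "A3")["B3"]    # distance along the roads
print(straight, network)  # 1.0 41.0
```

A clustering that uses the network distance treats A3 and B3 as far apart even though they are only 1 km apart on the map, which is the behaviour the abstract describes.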
    This study provides an approach for systematically analysing space-time or multi-dimensional network data, which can be used in fields such as transportation management, logistics, and transportation geography. The medoids of the clusters can serve as rules describing traffic patterns, and the clusters can be used as operational units for further decision making.

    Chapter 1 Introduction
    Chapter 2 Literature Review
        (1) Application of network analysis in geographic information system
        (2) Data mining
        (3) Machine learning
    Chapter 3 Methodology
        (1) Algorithm
    Chapter 4 Model Evaluation
        (1) Study Area and Materials
        (2) Case Study Design
    Chapter 5 Results and Discussion
        (1) Consistency and Robustness
        (2) Topological Rules
        (3) Multi-Dimensional Patterns
    Chapter 6 Conclusion and Future Works
        (1) Conclusion
        (2) Future Works

    Public release of the electronic full text is delayed until 2025/07/20