
Graduate student: 許強
XU QIANG
Thesis title: 應用文字探勘技術進行博物館遊客情感分析
Applying Text Mining Techniques for Sentiment Analysis of Museum Visitor Reviews
Advisor: 施人英
Shih, Jen-Ying
Oral defense committee: 施人英
Shih, Jen-Ying
何宗武
Ho, Tsung-Wu
江艾軒
Ching, Au-Hsuan
Oral defense date: 2024/07/04
Degree: Master
Department: Graduate Institute of Global Business and Strategy
Year of publication: 2024
Academic year of graduation: 112 (ROC academic year)
Language: English
Number of pages: 80
Keywords (Chinese): 文字探勘, 博物館, 情緒分析, TF-IDF, 主題模型, 關鍵詞共現
Keywords (English): Text Mining, Museum, Sentiment Analysis, TF-IDF, Topic Modeling, Keyword Co-occurrence
DOI URL: http://doi.org/10.6345/NTNU202401815
Thesis type: Academic thesis
    In an era where information is obtained through multiple channels, tourists increasingly gather first-hand travel experiences from travel review websites (such as TripAdvisor and Google Maps) in addition to the official websites of tourist attractions. This information is crucial for the operation and management of attractions. Text mining techniques are effective for analyzing unstructured data and therefore offer a feasible research method for such review data. This study accordingly applied text mining to museum reviews, covering word segmentation, TF-IDF vectorization, feature word selection, keyword co-occurrence, topic modeling, and sentiment analysis. Using keyword co-occurrence analysis and the Leiden community detection algorithm, five common discussion topics were identified: Visit Duration, Famous Collections, Entry Situation, Overall Rating, and Famous Artist, along with certain topics specific to individual museums. Topic modeling was then used to identify the content and importance of each topic; some of these topics matched the keyword co-occurrence results, which further supports their importance. The points of focus of reviewers writing in different languages were also uncovered. Sentiment classification accuracy on the museum reviews was computed in order to compare Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Bidirectional Encoder Representations from Transformers (BERT) models in depth. Overall, across the different language categories, the LR model achieved the best predictive performance. Furthermore, the word coefficients of the LR model were filtered to build dictionaries of positive and negative sentiment words for museum reviews in each language. These results present the topic distributions of the reviews and examine the relationship between feature words and sentiment analysis outcomes.
In this study, tourist reviews of eight world-renowned museums, written in English, Simplified Chinese, and Traditional Chinese, were collected from TripAdvisor, yielding a dataset of approximately 415,000 reviews. These findings can provide valuable insights for improving museum management and operation strategies.
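
The TF-IDF weighting and keyword co-occurrence steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the three toy reviews, the stop-word list, the whitespace tokenizer, and the function names `tokenize`, `tf_idf`, and `co_occurrence` are all assumptions made for illustration, and the TF-IDF formula shown is one common textbook variant that may differ from the one used in the thesis.

```python
import math
from collections import Counter
from itertools import combinations

# Toy reviews invented for illustration; they are NOT from the thesis dataset.
REVIEWS = [
    "long queue at the entrance but the mona lisa was amazing",
    "amazing collection and the mona lisa is a must see",
    "the queue was long and the entrance was crowded",
]

# Hypothetical stop-word list, chosen only for this toy example.
STOP_WORDS = {"a", "and", "at", "but", "is", "the", "was"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf = count / document length and idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def co_occurrence(docs):
    """Count how many documents contain each unordered pair of terms."""
    pairs = Counter()
    for doc in docs:
        # Sort so that each pair has a single canonical (a, b) key.
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs


docs = [tokenize(review) for review in REVIEWS]
weights = tf_idf(docs)
pairs = co_occurrence(docs)
```

In this sketch, `pairs` can be read as a weighted edge list over keywords. A community detection algorithm such as Leiden, which the abstract names, would then be run on that graph to group keywords into topics; that step is not shown here.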

    ACKNOWLEDGMENT
    ABSTRACT
    TABLE OF CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1 INTRODUCTION
        1.1 RESEARCH BACKGROUND AND MOTIVATION
        1.2 RESEARCH PURPOSES
        1.3 RESEARCH QUESTIONS
        1.4 RESEARCH PROCESSES
    CHAPTER 2 LITERATURE REVIEW
        2.1 TRIPADVISOR
        2.2 MUSEUM
        2.3 TEXT MINING
        2.4 TF-IDF
        2.5 KEYWORD CO-OCCURRENCE
        2.6 TOPIC MODELING
        2.7 SENTIMENT ANALYSIS
    CHAPTER 3 RESEARCH METHODOLOGY
        3.1 RESEARCH INSTRUMENT
        3.2 DATA COLLECTION
        3.3 DATA PREPROCESSING
        3.4 DATA ANALYSIS METHOD
            3.4.1 TF-IDF
            3.4.2 Keyword Co-occurrence Analysis
            3.4.3 Topic Modeling Analysis
            3.4.4 Sentiment Analysis
    CHAPTER 4 RESEARCH RESULTS AND FINDINGS
        4.1 KEYWORD CO-OCCURRENCE ANALYSIS
            4.1.1 English Reviews
            4.1.2 Simplified Chinese Reviews
            4.1.3 Traditional Chinese Reviews
        4.2 TOPIC MODELING
        4.3 SENTIMENT ANALYSIS
            4.3.1 Comparisons of Sentiment Analysis Models
            4.3.2 Regression Analysis
            4.3.3 Sentiment Dictionary
    CHAPTER 5 CONCLUSIONS AND SUGGESTIONS
        5.1 DISCUSSIONS AND CONCLUSIONS
        5.2 RESEARCH CONTRIBUTIONS
        5.3 RESEARCH LIMITATIONS AND SUGGESTIONS
    REFERENCE
    APPENDIX: WORDS REMOVED IN KEYWORD CO-OCCURRENCE ANALYSIS

