Graduate student: |
韓怡臻 Han, Yi-Chen |
Thesis title: |
應用自動文字探勘於臺灣中文饒舌音樂歌詞之研究 A Study on Text Mining of Chinese Rap Music in Taiwan |
Advisor: |
Ke, Hao-Ren |
Degree: |
Master |
Department: |
圖書資訊學研究所 Graduate Institute of Library and Information Studies |
Year of publication: | 2021 |
Academic year of graduation: | 109 |
Language: | Chinese |
Number of pages: | 127 |
Keywords (Chinese): | 饒舌、文字探勘、詞頻分析、分群、分類 |
Keywords (English): | Rap, Text Mining, Word Frequency Analysis, Clustering, Classification |
DOI URL: | http://doi.org/10.6345/NTNU202100327 |
Thesis type: | Academic thesis |
Since the turn of the millennium, rap has gradually entered the mainstream music market and become very popular among young people. Rappers often use their lyrics to express emotions or to criticize society, so understanding the content of rap lyrics also offers insight into contemporary culture and the social atmosphere. The purpose of this study is to explore the possible thematic types of Chinese rap music lyrics in Taiwan through text mining.
The study first conducted a word frequency analysis: it calculated the total number of occurrences of keywords in the lyrics texts and examined the frequency of each keyword from three perspectives (overall, by singer, and by era) to understand the basic content and word frequency distribution of the lyrics. It then ran unsupervised clustering experiments with K-means and affinity propagation, and evaluated clustering effectiveness through silhouette coefficients and close inspection of each cluster. Seven possible lyric themes emerged: music, party, friendship, love, growth, local place, and society. Finally, using labels derived from the clustering results and labels assigned manually, the study ran supervised binary classification experiments with the support vector machine and the K-nearest neighbor algorithm. Accuracy, precision, recall, and F1 score were used to evaluate how effectively the two algorithms classify Chinese rap music lyrics in Taiwan under different lyric themes and labeling methods.
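Two of the steps described above, keyword counting by group and the silhouette coefficient, can be illustrated with a minimal standard-library sketch. It is not the thesis's implementation: the function names, the `songs` records, the English toy tokens, and the two-dimensional points are all hypothetical, and real lyrics would first need Chinese word segmentation and vectorization. The silhouette function assumes Euclidean distance and at least two clusters.

```python
from collections import Counter
from math import dist


def keyword_counts(songs, group_key=None):
    """Count keyword occurrences overall, or per group (e.g. per singer)."""
    counts = {}
    for song in songs:
        group = song[group_key] if group_key else "overall"
        counts.setdefault(group, Counter()).update(song["tokens"])
    return counts


def silhouette_coefficient(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points.

    a: mean distance from point i to the other points in its own cluster.
    b: smallest mean distance from point i to the points of another cluster.
    Assumes `labels` is a list containing at least two distinct clusters.
    """
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == label and j != i]
        if not same:  # singleton cluster: silhouette is 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, already-tokenized lyrics (toy data, not from the thesis).
songs = [
    {"singer": "A", "tokens": ["music", "party", "music"]},
    {"singer": "A", "tokens": ["love"]},
    {"singer": "B", "tokens": ["music", "society"]},
]
overall = keyword_counts(songs)["overall"]
by_singer = keyword_counts(songs, "singer")

# Two well-separated groups of toy points, labeled well and labeled badly.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_coefficient(points, [0, 0, 1, 1])
bad = silhouette_coefficient(points, [0, 1, 0, 1])
print(overall.most_common(1), round(good, 3), round(bad, 3))
```

A silhouette coefficient close to 1 indicates compact, well-separated clusters, while a negative value suggests that points sit closer to another cluster than to their own, which is why the well-separated labeling scores higher than the mixed one here.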
The findings show that music, love, and party were the most common themes of Chinese rap music lyrics in Taiwan over the past two decades. Over the years, an increasing variety of themes has appeared, such as daily life, social issues, and school. In terms of clustering effectiveness, affinity propagation performed slightly better than K-means. In terms of classification performance, the K-nearest neighbor algorithm slightly outperformed the support vector machine, and labels derived from the clustering results trained a better binary classification model for lyrics than purely manual labeling did. Lyrics on the theme of music do exist in Chinese rap music in Taiwan; whether the other themes exist remains to be confirmed because of data imbalance. Future research could broaden the coverage of lyrics texts, try different dimensionality reduction methods, analyze word frequency from other perspectives, have experts or listeners label lyric types, and use other clustering and classification methods.
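The classification evaluation can be sketched in the same spirit. The example below is a minimal standard-library illustration of a K-nearest neighbor majority vote and of the four reported metrics for a binary task. The feature vectors, the labels (1 for a hypothetical "music" theme, 0 for anything else), and the function names are assumptions made for illustration, not data or code from the thesis. The support vector machine is omitted because it does not fit in a few lines.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical 2-D feature vectors; label 1 = "music" theme, 0 = other.
train_x = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.3),
           (0.1, 0.9), (0.2, 0.8), (0.3, 0.7)]
train_y = [1, 1, 1, 0, 0, 0]
test_x = [(0.85, 0.15), (0.65, 0.35), (0.15, 0.85), (0.42, 0.58)]
test_y = [1, 1, 0, 1]

y_pred = [knn_predict(train_x, train_y, x, k=3) for x in test_x]
metrics = binary_metrics(test_y, y_pred)
print(y_pred, metrics)
```

Reporting precision and recall alongside accuracy matters when one theme is much rarer than the others, since a classifier can reach high accuracy on imbalanced data while missing most positive examples.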