Graduate Student: Wu, Wei-Lin (吳威霖)
Thesis Title: A Study on Impacts of Air Quality on Social Media Polarity (從空品物聯網探討空氣品質對於社群媒體風向之影響)
Advisor: Chen, Ling-Jyh (陳伶志)
Degree: Master
Department: Department of Computer Science and Information Engineering
Year of Publication: 2019
Academic Year of Graduation: 107 (ROC academic year)
Language: Chinese
Number of Pages: 43
Keywords: Air Quality, Social Media, Text Mining, Sentiment Analysis, Distributed Vector, Public Internet Sentiment
DOI URL: http://doi.org/10.6345/NTNU201900427
Thesis Type: Academic thesis
    Air pollution raises the risk of respiratory disease and death, and it is an environmental issue that the whole world must confront. Most people still learn about air quality from the instrument data of the Environmental Protection Administration (EPA). Those instruments, however, are meant to monitor long-term trends in air quality over large areas, and most of the values they report are hourly averages, so the data are not timely enough for people to take early precautions against sudden pollution episodes. Social media, meanwhile, has become part of daily life: the sentiment people express when they post reflects their most immediate feelings, and that sentiment may itself be affected by air quality. This thesis therefore proposes a method that analyzes the sentiment of article titles, classifies social media articles by sentiment, and examines how that sentiment relates to air quality, based on the volume of posts and the ratio of positive to negative posts. Using air quality data from the EPA and articles from PTT (the largest bulletin board system in Taiwan), the sentiment classification model labels articles as positive or negative with an accuracy of 85%. A preliminary analysis of the counts of positive and negative articles against air quality values shows that air quality is associated with public sentiment. The experiments also reveal the words people habitually use and their attitudes toward particular words. The results can serve as a reference for PM2.5-related research and for policy making.
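    The final step described in the abstract, comparing counts of positive and negative posts with air quality values, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the `"pos"`/`"neg"` labels, the use of Pearson correlation, and all data values are assumptions made up for illustration, and the labels stand in for the output of a sentiment classifier.

```python
from collections import defaultdict
from math import sqrt


def negative_ratio_by_day(posts):
    """Return {day: share of negative posts} from (day, label) pairs.

    Labels are assumed to be "pos" or "neg", e.g. produced by a
    sentiment classifier applied to article titles.
    """
    counts = defaultdict(lambda: {"pos": 0, "neg": 0})
    for day, label in posts:
        counts[day][label] += 1
    return {
        day: c["neg"] / (c["pos"] + c["neg"])
        for day, c in counts.items()
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: classifier labels per post and a daily AQI value.
posts = [
    ("2019-01-01", "pos"), ("2019-01-01", "pos"),
    ("2019-01-01", "pos"), ("2019-01-01", "neg"),
    ("2019-01-02", "pos"), ("2019-01-02", "pos"),
    ("2019-01-02", "neg"), ("2019-01-02", "neg"),
    ("2019-01-03", "pos"), ("2019-01-03", "neg"),
    ("2019-01-03", "neg"), ("2019-01-03", "neg"),
]
daily_aqi = {"2019-01-01": 40, "2019-01-02": 90, "2019-01-03": 150}

ratios = negative_ratio_by_day(posts)
days = sorted(daily_aqi)
r = pearson([daily_aqi[d] for d in days], [ratios[d] for d in days])
print(f"correlation between AQI and negative-post ratio: {r:.3f}")
```

    A correlation near 1 on real data would suggest that negative posts become more common as AQI rises, but three invented days say nothing about the thesis's actual findings.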

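    List of Figures
    List of Tables
    Chapter 1 Introduction
    Chapter 2 Related Work
        2.1 Self-Similarity
        2.2 Time Series Decomposition
        2.3 Air Quality Events
    Chapter 3 Methodology
        3.1 Data Sources and Preprocessing
            3.1.1 Data Sources
            3.1.2 Data Preprocessing
            3.1.3 Data Classification
        3.2 Keyword Extraction
            3.2.1 TFIDF (Term Frequency, Inverse Document Frequency)
            3.2.2 Delta TFIDF
            3.2.3 LLR (Log-Likelihood Ratio)
        3.3 Word2Vec Model
        3.4 Feature Vector Transformation
            3.4.1 Keyword Labeling
            3.4.2 Similar-Word Substitution
        3.5 Weighted Average
        3.6 Classifier Training
    Chapter 4 Experimental Results
        4.1 Experimental Setup
        4.2 Model Comparison
        4.3 Air Quality (AQI) and Social Media
        4.4 Changes in Social Media Volume
            4.4.1 Habitual Expressions
            4.4.2 Volume Changes for Air Quality Topics
            4.4.3 Volume Changes by Region
            4.4.4 Volume Changes for Politicians
    Chapter 5 Conclusion and Future Work
    References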

