| Field | Value |
|---|---|
| Graduate student | 沈信佑 Shen, Hsin-You |
| Thesis title | 劇本文件探勘與廣告推薦之研究 Script Text Mining for Advertisement Recommendation |
| Advisor | 侯文娟 Hou, Wen-Juan |
| Degree | Master |
| Department | Department of Computer Science and Information Engineering |
| Year of publication | 2016 |
| Academic year of graduation | 104 (ROC calendar) |
| Language | Chinese |
| Pages | 59 |
| Keywords | text mining, script analysis, advertisement recommendation, feature words, E-HowNet (廣義知網) |
| DOI URL | https://doi.org/10.6345/NTNU202205100 |
| Thesis type | Academic thesis |
This study was motivated by the commercial breaks that follow television drama segments. Many advertisements are not aired at an appropriate moment, and the ad schedule has to be arranged by hand, which costs time and labor. Moreover, each time viewers finish a segment of a drama, they enter a commercial break that feels long and boring. Many of them therefore switch to another channel to watch a different drama or show, which lowers the effectiveness of the advertisers' commercials during that period. This thesis therefore aims to build an automatic script analysis and advertisement recommendation system. The system first analyzes and mines the scripts for important feature words. The goal is an effective and accurate model whose recommended advertisements catch viewers' attention and maximize the effectiveness of the advertised products.
The experimental data come from two sources. The script documents are 12 screenplays recognized by the Golden Harvest Awards for Outstanding Short Films (金穗獎). The advertised-product database is a set of advertisements retrieved at random from websites. The results of the proposed methods are compared with human judgments to verify them. The evaluation targets are the weighting (importance) score of each paragraph and the best recommended advertisement for it.
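The abstract evaluates the system by comparing its output with human judgments, and it reports precision. The sketch below shows one way to compute that score. It assumes the system recommends at most one advertisement per paragraph, and that human judges mark a set of suitable advertisements for each paragraph; the thesis's exact judging protocol is not given here. Under those assumptions, precision is the share of recommendations that the judges also chose. The function name `recommendation_precision` and all paragraph and ad IDs are hypothetical.

```python
from typing import Dict, Optional, Set


def recommendation_precision(
    system: Dict[str, Optional[str]],
    human: Dict[str, Set[str]],
) -> float:
    """Precision of per-paragraph ad recommendations against human judgments.

    system: paragraph ID -> recommended ad ID, or None if nothing was recommended.
    human:  paragraph ID -> set of ad IDs the judges considered suitable.
    (Both data layouts are assumptions made for this sketch.)
    """
    # Only paragraphs that actually received a recommendation are counted.
    recommended = {pid: ad for pid, ad in system.items() if ad is not None}
    if not recommended:
        return 0.0
    # A recommendation is correct when the judges also chose that ad.
    correct = sum(1 for pid, ad in recommended.items() if ad in human.get(pid, set()))
    return correct / len(recommended)


# Hypothetical example: four paragraphs, one without a recommendation.
system_out = {"p1": "ad_coffee", "p2": "ad_phone", "p3": None, "p4": "ad_car"}
gold = {"p1": {"ad_coffee"}, "p2": {"ad_snack"}, "p3": {"ad_tea"}, "p4": {"ad_car", "ad_tire"}}
print(recommendation_precision(system_out, gold))  # 2 of 3 recommendations match, about 0.667
```

This score counts only the paragraphs that actually received a recommendation. That is what separates it from plain accuracy over all paragraphs.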
The methods serve two objectives: (1) computing a weighting (importance) score for each paragraph and (2) recommending the best advertisement. To compute the paragraph scores, the first step is to identify the feature words in the scripts; these feature words are the key to analyzing how important each paragraph is. To make the recommendation, all Na words (common nouns in the CKIP part-of-speech tagset) are extracted from each paragraph as feature words, and the top three Na words of each paragraph are selected. E-HowNet is then used to find extension words for these Na words, and the extension words link the script content to the advertised products. The thesis combines the paragraph weighting scores with the extension words in an algorithm that recommends the best advertisement for each paragraph. Finally, advertisers can use these suggestions to decide which paragraphs to place their advertisements after. The thesis describes the methods in detail. Precision is used as the evaluation measure.
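The abstract names the recommendation steps but not their scoring formulas, so the Python below is only a sketch under stated assumptions. These are the assumptions:

- Each paragraph is already segmented and POS-tagged (for example by the CKIP system) into (word, tag) pairs.
- The "top three" Na words are ranked by raw frequency.
- A small hand-written dictionary stands in for E-HowNet, since no E-HowNet programming interface is assumed.
- Each ad is described by a keyword set and is chosen by keyword overlap.
- The paragraph weight is a simple count of hypothetical feature words.

The functions `top_na_words`, `extend_words`, `best_ad`, `paragraph_weight` and `recommend` are all hypothetical names. So are the sample data.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# A paragraph after segmentation and POS tagging: (word, tag) pairs.
# CKIP-style tags are assumed; this input format is hypothetical.
Tagged = List[Tuple[str, str]]


def top_na_words(tagged: Tagged, k: int = 3) -> List[str]:
    """Return the k most frequent Na (common noun) words of a paragraph.

    Ranking by raw frequency is an assumption; equal counts keep
    first-seen order, as Counter.most_common guarantees.
    """
    counts = Counter(word for word, tag in tagged if tag == "Na")
    return [word for word, _ in counts.most_common(k)]


def extend_words(words: List[str], lexicon: Dict[str, List[str]]) -> Set[str]:
    """Add related words from a dictionary that stands in for E-HowNet lookups."""
    extended = set(words)
    for word in words:
        extended.update(lexicon.get(word, []))
    return extended


def best_ad(terms: Set[str], ads: Dict[str, Set[str]]) -> Optional[str]:
    """Return the ad whose keywords overlap most with the terms (None if no overlap)."""
    best_id, best_overlap = None, 0
    for ad_id, keywords in ads.items():
        overlap = len(terms & keywords)
        if overlap > best_overlap:  # strict '>' keeps the first ad on ties
            best_id, best_overlap = ad_id, overlap
    return best_id


def paragraph_weight(tagged: Tagged, feature_words: Set[str]) -> int:
    """Hypothetical importance score: number of feature-word tokens in the paragraph."""
    return sum(1 for word, _ in tagged if word in feature_words)


def recommend(paragraphs: List[Tagged], lexicon: Dict[str, List[str]],
              ads: Dict[str, Set[str]], feature_words: Set[str]
              ) -> List[Tuple[int, int, Optional[str]]]:
    """Return (paragraph index, weight, best ad) tuples, highest weight first."""
    rows = []
    for i, tagged in enumerate(paragraphs):
        terms = extend_words(top_na_words(tagged), lexicon)
        rows.append((i, paragraph_weight(tagged, feature_words), best_ad(terms, ads)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Illustrative data; the tags and English glosses are examples, not taken from the thesis.
paragraph = [("他", "Nh"), ("喝", "VC"), ("咖啡", "Na"), ("咖啡", "Na"),  # he drinks coffee
             ("在", "P"), ("餐廳", "Nc"), ("手機", "Na"), ("蛋糕", "Na"),  # at restaurant; phone; cake
             ("朋友", "Na")]                                               # friend
lexicon = {"咖啡": ["飲料", "拿鐵"], "手機": ["電話"]}  # coffee -> beverage, latte; phone -> telephone
ads = {"ad_latte": {"拿鐵", "咖啡"}, "ad_phone": {"手機"}, "ad_car": {"汽車"}}

print(top_na_words(paragraph))                                # ['咖啡', '手機', '蛋糕']
print(recommend([paragraph], lexicon, ads, {"咖啡", "餐廳"}))  # [(0, 3, 'ad_latte')]
```

The sketch sorts paragraphs by weight so that advertisers can see the most important paragraphs first, each paired with its best-matching ad. The thesis's actual weighting and matching formulas may differ.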
Agarwal, A., Balasubramanian, S., Zheng, J., & Dash, S. (2014). Parsing screenplays for extracting social networks from movies. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), pp. 50-58.
Blackstock, A., & Spitz, M. (2008). Classifying movie scripts by genre with a MEMM using NLP-based features. Retrieved December 12, 2015, from nlp.stanford.edu/course/cs224n/2008/06.pdf.
Eliashberg, J., Jonker, J. J., Sawhney, M. S., & Wierenga, B. (2000). MOVIEMOD: An implementable decision support system for pre-release market evaluation of motion pictures. Marketing Science, Vol. 19, No. 3, pp. 226-243.
Gil, S., Kuenzel, L., & Suen, C. (2011). Extraction and analysis of character interaction networks from plays and movies. Technical report, Stanford University.
Hodeghatta, U. R. (2013). Sentiment analysis of Hollywood movies on Twitter. Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 1401-1404.
John, G. H., & Langley, P. (1995). Estimating continuous distributions in Bayesian classifiers. Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338-345.
Li, S., Wang, Z., Zhou, G., & Lee, S. Y. M. (2011). Semi-supervised learning for imbalanced sentiment classification. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Vol. 22, No. 3, pp. 1826-1831.
Manning, C., & Klein, D. (2003). Optimization, maxent models, and conditional estimation without magic. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Tutorials, Vol. 5, p. 8.
McCallum, A., Freitag, D., & Pereira, F. C. (2000). Maximum entropy Markov models for information extraction and segmentation. ICML, Vol. 17, pp. 591-598.
Qin, Y., Zhang, Y., Zhang, M., & Zheng, D. (2013). Feature-rich segment-based news event detection on twitter. Proceedings of 2013 International Joint Conference on Natural Language Processing, pp. 302-310.
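Outstanding screenplays of the Golden Harvest Awards (金穗獎優良劇本), from http://www.movieseeds.com.tw/
E-HowNet (廣義知網), from http://ckipsvr.iis.sinica.edu.tw/
CKIP Chinese word segmentation system (中文斷詞系統), from http://ckipsvr.iis.sinica.edu.tw/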