| 研究生 (Graduate Student) | 陳靖允 Chen, Ching-Yun |
|---|---|
| 論文名稱 (Thesis Title) | 結合PTZ攝影機與光學雷達之CNN虛擬圍籬系統 CNN-Based Virtual Fence System with PTZ Camera and LiDAR |
| 指導教授 (Advisors) | 陳世旺 Chen, Sei-Wang; 方瓊瑤 Fang, Chiung-Yao |
| 學位類別 (Degree) | 碩士 Master |
| 系所名稱 (Department) | 資訊工程學系 Department of Computer Science and Information Engineering |
| 論文出版年 (Year of Publication) | 2018 |
| 畢業學年度 (Academic Year of Graduation) | 106 |
| 語文別 (Language) | 中文 Chinese |
| 論文頁數 (Number of Pages) | 96 |
| 中文關鍵詞 (Chinese Keywords) | 虛擬圍籬系統、卷積神經網路、光學雷達、PTZ攝影機、連續影像相減法、形態學、資料集前處理 |
| 英文關鍵詞 (English Keywords) | virtual fence system, convolutional neural network, LiDAR, PTZ camera, temporal differencing method, morphology, dataset preprocessing |
| DOI URL | http://doi.org/10.6345/THE.NTNU.DCSIE.028.2018.B02 |
| 論文種類 (Thesis Type) | 學術論文 Academic thesis |
This study develops a CNN (Convolutional Neural Network)-based virtual fence system that combines a PTZ (Pan-Tilt-Zoom) camera with LiDAR (Light Detection and Ranging).
Unlike traditional isolation measures, a virtual fence does not require building a physical wall or railing; instead, it combines electronic devices and software to establish a defense line invisible to the human eye. A virtual fence offers the following advantages: (A) low human involvement with round-the-clock, wide-area surveillance; (B) mobility and extensibility; (C) no damage to the original landscape; and (D) real-time alerts that can be extended with follow-up processing. In practice, however, virtual fences have not yet been widely accepted, mainly because of high false alarm rates and slow processing and notification.
This study improves the accuracy and speed of detection and recognition in traditional virtual fences from both the hardware and software sides. On the hardware side, a LiDAR sensor and a PTZ camera are employed. The infrared light emitted by the LiDAR not only serves as the system trigger but is also insensitive to weather and lighting variations, which reduces false alarms and improves the precision and stability of the system. In addition, the LiDAR passes the measured distance of the intruding object to the PTZ camera to control the zoom of the lens, so that captured objects appear at an appropriate size and the subsequent CNN classification becomes more accurate.
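The abstract does not specify the exact zoom-control law, so the following is only a minimal sketch of the idea, assuming a simple pinhole-camera relation in which the apparent size of an object scales with zoom divided by distance; the function name, target size, and calibration constant are hypothetical, not taken from the thesis.

```python
def zoom_for_distance(distance_m, target_px=224, ref_px_at_1x_1m=800,
                      min_zoom=1.0, max_zoom=30.0):
    """Choose a PTZ zoom factor so an intruder at `distance_m` metres
    appears roughly `target_px` pixels tall in the frame.

    Assumes a pinhole model: apparent size ~ zoom / distance, calibrated by
    `ref_px_at_1x_1m`, the object's pixel height at 1x zoom and 1 m distance.
    All constants here are illustrative assumptions.
    """
    if distance_m <= 0:
        return min_zoom
    # Solve ref_px_at_1x_1m * zoom / distance_m == target_px for zoom.
    zoom = target_px * distance_m / ref_px_at_1x_1m
    return max(min_zoom, min(max_zoom, zoom))


# Example: an intruder sensed by the LiDAR at 20 m.
print(zoom_for_distance(20.0))  # -> 5.6
```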
On the software side, a CNN is used, exploiting its strong feature-learning capability to improve classification speed and accuracy. Experiments with VGG-16 and Darknet-19 are conducted under different training modes and different dataset preprocessing schemes. Regarding training modes, the best results are obtained by starting from parameters pretrained on the large-scale ImageNet dataset and fine-tuning with a dataset whose preprocessing resembles that of the test data. Regarding dataset preprocessing, four main types are distinguished: Original (the raw bounding box images), Rescaled (the bounding box image is isotropically rescaled and centered on a black or grey canvas matching the CNN input size), Matting (the background is removed and painted black or grey), and Matting & Rescaled (the background-removed bounding box image is isotropically rescaled and centered on a black or grey canvas matching the CNN input size). Experiments show that using the Rescaled dataset for both training and testing yields the highest mAP; in the VGG-16 experiments, training and testing with the Rescaled-Grey dataset achieves an mAP of 96.3%.
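As an illustration of the Rescaled preprocessing described above, the sketch below isotropically rescales a bounding-box crop and centers it on a grey (or black) canvas of the CNN input size. It is a minimal reconstruction of the idea rather than the thesis code; the 224x224 input size is assumed from VGG-16, and the grey level 128 is an assumption.

```python
import cv2
import numpy as np

def rescale_onto_canvas(bbox_img, size=224, grey=True):
    """Isotropically resize a bounding-box image and center it on a
    fixed-size grey or black canvas (the 'Rescaled' preprocessing)."""
    h, w = bbox_img.shape[:2]
    scale = size / max(h, w)                       # keep the aspect ratio
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    resized = cv2.resize(bbox_img, (new_w, new_h), interpolation=cv2.INTER_AREA)

    fill = 128 if grey else 0                      # grey or black underlay
    canvas = np.full((size, size, 3), fill, dtype=np.uint8)
    x0 = (size - new_w) // 2
    y0 = (size - new_h) // 2
    canvas[y0:y0 + new_h, x0:x0 + new_w] = resized
    return canvas

# Example: crop = cv2.imread("intruder_bbox.jpg"); inp = rescale_onto_canvas(crop)
```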
For a virtual fence system, an intruding object moves and therefore causes changes between consecutive frames, so the system can locate the intruder and its bounding box with a moving-object localization method. It does not need to behave like a conventional object detection system, which takes a single static image as input, has to generate and score many candidate bounding boxes, and wastes resources and time detecting and recognizing irrelevant background objects. The moving-object localization method adopted in this study applies temporal differencing to three consecutive frames and intersects the difference images with the very fast bitwise_and function to obtain a more precise moving foreground and bounding box. Furthermore, the binarized foreground image, after its holes are filled by dynamic morphological operations, can serve as a mask that is combined with the original image or the bounding box image to achieve a rough background removal (matting) effect. The Matting & Rescaled-Grey dataset also achieves a high mAP (95.3%) with VGG-16.
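A minimal sketch of the three-frame temporal differencing step described above is given here, using OpenCV's absdiff, bitwise_and, and morphological closing; the threshold value, kernel size, and minimum contour area are illustrative assumptions rather than the thesis's actual parameters.

```python
import cv2

def locate_moving_objects(prev_f, curr_f, next_f, thresh=25, min_area=500):
    """Three-frame temporal differencing: intersect the two difference
    images with bitwise_and, clean the mask with morphological closing,
    and return the foreground mask plus bounding boxes of moving objects."""
    g = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in (prev_f, curr_f, next_f)]

    d1 = cv2.absdiff(g[1], g[0])                  # |curr - prev|
    d2 = cv2.absdiff(g[2], g[1])                  # |next - curr|
    motion = cv2.bitwise_and(d1, d2)              # keep pixels that move in both

    _, mask = cv2.threshold(motion, thresh, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes

    # [-2] keeps this compatible with both OpenCV 3.x and 4.x return formats.
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    boxes = [cv2.boundingRect(c) for c in contours
             if cv2.contourArea(c) >= min_area]
    return mask, boxes

# Rough matting: keep original pixels only where the mask marks foreground.
# matted = cv2.bitwise_and(curr_frame, curr_frame, mask=mask)
```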
The system currently distinguishes three classes of intruders: pedestrian, animal, and neither human nor animal. Users can handle the three classes differently according to the needs of the deployment site, which makes subsequent applications more flexible. The integration test shows that the overall detection accuracy of the proposed virtual fence system exceeds 95% mAP, and the average processing time from LiDAR-triggered image capture to object classification is below 0.2 seconds, making it a practical system with high accuracy and speed.
This study proposes a CNN (Convolutional Neural Network)-based virtual fence system equipped with a LiDAR (Light Detection and Ranging) sensor and a PTZ (Pan-Tilt-Zoom) camera. The proposed system detects and classifies invaders with a high mAP (mean average precision) and a short operation time.
A virtual fence, as opposed to a real physical fence, plays an important role in intelligent surveillance systems. It requires fewer human resources to build, makes no physical impact on the surrounding areas, and is easily extendable and portable. However, due to a high false alarm rate and high time complexity, it is still challenging for a virtual fence system to provide satisfactory performance.
The proposed virtual fence system improves both the detection rate and the speed. First, a LiDAR sensor is used to detect invaders. Once an invader is sensed, the sensor triggers the PTZ camera. The LiDAR's tolerance to variations in weather and light enhances the robustness of the system. In addition, since small objects in images easily cause detection and classification errors, the distance information provided by the LiDAR sensor is passed to the PTZ camera to control its zoom-in and zoom-out operations so that objects appear at proper sizes in the images.
Then, a three-frame temporal differencing algorithm is applied to locate the moving objects in video frames. Through a bitwise-and operation and dynamic morphological processing applied to the difference frames, the contours and bounding boxes of the moving objects can be determined quickly. Compared with existing object detection systems, such as the R-CNN and YOLO series, which generate and evaluate many candidate bounding boxes at multiple locations and scales, the proposed object localization method is less complicated. Moreover, those object detection systems try to locate and classify all the objects appearing in an image, whereas a virtual fence system is only interested in detecting intruding moving objects. Thus, the proposed moving-object localization method avoids unnecessary processing of irrelevant background objects.
Finally, a CNN is used to classify the bounding box images into three classes, namely pedestrian, animal, and others. The CNN architectures investigated in this study are VGG-16 and Darknet-19 (the CNN architecture used in YOLOv2). Different training modes and dataset preprocessing schemes for the CNN are investigated. For training modes, experiments with VGG-16 demonstrate that initializing with ImageNet-pretrained parameters and fine-tuning with bounding box datasets achieves the best performance. For dataset preprocessing, there are four main types, namely Original, Rescaled (isotropically rescaling an image onto a predefined fixed-size black or grey underlay), Matting (painting the background black or grey), and Matting & Rescaled. Experimental results indicate that using Rescaled preprocessing for both the training and testing datasets outperforms the other combinations. VGG-16 initialized with ImageNet-pretrained parameters and fine-tuned using a bounding box dataset with Rescaled-Grey preprocessing achieves 96.3% mAP.
The integration test of the proposed virtual fence system demonstrates that the best-performing configuration mentioned above achieves more than 95% mAP, and the average processing time from LiDAR detection to the end of CNN classification is less than 0.2 seconds. The experimental results show that the proposed system is fast, accurate, stable, and of practical use.