Author: | 翁瑞鴻 Weng, Jui-Hung |
---|---|
Thesis title: | 分層隱私保留 K 匿名 Multi-Level Privacy Preserving K-Anonymity |
Advisor: | 紀博文 Chi, Po-Wen |
Oral defense committee: | 王銘宏 Wang, Ming-Hung; 莊允心 Chuang, Yun-Hsin; 紀博文 Chi, Po-Wen |
Oral defense date: | 2021/07/30 |
Degree: | Master |
Department: | 資訊工程學系 Department of Computer Science and Information Engineering |
Publication year: | 2022 |
Academic year of graduation: | 110 |
Language: | English |
Number of pages: | 44 |
Keywords (Chinese): | 匿名化, 資料隱私, K 匿名 |
Keywords (English): | Anonymization, Data privacy, k-anonymity |
DOI URL: | http://doi.org/10.6345/NTNU202200936 |
Thesis type: | Academic thesis |
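K-anonymity is a common approach to data privacy: it ensures that, for any record in the published dataset, at least k - 1 other records share the same attribute values. Under k-anonymity, the data publisher sets k according to the highest privacy requirement among all records, so that every record receives the same level of anonymity protection. However, different people or items often have different privacy requirements. Some records need extra protection, while others need only a lower level of privacy.
In this thesis, we propose Multi-Level Privacy Preserving K-Anonymity, which is built on the k-anonymity framework. It divides the records in a dataset into different groups and requires each group to satisfy its own privacy requirement. The data publisher therefore no longer has to set k according to the highest privacy requirement, which reduces the information loss caused by anonymization. In addition, we propose a clustering algorithm that meets the requirements of Multi-Level Privacy Preserving K-Anonymity. Experiments and evaluation on a real-world dataset confirm that, compared with traditional k-anonymity, the proposed method offers more flexibility in setting parameters and higher data utility. The experimental results also show that the proposed algorithm runs efficiently on large datasets and that the multi-level structure adds no extra execution time.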
k-anonymity is a well-known privacy definition that guarantees that no person in the released dataset can be distinguished from at least k-1 other individuals. In this protection model, records are anonymized through generalization or suppression with a fixed value of k, so every record has the same level of anonymity in the published dataset. However, different people or items usually have different privacy requirements. Some records need extra protection, while others require only a relatively low level of privacy.
In this paper, we propose Multi-Level Privacy Preserving K-Anonymity, an advanced protection model based on k-anonymity that divides records into different groups and requires each group to satisfy its respective privacy requirement. Moreover, we present a practical algorithm that uses clustering techniques to enforce this property. Evaluation on a real-world dataset confirms that the proposed method offers more flexibility in setting privacy parameters and provides higher data utility than traditional k-anonymity.
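The difference between a single fixed k and per-group requirements can be illustrated with a minimal Python sketch. It is not the thesis's algorithm and only checks an already anonymized table. The function names, the `level` field, the toy records, and the reading of the multi-level condition (each record must share its quasi-identifier values with at least as many records as its own group's k demands, counted over the whole table) are all assumptions made for illustration; the abstract does not spell out the formal definition.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Count the records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """Standard k-anonymity: every equivalence class holds at least k records."""
    sizes = class_sizes(records, quasi_identifiers)
    return all(n >= k for n in sizes.values())


def is_multilevel_k_anonymous(records, quasi_identifiers, level_key, k_by_level):
    """Assumed multi-level reading: each record's equivalence class must be
    at least as large as the k required by that record's privacy level."""
    sizes = class_sizes(records, quasi_identifiers)
    return all(
        sizes[tuple(r[q] for q in quasi_identifiers)] >= k_by_level[r[level_key]]
        for r in records
    )


# Hypothetical, already generalized records; "level" is an assumed field name.
records = [
    {"age": "20-29", "zip": "106**", "level": "low"},
    {"age": "20-29", "zip": "106**", "level": "low"},
    {"age": "30-39", "zip": "116**", "level": "high"},
    {"age": "30-39", "zip": "116**", "level": "high"},
    {"age": "30-39", "zip": "116**", "level": "low"},
]
qi = ["age", "zip"]

print(is_k_anonymous(records, qi, 3))  # the strictest k applied to every record
print(is_multilevel_k_anonymous(records, qi, "level", {"low": 2, "high": 3}))
```

In this toy table the first class has only two records, so applying the strictest requirement (k = 3) to everyone fails, while the per-level check passes because the two-record class contains only low-level records. This matches the abstract's motivation: the publisher does not need to generalize every record up to the highest k.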