Author: Weng, Jui-Hung (翁瑞鴻)
Thesis Title: Multi-Level Privacy Preserving K-Anonymity (分層隱私保留 K 匿名)
Advisor: Chi, Po-Wen (紀博文)
Committee: Wang, Ming-Hung (王銘宏); Chuang, Yun-Hsin (莊允心); Chi, Po-Wen (紀博文)
Approval Date: 2021/07/30
Degree: Master
Department: Department of Computer Science and Information Engineering
Thesis Publication Year: 2022
Academic Year: 110
Language: English
Number of pages: 44
Keywords (in Chinese): 匿名化, 資料隱私, K 匿名
Keywords (in English): Anonymization, Data privacy, k-anonymity
DOI URL: http://doi.org/10.6345/NTNU202200936
Thesis Type: Academic thesis/dissertation
Reference times: Clicks: 78, Downloads: 7
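  • K-anonymity is a common approach to data privacy: it ensures that, in the published dataset, every record has at least k - 1 other records with identical attribute values. Under k-anonymity, the data publisher sets k according to the strictest privacy requirement among all records, so that every record receives the same level of anonymity protection. However, different people or items often have different privacy requirements. Some records need extra protection, while others need only a lower level of privacy.

    In this thesis, we propose Multi-Level Privacy Preserving K-Anonymity, built on the k-anonymity framework. It divides the records of a dataset into groups and requires each group to satisfy its own privacy requirement. The data publisher therefore no longer has to set k to the strictest privacy requirement, which reduces the information loss caused by anonymization. We also propose a clustering algorithm that achieves the Multi-Level Privacy Preserving K-Anonymity requirement. Experiments and evaluation on a real-world dataset confirm that, compared with traditional k-anonymity, the proposed method offers more flexibility in setting parameters and higher data utility. The results also show that the proposed algorithm runs efficiently on large datasets and that the multi-level structure incurs no additional running time.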

    k-anonymity is a well-known privacy definition that guarantees that no person in the released dataset can be distinguished from at least k-1 other individuals. In this protection model, records are anonymized through generalization or suppression with a fixed value of k, so each record has the same level of anonymity in the published dataset. However, different people or items usually have different privacy requirements. Some records need extra protection, while others require only a relatively low level of privacy.
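The k-anonymity guarantee described above can be illustrated with a minimal sketch. It assumes records are dictionaries and that the quasi-identifier columns (here the hypothetical `age` and `zip`) have already been generalized; the function name and the sample data are illustrative and are not taken from the thesis.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records."""
    # Count how many records share each quasi-identifier tuple.
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


# Hypothetical, already-generalized sample data (an assumption for illustration).
records = [
    {"age": "20-29", "zip": "106**", "disease": "flu"},
    {"age": "20-29", "zip": "106**", "disease": "cold"},
    {"age": "30-39", "zip": "116**", "disease": "flu"},
    {"age": "30-39", "zip": "116**", "disease": "asthma"},
    {"age": "30-39", "zip": "116**", "disease": "cold"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))  # smallest class has 2 records
print(is_k_anonymous(records, ["age", "zip"], 3))  # the 2-record class is too small
```

With a single fixed k, the smallest equivalence class decides whether the whole release passes, which is the rigidity the abstract goes on to address.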

    In this paper, we propose Multi-Level Privacy Preserving K-Anonymity, an advanced protection model based on k-anonymity, which divides records into different groups and requires each group to satisfy its respective privacy requirement. Moreover, we present a practical algorithm using clustering techniques to ensure the property. The evaluation on a real-world dataset confirms that the proposed method has the advantages of offering more flexibility in setting privacy parameters and providing higher data utility than traditional k-anonymity.
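One possible reading of the multi-level requirement can be sketched as follows. This is an assumption, not the formal definition from Section 3.1 of the thesis, which this record does not reproduce. Each record carries its own required k in a hypothetical `k_req` field, and an equivalence class is accepted when its size is at least the largest `k_req` among its members.

```python
from collections import defaultdict


def satisfies_multilevel(records, quasi_identifiers, level_key="k_req"):
    """Check a per-record privacy requirement (illustrative interpretation):
    every equivalence class must be at least as large as the strictest
    requirement of any record it contains."""
    classes = defaultdict(list)
    for r in records:
        # Group the required k values by quasi-identifier tuple.
        classes[tuple(r[q] for q in quasi_identifiers)].append(r[level_key])
    return all(len(levels) >= max(levels) for levels in classes.values())


# Hypothetical data: one class of two low-requirement records,
# one class of three records that includes a stricter requirement.
released = [
    {"age": "20-29", "zip": "106**", "k_req": 2},
    {"age": "20-29", "zip": "106**", "k_req": 2},
    {"age": "30-39", "zip": "116**", "k_req": 3},
    {"age": "30-39", "zip": "116**", "k_req": 2},
    {"age": "30-39", "zip": "116**", "k_req": 2},
]
print(satisfies_multilevel(released, ["age", "zip"]))

# Raising one requirement in the two-record class breaks the property.
stricter = [dict(r) for r in released]
stricter[0]["k_req"] = 3
print(satisfies_multilevel(stricter, ["age", "zip"]))
```

Under this reading, the first release is acceptable even though a uniform k = 3 would reject it, which matches the flexibility the abstract claims. How the thesis forms groups and measures cost may differ from this sketch.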

    Chapter 1 Introduction
      1.1 The Importance of Data Privacy
      1.2 Data Publishing and Anonymization
      1.3 Motivation
      1.4 Contributions
      1.5 Organization
    Chapter 2 Related Work
      2.1 Privacy-Preserving Data Publishing
        2.1.1 k-Anonymity
        2.1.2 ℓ-Diversity
        2.1.3 t-Closeness
      2.2 Cost Metrics
        2.2.1 Discernibility Metric
        2.2.2 Classification Metric
        2.2.3 Information Loss Metric
      2.3 k-Anonymization Approaches
        2.3.1 Complexity of k-Anonymization
        2.3.2 Optimization Algorithms
        2.3.3 Partitioning-based Algorithms
        2.3.4 Clustering-based Algorithms
      2.4 k-anonymity with Multiple Privacy Constraints
    Chapter 3 Multi-Level Privacy Preserving K-anonymity
      3.1 Definition
      3.2 Cost Metric
      3.3 Greedy Clustering-Based Algorithm
    Chapter 4 Evaluation
      4.1 Setup
      4.2 Comparison with Traditional k-anonymity
      4.3 Scalability of |K|
    Chapter 5 Conclusion
    References

