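Sensitive Attribute Identification and Grading Algorithm for Structured Datasets
Cite this article: He Wenzhu, Peng Changgen, Wang Maoni, Ding Xing, Fan Meimei, Ding Hongfa. Sensitive attribute identification and grading algorithm for structured datasets[J]. Application Research of Computers, 2020, 37(10): 3077-3082.
Authors: He Wenzhu  Peng Changgen  Wang Maoni  Ding Xing  Fan Meimei  Ding Hongfa
Affiliations: College of Computer Science and Technology, Guizhou University, Guiyang 550025; State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025; School of Mathematics and Statistics, Guizhou University, Guiyang 550025; School of Information, Guizhou University of Finance and Economics, Guiyang 550025
Funding: National Natural Science Foundation of China; Guizhou Provincial Science and Technology Program; Guizhou University of Finance and Economics Research Fund
Abstract: Automatically identifying, classifying, and grading the sensitive attributes (fields) of code-obfuscated structured datasets in production environments has become a bottleneck for structured-data privacy protection. This paper proposes an algorithm for automatically identifying and grading the sensitive attributes of structured datasets. It defines attribute sensitivity using information entropy and, by clustering on sensitivity and mining association rules among attributes, identifies the sensitive attributes of an arbitrary structured dataset and quantifies their sensitivity. By analyzing the mutual-information correlation and association rules among the attributes within each sensitive-attribute cluster, it then groups the sensitive attributes and quantifies the average sensitivity of each group, which achieves the classification and grading of sensitive attributes. Experiments show that the algorithm can identify, classify, and grade the sensitive attributes of arbitrary structured datasets with higher efficiency and precision. Comparative analysis shows that the algorithm performs identification and grading together, requires no prior knowledge of attribute features or a dictionary of sensitive features, and accounts for both correlation and association relationships among attributes.

Keywords: privacy protection  sensitive attribute identification and grading  maximum entropy  association rules  mutual information
Received: 2019/5/16
Revised: 2020/9/7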

Sensitive attribute recognition and classification algorithm for structure dataset
He Wenzhu, Peng Changgen, Wang Maoni, Ding Xing, Fan Meimei and Ding Hongfa. Sensitive attribute recognition and classification algorithm for structure dataset[J]. Application Research of Computers, 2020, 37(10): 3077-3082.
Authors: He Wenzhu  Peng Changgen  Wang Maoni  Ding Xing  Fan Meimei and Ding Hongfa
Affiliation: Guizhou University
Abstract: How to automatically identify and classify the sensitive attributes (fields) of structured datasets that have been obfuscated by code in the production environment has become a bottleneck for structured data privacy protection. This paper proposed an algorithm for the automatic recognition and classification of sensitive attributes. The algorithm introduced information entropy to define the sensitivity of an attribute, and it identified the sensitive attributes and quantified their sensitivity by clustering attributes on sensitivity and mining association rules among attributes. Further, by analyzing the mutual information correlations and association rules among the attributes in the sensitive attribute clusters, it grouped the sensitive attributes and quantified the average sensitivity of these groups. Thus, the algorithm achieved the classification of the sensitive attributes. Experiments show that the algorithm can identify and classify the sensitive attributes of any structured dataset, with higher efficiency and accuracy. Comparison shows that the algorithm achieves both recognition and classification of sensitive attributes. It is not necessary to know the characteristics of the attributes or to have a sensitive feature dictionary in advance. Both the correlation and the association among attributes are taken into account by this algorithm.
Keywords: privacy protection  sensitive attribute identification and classification  maximum entropy  association rule  mutual information
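
The abstract names information entropy and mutual information as the quantities behind attribute sensitivity and attribute grouping, but it does not give the paper's exact definitions. As a minimal sketch of those two building blocks only, the Python below computes the empirical Shannon entropy of a column and the empirical mutual information between two columns. The function names, the toy table, and the use of plain Shannon entropy as a stand-in for sensitivity are all assumptions for illustration, not the authors' algorithm; the clustering and association-rule steps are not shown.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    joint = list(zip(xs, ys))
    return shannon_entropy(xs) + shannon_entropy(ys) - shannon_entropy(joint)


# Hypothetical toy table: each key is an attribute (column), each list its values.
columns = {
    "zip": ["550025", "550025", "550001", "550001"],
    "disease": ["flu", "flu", "cold", "cold"],
    "sex": ["F", "M", "F", "M"],
}

# Per-attribute entropy, used here only as an illustrative proxy for sensitivity.
entropies = {name: shannon_entropy(col) for name, col in columns.items()}

# Pairwise dependence: "zip" fully determines "disease" in this toy table,
# while "zip" and "sex" are independent.
mi_zip_disease = mutual_information(columns["zip"], columns["disease"])
mi_zip_sex = mutual_information(columns["zip"], columns["sex"])
```

In this toy table every attribute has two equally likely values, so each entropy is 1 bit; `mi_zip_disease` is 1 bit because the two columns carry the same information, and `mi_zip_sex` is 0 because they are independent. A pair with high mutual information would be a natural candidate for the same group under the grouping idea the abstract describes.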
This article is indexed in Wanfang Data and other databases.