K-Anonymity and Other Cluster-Based Methods
Ge Ruan
Oct. 11, 2007

Data Publishing and Data Privacy
- Society is experiencing exponential growth in the number and variety of data collections containing person-specific information.
- This collected information is valuable in both research and business. Data sharing is common.
- Publishing the data may put the respondents' privacy at risk.
- Objective:
  - Maximize data utility while limiting disclosure risk to an acceptable level

Related Works
- Statistical Databases
  - The most common approach is to add noise while still maintaining some statistical invariant.
  - Disadvantages:
    - Destroys the integrity of the data

Related Works (Contd)
- Multi-level Databases
  - Data is stored at different security classifications, and users have different security clearances. (Denning and Lunt)
  - Eliminating precise inference. Sensitive information is suppressed, i.e. simply not released. (Su and Ozsoyoglu)
  - Disadvantages:
    - It is impossible to consider every possible attack.
    - Many data holders share the same data, but their concerns are different.
    - Suppression can drastically reduce the quality of the data.

Related Works (Contd)
- Computer Security
  - Access control and authentication ensure that the right people have the right authority over the right object at the right time and place.
  - That's not what we want here. A general doctrine of data privacy is to release as much information as possible, as long as the identities of the subjects (people) are protected.

K-Anonymity
Sweeney came up with a formal protection model named k-anonymity.
- What is K-Anonymity?
  - A release provides k-anonymity if the information for each person contained in the release cannot be distinguished from at least k-1 other individuals whose information also appears in the release.
  - Ex. You try to identify a man from a release, but the only information you have is his birth date and gender. If k people in the release match that birth date and gender, the release is k-anonymous with respect to those attributes.

Classification of Attributes
- Key Attribute:
  - Name, Address, Cell Phone
  - Can uniquely identify an individual directly
  - Always removed before release.
- Quasi-Identifier:
  - 5-digit ZIP code, birth date, gender
  - A set of attributes that can potentially be linked with external information to re-identify entities
  - 87% of the population in the U.S. can be uniquely identified based on these attributes, according to the Census summary data in 1991.
  - Suppressed or generalized

Classification of Attributes (Contd)

Hospital Patient Data

| DOB | Sex | Zipcode | Disease |
|---|---|---|---|
| 1/21/76 | Male | 53715 | Heart Disease |
| 4/13/86 | Female | 53715 | Hepatitis |
| 2/28/76 | Male | 53703 | Bronchitis |
| 1/21/76 | Male | 53703 | Broken Arm |
| 4/13/86 | Female | 53706 | Flu |
| 2/28/76 | Female | 53706 | Hang Nail |

Voter Registration Data

| Name | DOB | Sex | Zipcode |
|---|---|---|---|
| Andre | 1/21/76 | Male | 53715 |
| Beth | 1/10/81 | Female | 55410 |
| Carol | 10/1/44 | Female | 90210 |
| Dan | 2/21/84 | Male | 02174 |
| Ellen | 4/19/72 | Female | 02237 |

- Andre has heart disease!

Classification of Attributes (Contd)
- Sensitive Attribute:
  - Medical record, wage, etc.
  - Always released directly. These attributes are what the researchers need. It depends on the requirement.

K-Anonymity Protection Model
- PT: Private Table
- RT, GT1, GT2: Released Table
- QI: Quasi Identifier (Ai, ..., Aj)
- (A1, A2, ..., An): Attributes
Lemma:

Attacks Against K-Anonymity
- Unsorted Matching Attack
  - This attack is based on the order in which tuples appear in the released table.
  - Solution:
    - Randomly sort the tuples before releasing.

Attacks Against K-Anonymity (Contd)
- Complementary Release Attack
  - Different releases can be linked together to compromise k-anonymity.
  - Solution:
    - Consider all previously released tables before releasing a new one, and try to avoid linking.
    - Other data holders may release data that can be used in this kind of attack. In general, this kind of attack is hard to prevent completely.

Attacks Against K-Anonymity (Contd)
- Complementary Release Attack (Contd)

Attacks Against K-Anonymity (Contd)
- Temporal Attack
  - Adding or removing tuples may compromise k-anonymity protection.

Attacks Against K-Anonymity (Contd)

A 3-anonymous patient table

| Zipcode | Age | Disease |
|---|---|---|
| 476* | 2* | Heart Disease |
| 476* | 2* | Heart Disease |
| 476* | 2* | Heart Disease |
| 4790* | 40 | Flu |
| 4790* | 40 | Heart Disease |
| 4790* | 40 | Cancer |
| 476* | 3* | Heart Disease |
| 476* | 3* | Cancer |
| 476* | 3* | Cancer |

| Name | Zipcode | Age |
|---|---|---|
| Bob | 47678 | 27 |
| Carl | 47673 | 36 |

- k-Anonymity does not provide privacy if:
  - Sensitive values in an equivalence class lack diversity (Homogeneity Attack, Bob)
  - The attacker has background knowledge (Background Knowledge Attack, Carl)

A. Machanavajjhala et al. l-Diversity: Privacy Beyond k-Anonymity. ICDE 2006

l-Diversity
- Distinct l-diversity
  - Each equivalence class has at least l well-represented sensitive values
  - Limitation:
    - Doesn't prevent probabilistic inference attacks
    - Ex. One equivalence class contains ten tuples. In the "Disease" column, one of them is "Cancer", one is "Heart Disease" and the remaining eight are "Flu". This satisfies 3-diversity, but the attacker can still conclude that the target person's disease is "Flu" with 80% confidence (8 of the 10 tuples).

l-Diversity (Contd)
- Entropy l-diversity
  - Each equivalence class must not only have enough different sensitive values; those values must also be distributed evenly enough.
  - Formally, the entropy of the distribution of sensitive values in each equivalence class must be at least log(l).
  - Sometimes this may be too restrictive. When some values are very common, the entropy of the entire table may be very low. This leads to the less conservative notion of l-diversity.

l-Diversity (Contd)
- Recursive (c,l)-diversity
  - The most frequent value does not appear too frequently
  - r1 ...

... D[P2,Q]
- Ground distance for any pair of values
- D[P,Q] is dependent upon the ground distances.

Earth Mover's Distance
- Formulation
  - P = (p1, p2, ..., pm), Q = (q1, q2, ..., qm)
  - dij: the ground distance between element i of P and element j of Q.
  - Find a flow F = [fij], where fij is the flow of mass from element i of P to element j of Q, that minimizes the overall work:

$$\mathrm{WORK}(P, Q, F) = \sum_{i=1}^{m} \sum_{j=1}^{m} d_{ij} f_{ij}$$

    subject to the constraints:

$$f_{ij} \ge 0, \qquad 1 \le i \le m,\ 1 \le j \le m$$

$$p_i - \sum_{j=1}^{m} f_{ij} + \sum_{j=1}^{m} f_{ji} = q_i, \qquad 1 \le i \le m$$

$$\sum_{i=1}^{m} \sum_{j=1}^{m} f_{ij} = \sum_{i=1}^{m} p_i = \sum_{i=1}^{m} q_i = 1$$

Earth Mover's Distance
- Example
  - {3k, 4k, 5k} and {3k, 4k, 5k, 6k, 7k, 8k, 9k, 10k, 11k}
  - Move 1/9 probability for each of the following pairs
    - 3k -> 6k, 3k -> 7k, cost: 1/9 * (3+4)/8
    - 4k -> 8k, 4k -> 9k, cost: 1/9 * (4+5)/8
    - 5k -> 10k, 5k -> 11k, cost: 1/9 * (5+6)/8
  - Total cost: 1/9 * 27/8 = 0.375
  - With P2 = {6k, 8k, 11k}, the total cost is 0.167 < 0.375. This makes more sense than the other two distance calculation methods.

How to calculate EMD
- EMD for numerical attributes
  - Ordered distance:

$$\mathrm{ordered\_dist}(v_i, v_j) = \frac{|i - j|}{m - 1}$$

  - Ordered-distance is a metric
    - Non-negativity, symmetry, triangle inequality
  - Let r_i = p_i - q_i; then D[P,Q] is calculated as:

$$D[P, Q] = \frac{1}{m-1}\left(|r_1| + |r_1 + r_2| + \dots + |r_1 + r_2 + \dots + r_{m-1}|\right) = \frac{1}{m-1} \sum_{i=1}^{m} \left| \sum_{j=1}^{i} r_j \right|$$

How to calculate EMD
- EMD for categorical attributes
  - Equal distance:

$$\mathrm{equal\_dist}(v_i, v_j) = 1$$

  - Equal-distance is a metric
  - D[P,Q] is calculated as:

$$D[P, Q] = \frac{1}{2} \sum_{i=1}^{m} |p_i - q_i| = \sum_{p_i \ge q_i} (p_i - q_i) = -\sum_{p_i < q_i} (p_i - q_i)$$

How to calculate EMD (Contd)
- EMD for categorical attributes
  - Hierarchical distance:

$$\mathrm{hierarchical\_dist}(v_i, v_j) = \frac{\mathrm{level}(v_i, v_j)}{H}$$

  - Hierarchical distance is a metric

How to calculate EMD (Contd)
- EMD for categorical attributes

$$\mathrm{extra}(N) = \begin{cases} p_i - q_i & \text{if } N \text{ is a leaf} \\ \sum_{C \in \mathrm{Child}(N)} \mathrm{extra}(C) & \text{otherwise} \end{cases}$$

$$\mathrm{pos\_extra}(N) = \sum_{C \in \mathrm{Child}(N),\ \mathrm{extra}(C) > 0} |\mathrm{extra}(C)|$$

$$\mathrm{neg\_extra}(N) = \sum_{C \in \mathrm{Child}(N),\ \mathrm{extra}(C) < 0} |\mathrm{extra}(C)|$$

$$\mathrm{cost}(N) = \frac{\mathrm{height}(N)}{H} \min\left(\mathrm{pos\_extra}(N), \mathrm{neg\_extra}(N)\right)$$

  - D[P,Q] is calculated as:

$$D[P, Q] = \sum_{N} \mathrm{cost}(N)$$

Experiments
- Goal
  - To show that l-diversity does not provide sufficient privacy protection (the similarity attack).
  - To show that the efficiency and data quality of using t-closeness are comparable with other privacy measures.
- Setup
  - Adult dataset from UC Irvine ML repository
  - 30162 tuples, 9 attributes (2 sensitive attributes)
  - Algorithm: Incognito

Experiments
- Similarity attack (Occupation)
  - 13 of 21 entropy 2-diversity tables are vulnerable
  - 17 of 26 r
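The k-anonymity definition from the slides can be checked mechanically: group the released rows by their quasi-identifier values and confirm that every group has at least k members. The following is a minimal sketch, not the method of any particular tool. The function name `is_k_anonymous` and the dictionary row layout are assumptions made for illustration, and the rows are the 3-anonymous patient table from the slides.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    # Count how many rows share each combination of quasi-identifier values.
    groups = Counter(tuple(row[attr] for attr in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# The 3-anonymous patient table from the slides (hypothetical dict layout).
patients = [
    {"Zipcode": "476*", "Age": "2*", "Disease": "Heart Disease"},
    {"Zipcode": "476*", "Age": "2*", "Disease": "Heart Disease"},
    {"Zipcode": "476*", "Age": "2*", "Disease": "Heart Disease"},
    {"Zipcode": "4790*", "Age": "40", "Disease": "Flu"},
    {"Zipcode": "4790*", "Age": "40", "Disease": "Heart Disease"},
    {"Zipcode": "4790*", "Age": "40", "Disease": "Cancer"},
    {"Zipcode": "476*", "Age": "3*", "Disease": "Heart Disease"},
    {"Zipcode": "476*", "Age": "3*", "Disease": "Cancer"},
    {"Zipcode": "476*", "Age": "3*", "Disease": "Cancer"},
]

print(is_k_anonymous(patients, ["Zipcode", "Age"], 3))  # True
print(is_k_anonymous(patients, ["Zipcode", "Age"], 4))  # False
```

The check says nothing about the sensitive column: the first group passes with k = 3 even though all three of its rows share the same disease, which is exactly the homogeneity problem the slides describe.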
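The linking attack behind "Andre has heart disease!" amounts to a join on the quasi-identifier (DOB, Sex, Zipcode). Here is a rough sketch using the two tables from the slides. The tuple layout and the helper name `link_records` are illustrative assumptions, and a voter is treated as re-identified only when exactly one hospital record matches.

```python
def link_records(hospital, voters):
    """Join the two tables on (DOB, Sex, Zipcode) and return unique re-identifications."""
    # Index hospital records by their quasi-identifier values.
    by_qi = {}
    for dob, sex, zipcode, disease in hospital:
        by_qi.setdefault((dob, sex, zipcode), []).append(disease)

    matches = []
    for name, dob, sex, zipcode in voters:
        diseases = by_qi.get((dob, sex, zipcode), [])
        if len(diseases) == 1:  # exactly one candidate record: identity is disclosed
            matches.append((name, diseases[0]))
    return matches

# Hospital patient data: (DOB, Sex, Zipcode, Disease)
hospital = [
    ("1/21/76", "Male", "53715", "Heart Disease"),
    ("4/13/86", "Female", "53715", "Hepatitis"),
    ("2/28/76", "Male", "53703", "Bronchitis"),
    ("1/21/76", "Male", "53703", "Broken Arm"),
    ("4/13/86", "Female", "53706", "Flu"),
    ("2/28/76", "Female", "53706", "Hang Nail"),
]

# Voter registration data: (Name, DOB, Sex, Zipcode)
voters = [
    ("Andre", "1/21/76", "Male", "53715"),
    ("Beth", "1/10/81", "Female", "55410"),
    ("Carol", "10/1/44", "Female", "90210"),
    ("Dan", "2/21/84", "Male", "02174"),
    ("Ellen", "4/19/72", "Female", "02237"),
]

print(link_records(hospital, voters))  # [('Andre', 'Heart Disease')]
```

Removing the name column from the hospital table is not enough here, because the remaining quasi-identifier values still single out one record.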
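For a numerical attribute with ordered distance, the EMD reduces to a sum of absolute prefix sums of r_i = p_i - q_i, divided by m - 1. The sketch below applies that closed form to the salary example from the slides. The function name `emd_ordered` and the encoding of the distributions as probability lists over the nine ordered salary values (3k to 11k) are assumptions made for illustration.

```python
from itertools import accumulate

def emd_ordered(p, q):
    """EMD under ordered distance: (1/(m-1)) * sum_i |r_1 + ... + r_i|, r_i = p_i - q_i."""
    m = len(p)
    if m != len(q) or m < 2:
        raise ValueError("p and q must have the same length m >= 2")
    # Running sums of r_i; each one is the mass that must cross from value i to value i+1.
    prefix_sums = accumulate(pi - qi for pi, qi in zip(p, q))
    return sum(abs(s) for s in prefix_sums) / (m - 1)

# Probabilities over the ordered values 3k, 4k, ..., 11k.
q = [1 / 9] * 9                                    # whole table: uniform over 9 values
p1 = [1 / 3, 1 / 3, 1 / 3, 0, 0, 0, 0, 0, 0]       # class {3k, 4k, 5k}
p2 = [0, 0, 0, 1 / 3, 0, 1 / 3, 0, 0, 1 / 3]       # class {6k, 8k, 11k}

print(round(emd_ordered(p1, q), 3))  # 0.375
print(round(emd_ordered(p2, q), 3))  # 0.167
```

Both results agree with the totals worked out by hand in the example: the class concentrated on the three lowest salaries is farther from the overall distribution than the class spread across the range.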