L-Diversity: Privacy Beyond K-Anonymity
Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, Muthuramakrishnan Venkitasubramaniam
CS295 Data privacy and confidentiality - Virag Kothari

Overview
Introduction
Attacks on k-Anonymity
Bayes-Optimal Privacy
l-Diversity Principle
l-Diversity Instantiations
Multiple Sensitive Attributes
Monotonicity Property
Utility
Conclusion

Background
A large amount of person-specific data has been collected in recent years, both by governments and by private entities.
Data and knowledge extracted by data mining techniques represent a key asset to society:
  Analyzing trends and patterns
  Formulating public policies
Laws and regulations require that some collected data must be made public, for example Census data.

What About Privacy?
First thought: anonymize the data.
How? Remove "personally identifying information" (PII):
  Name, Social Security number, phone number, email, address…
  Anything that identifies the person directly
Is this enough?

Re-identification by Linking
Voter registration data:
  Name   Zipcode  Age  Sex
  Alice  47677    29   F
  Bob    47983    65   M
  Carol  47677    22   F
  Dan    47532    23   M
  Ellen  46789    43   F

Microdata (the ID column is removed before release; QID = quasi-identifier, SA = sensitive attribute):
  ID: Name  QID: Zipcode  Age  Sex  SA: Disease
  Alice     47677         29   F    Ovarian Cancer
  Betty     47602         22   F    Ovarian Cancer
  Charles   47678         27   M    Prostate Cancer
  David     47905         43   M    Flu
  Emily     47909         52   F    Heart Disease
  Fred      47906         47   M    Heart Disease

Classification of Attributes
Key attributes
  Name, address, phone number: uniquely identifying!
  Always removed before release
Quasi-identifiers
  (5-digit ZIP code, birth date, gender) uniquely identify 87% of the population in the U.S.
  Can be used for linking an anonymized dataset with other datasets
Sensitive attributes
  Medical records, salaries, etc.
  These attributes are what the researchers need, so they are always released directly

  Name   DOB      Gender  Zipcode  Disease
  Andre  1/21/76  Male    53715    Heart Disease
  Beth   4/13/86  Female  53715    Hepatitis
  Carol  2/28/76  Male    53703    Bronchitis
  Dan    1/21/76  Male    53703    Broken Arm
  Ellen  4/13/86  Female  53706    Flu
  Eric   2/28/76  Female  53706    Hang Nail
  Key attribute: Name. Quasi-identifier: DOB, Gender, Zipcode. Sensitive attribute: Disease.

K-Anonymity
The information for each person contained in the released table cannot be distinguished from that of at least k-1 other individuals whose information also appears in the release.
Example: you try to identify a man in the released table, but the only information you have is his birth date and gender. There are k men in the table with the same birth date and gender.
Any quasi-identifier value present in the released table must appear in at least k records.

Attacks on K-Anonymity
Homogeneity attacks
Background knowledge attacks

Homogeneity Attacks
Since Alice is Bob's neighbor, she knows that Bob is a 31-year-old American male who lives in zip code 13053. Therefore, Alice knows that Bob's record number is 9, 10, 11, or 12. She can also see from the data that Bob has cancer.
[Figure: original table and 4-anonymous table]

Background Knowledge Attacks
Alice knows that Umeko is a 21-year-old Japanese female who currently lives in zip code 13068. Based on this information, Alice learns that Umeko's information is contained in record number 1, 2, 3, or 4. With the additional knowledge that Umeko is Japanese and that Japanese people have an extremely low incidence of heart disease, Alice can conclude with near certainty that Umeko has a viral infection.
[Figure: original table and 4-anonymous table]

Weaknesses in k-anonymous tables
Given these two weaknesses, a stronger method is needed to ensure privacy. Based on this, the authors begin to build their solution.

Adversary's Background Knowledge
The adversary has access to T* and knows it was derived from table T. The domain of each attribute is also known.
The adversary may also have instance-level background knowledge.
The adversary may also know demographic background data, such as the probability of a condition given an age.

Bayes-Optimal Privacy
Models background knowledge as a probability distribution over the attributes and uses Bayesian inference techniques to reason about privacy.
However, Bayes-Optimal Privacy is only used as a starting point for a definition of privacy, so two simplifying assumptions are made:
  T is a simple random sample of a larger population.
  There is a single sensitive attribute.

Prior belief is defined as: α(q,s) = P_f(t[S] = s | t[Q] = q)
Posterior belief is defined as: β(q,s,T*) = P_f(t[S] = s | t[Q] = q ∧ ∃t* ∈ T*, t →* t*)
Here f is the joint distribution of the non-sensitive and sensitive attributes, and t →* t* means that tuple t is generalized to t*.
Prior belief and posterior belief are used to gauge the attacker's success; Theorem 3.1 of the paper gives a way to calculate the posterior belief.

Privacy Principles
Positive disclosure: publishing the table T* that was derived from T results in a positive disclosure if the adversary can correctly identify the value of a sensitive attribute with high probability.
Negative disclosure: publishing the table T* that was derived from T results in a negative disclosure if the adversary can correctly eliminate some possible values of the sensitive attribute (with high probability).

Drawbacks to Bayes-Optimal Privacy
Insufficient knowledge: the publisher is unlikely to know the full distribution of sensitive and non-sensitive attributes over the full population.
The data publisher does not know the knowledge of a would-be attacker.
Instance-level knowledge cannot be modeled.
There are likely to be many adversaries with varying levels of knowledge.

L-Diversity Principle
Theorem 3.1 defines a method of calculating the observed belief of the adversary:

  β(q,s,T*) = [ n(q*,s) · f(s|q)/f(s|q*) ] / [ Σ_{s′∈S} n(q*,s′) · f(s′|q)/f(s′|q*) ]

where n(q*,s) is the number of tuples in the q*-block with sensitive value s.
In the case of positive disclosures, Alice wants to determine Bob's sensitive attribute with a very high probability. Given Theorem 3.1, this can only happen when:

  ∃s, ∀s′ ≠ s: n(q*,s′) · f(s′|q)/f(s′|q*) ≪ n(q*,s) · f(s|q)/f(s|q*)    (2)

The condition of Equation 2 can be satisfied by a lack of diversity in the sensitive attribute(s) and/or strong background knowledge.
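The observed belief of Theorem 3.1 can be sketched in a few lines of Python. This is only an illustration: it assumes the formula from the l-diversity paper, in which each sensitive value s in a q*-block is weighted by its count n(q*, s) times the ratio f(s|q)/f(s|q*). The function name `observed_belief`, its arguments, and the two example blocks are hypothetical.

```python
def observed_belief(counts, ratio=None):
    """Observed belief per sensitive value in one q*-block.

    counts: {sensitive value s: n(q*, s)}
    ratio:  {sensitive value s: f(s|q) / f(s|q*)}; values missing from
            this dict default to 1.0 (no background knowledge about them).
    """
    ratio = ratio or {}
    # Weight each count by the adversary's background-knowledge ratio.
    weights = {s: n * ratio.get(s, 1.0) for s, n in counts.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("background knowledge rules out every value in the block")
    # Normalise so the beliefs over the block sum to 1.
    return {s: w / total for s, w in weights.items()}


# Homogeneity: a block like Bob's (records 9-12) that holds only cancer.
bob_block = {"Cancer": 4}
print(observed_belief(bob_block))

# A block split evenly between two values, with no background knowledge.
mixed_block = {"Heart Disease": 2, "Viral Infection": 2}
print(observed_belief(mixed_block))
```

With a single value in the block the belief is 1 regardless of background knowledge, which is the lack-of-diversity case; passing a ratio near 0 for one value shifts all belief onto the remaining values, which is the background-knowledge case.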

Lack of diversity in the sensitive attribute can be described as follows (Equation 3), where n(q*,s) is the number of tuples in the q*-block with sensitive value s:

  ∀s′ ≠ s: n(q*,s′) ≪ n(q*,s)    (3)

Equation 3 indicates that almost all tuples in the q*-block have the same sensitive value s, and therefore the posterior belief is almost 1. To ensure diversity and guard against Equation 3, require that a q*-block has at least l ≥ 2 different sensitive values such that the l most frequent values (in the q*-block) have roughly the same frequency. We say that such a q*-block is well-represented by l sensitive values.

An attacker may still be able to use background knowledge when the following is true:

  ∃s′: f(s′|q)/f(s′|q*) ≈ 0

This condition states that Bob, with quasi-identifier t[Q] = q, is much less likely to have sensitive value s′ than any other individual in the q*-block.

Suppose we consider an equivalence class for the example of the background knowledge attack shown earlier. Here Alice has background knowledge that Japanese people are less prone to heart disease. Taking s′ to be heart disease, f(s′|q) = 0, because the probability that Umeko has heart disease, given her non-sensitive attribute "Japanese", is taken to be 0. Also, f(s′|q*) = 2/4, so f(s′|q)/f(s′|q*) = 0.
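As a worked instance of this calculation, the observed-belief formula of Theorem 3.1 from the l-diversity paper can be applied to Umeko's q*-block. The following assumes that the block holds two heart-disease tuples and two viral-infection tuples (the counts suggested by f(s′|q*) = 2/4), writes h for heart disease and v for viral infection, and assumes f(v|q) > 0:

```latex
\beta(q, v, T^*)
  = \frac{n(q^*, v)\,\dfrac{f(v \mid q)}{f(v \mid q^*)}}
         {n(q^*, v)\,\dfrac{f(v \mid q)}{f(v \mid q^*)}
          + n(q^*, h)\,\dfrac{f(h \mid q)}{f(h \mid q^*)}}
  = \frac{2\,\dfrac{f(v \mid q)}{f(v \mid q^*)}}
         {2\,\dfrac{f(v \mid q)}{f(v \mid q^*)} + 2 \cdot \dfrac{0}{2/4}}
  = 1
```

Under these assumptions the heart-disease term vanishes, so all of Alice's belief falls on viral infection, whatever the value of the ratio for v.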

Revisiting the example
In spite of such background knowledge, if there are l "well-represented" sensitive values in a q*-block, then Alice needs l−1 damaging pieces of background knowledge to eliminate l−1 possible sensitive values and infer a positive disclosure!

L-Diversity Principle
Given the previous discussions, we arrive at the l-Diversity principle: a q*-block is l-diverse if it contains at least l "well-represented" values for the sensitive attribute S, and a table is l-diverse if every q*-block is l-diverse.

Revisiting the example
Using a 3-diverse table, we are no longer able to tell whether Bob (a 31-year-old American from zip code 13053) has cancer. We also cannot tell whether Umeko (a 21-year-old Japanese from zip code 13068) has a viral infection or cancer.
[Figure: 4-anonymous table and 3-diverse table]

Distinct l-Diversity
Each equivalence class has at least l distinct sensitive values.
It does not prevent probabilistic inference attacks. Example: in an equivalence class of 10 records, 8 records have HIV and 2 records have other values, so an adversary can guess HIV and be right 80% of the time.

L-Diversity Instantiations
Entropy l-Diversity
Recursive (c,l)-Diversity
Positive Disclosure-Recursive (c,l)-Diversity
Negative/Positive Disclosure-Recursive (c1,c2,l)-Diversity

Entropy l-Diversity
A table is entropy l-diverse if, for every q*-block, −Σ_s p(q*,s) · log p(q*,s) ≥ log(l), where p(q*,s) is the fraction of tuples in the q*-block with sensitive value s.
Here every q*-block has at least l distinct values for the sensitive attribute.
This implies that for a table to be entropy l-diverse, the entropy of the entire table must be at least log(l). Therefore, entropy l-diversity may be too restrictive to be practical.

Recursive (c,l)-Diversity
Less restrictive than entropy l-diversity.
Let s_1, …, s_m be the possible values of the sensitive attribute S in a q*-block.
Sort the counts n(q*,s_1), …, n(q*,s_m) in descending order and call the resulting sequence r_1, …, r_m.
A q*-block is recursive (c,l)-diverse if r_1 < c · (r_l + r_{l+1} + … + r_m) for a specified constant c.

Positive Disclosure-Recursive (c,l)-Diversity
Some cases of positive disclosure may be acceptable, such as when the medical condition is "healthy". To allow these values, the authors define pd-recursive (c,l)-diversity.
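The distinct, entropy, and recursive (c,l) instantiations described above can be compared on a single q*-block with a short Python sketch. The function names are hypothetical, the natural logarithm is an arbitrary choice of base, and the example block assumes that the "2 other" records in the 10-record example carry two different values (the slide does not say). The recursive check uses the paper's condition r_1 < c · (r_l + … + r_m).

```python
import math
from collections import Counter


def distinct_l_diverse(values, l):
    """At least l distinct sensitive values in the block."""
    return len(set(values)) >= l


def entropy_l_diverse(values, l):
    """Entropy of the block's sensitive values is at least log(l)."""
    total = len(values)
    entropy = -sum((n / total) * math.log(n / total)
                   for n in Counter(values).values())
    # Small tolerance so exactly uniform blocks are not rejected by rounding.
    return entropy >= math.log(l) - 1e-12


def recursive_cl_diverse(values, c, l):
    """r_1 < c * (r_l + ... + r_m), with counts sorted in descending order."""
    r = sorted(Counter(values).values(), reverse=True)
    if len(r) < l:
        return False
    return r[0] < c * sum(r[l - 1:])


# Assumed block: 8 HIV records plus two records with different other values.
skewed = ["HIV"] * 8 + ["Flu", "Cancer"]
balanced = ["HIV", "Flu", "Cancer"] * 2

print(distinct_l_diverse(skewed, 3))       # three distinct values
print(entropy_l_diverse(skewed, 2))        # entropy is below log(2)
print(recursive_cl_diverse(skewed, 3, 2))  # 8 is not less than 3 * (1 + 1)
print(entropy_l_diverse(balanced, 3))      # uniform over three values
```

The skewed block passes the distinct check yet fails the entropy and recursive checks, which is the gap that the probabilistic inference example points at.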

Negative/Positive Disclosure-Recursive (c1,c2,l)-Diversity
Npd-recursive (c1,c2,l)-diversity prevents negative disclosure by requiring that each sensitive value for which negative disclosure is not allowed occurs in at least c2 percent of the tuples in every q*-block.

Multiple Sensitive Attributes
Previous discussions only addressed single sensitive attributes.
Suppose S and V are two sensitive attributes, and consider the q*-block with the following tuples:

  {(q,s1,v1), (q,s1,v2), (q,s2,v3), (q,s3,v3)}

This q*-block is 3-diverse (actually recursive (2,3)-diverse) with respect to S (ignoring V) and 3-diverse with respect to V (ignoring S). However, if we know that Bob is in this block and his value for S is not s1, then his value for attribute V cannot be v1 or v2, and therefore must be v3.
To address this problem, we can add the additional sensitive attributes to the quasi-identifier.

Implementing Privacy Preserving Data Publishing
Domain generalization is used to define a generalization lattice.
For discussion, all non-sensitive attributes are combined into a multi-dimensional attribute (Q), where the bottom element of the lattice is the domain of Q and the top of the lattice is the domain where each dimension of Q is generalized to a single value.

Implementing Privacy Preserving Data Publishing (cont.)
The algorithm for publishing should find the point on the lattice where the table T* preserves privacy and is as useful as possible.
The usefulness (utility) of table T* diminishes as the data becomes more generalized, so the most utility is at the bottom of the lattice.

Monotonicity Property
The monotonicity property provides a stopping point in the lattice search: if a table T* preserves privacy, then so does every generalization of T*, so the search does not need to continue above a point that already preserves privacy.
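How monotonicity prunes a bottom-up lattice search can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy table, the two-attribute lattice (zip code with 0 to 2 trailing digits masked, age either exact or suppressed), the function names, and the use of distinct l-diversity as the privacy check. It is not the algorithm used by the authors.

```python
from collections import defaultdict
from itertools import product

MAX_ZIP_LEVEL = 2  # number of trailing zip digits that may be masked
MAX_AGE_LEVEL = 1  # 0 = exact age, 1 = age suppressed

# Hypothetical microdata: (zipcode, age, disease)
ROWS = [
    ("13053", 28, "Heart Disease"),
    ("13053", 28, "Heart Disease"),
    ("13068", 28, "Cancer"),
    ("13068", 28, "Flu"),
]


def generalize(zipcode, age, node):
    """Map a quasi-identifier to its generalized form at a lattice node."""
    zip_level, age_level = node
    masked_zip = zipcode[: len(zipcode) - zip_level] + "*" * zip_level
    masked_age = str(age) if age_level == 0 else "*"
    return (masked_zip, masked_age)


def is_distinct_l_diverse(rows, node, l):
    """Every q*-block at this node has at least l distinct diseases."""
    blocks = defaultdict(set)
    for zipcode, age, disease in rows:
        blocks[generalize(zipcode, age, node)].add(disease)
    return all(len(diseases) >= l for diseases in blocks.values())


def dominates(a, b):
    """True if node a is at least as generalized as node b in every dimension."""
    return all(x >= y for x, y in zip(a, b))


def minimal_diverse_nodes(rows, l):
    """Bottom-up search; returns (minimal private nodes, nodes actually checked)."""
    nodes = sorted(product(range(MAX_ZIP_LEVEL + 1), range(MAX_AGE_LEVEL + 1)),
                   key=sum)
    minimal, checked = [], 0
    for node in nodes:
        # Monotonicity: anything above a private node is private too, and
        # less useful, so it is skipped without being checked.
        if any(dominates(node, m) for m in minimal):
            continue
        checked += 1
        if is_distinct_l_diverse(rows, node, l):
            minimal.append(node)
    return minimal, checked


print(minimal_diverse_nodes(ROWS, 2))
```

On this toy table the search stops at the node that masks two zip digits and keeps the exact age; the fully generalized node above it is never evaluated, since monotonicity already guarantees it is private and it offers less utility.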
