
Data Extraction and Model Stealing
姜育剛,馬興軍,吳祖煊

Recap: Week 7
- A Brief History of Backdoor Learning
- Backdoor Attacks
- Backdoor Defenses
- Future Research

This Week
- Data Extraction Attack & Defense
- Model Stealing Attack
- Future Research

Data Extraction Attack
Recovering training data by inverting the model (demo endpoint: 8001/dss/imageClassify).

Terminology
The following terms all describe the same thing:
- Data Extraction Attack
- Data Stealing Attack
- Training Data Extraction Attack
- Model Memorization Attack
- Model Inversion Attack

Security Threats
Example: a model completing "My social security number is 078-"
- Personal info leakage
- Sensitive info leakage
- Threats to national security
- Illegal data trading
- …

Memorization of DNNs
Evidence 1: DNNs learn different levels of representations.

Memorization of DNNs
Evidence 2: DNNs can memorize random labels and random pixels. Training succeeds on true labels, random labels, shuffled pixels, random pixels, and even Gaussian noise.
Zhang, Chiyuan, et al. "Understanding deep learning requires rethinking generalization." ICLR 2017.

Memorization of DNNs
Evidence 3: The success of GANs and diffusion models.

Intended vs. Unintended Memorization
Intended memorization: task-related; statistics of the inputs and labels. (Figure: first-layer filters learned on normal CIFAR-10 vs. randomly labeled CIFAR-10.)
Unintended memorization: task-irrelevant but still memorized, e.g., a machine translation model memorizing "My social security number is xxxx". Such sequences can be memorized verbatim even when they appear only a few times; as few as 4 occurrences is enough for complete memorization.
Arpit et al. "A closer look at memorization in deep networks." ICML, 2017.
Carlini et al. "The secret sharer: Evaluating and testing unintended memorization in neural networks." USENIX Security, 2019.

Existing Data Extraction Attacks
Black-box extraction, active testing: a "canary in the coal mine". Inject canaries such as "The random number is ****" or "My social security number is ****" into the training data, then measure how strongly the model memorizes them. Unintended memorization is tested and quantified through the canary's "exposure" in the language model.
Carlini et al. "The secret sharer: Evaluating and testing unintended memorization in neural networks." USENIX Security, 2019.

Black-box extraction against general-purpose language models recovers large amounts of names, phone numbers, email addresses, social security numbers, and more. Larger models memorize such information more readily than smaller ones, and information appearing in just a single document can still be memorized.
Carlini, Nicholas, et al. "Extracting training data from large language models." USENIX Security, 2021.
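Exposure has a direct operational form: rank the canary against all other candidates of the same format by model perplexity. Below is a minimal sketch; the candidate list and the `log_ppl` scoring function (model log-perplexity of a string) are assumed inputs, not part of the original slides.

```python
import math

# Exposure of a canary s in candidate space R (Carlini et al., 2019):
#   exposure(s) = log2 |R| - log2 rank(s),
# where rank(s) is the canary's position when every candidate is sorted by
# model perplexity. A perfectly memorized canary ranks first, giving the
# maximal exposure log2 |R|.
def exposure(canary, candidates, log_ppl):
    ranked = sorted(candidates, key=log_ppl)   # lowest perplexity first
    rank = ranked.index(canary) + 1            # canary must be in candidates
    return math.log2(len(candidates)) - math.log2(rank)
```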

Training Data Extraction Attack
Definition of memorization: extraction of model knowledge, formalized as k-eidetic memorization (a string is k-eidetic memorized if it can be extracted from the model and appears in at most k documents of the training data).
Attack steps: Step 1, generate a large amount of text from the model; Step 2, filter and verify the generated text.
Experimental results: 604 "unintentionally" memorized sequences were extracted, including memorized content that appears in only a single document; the larger the model, the stronger the memorization.
Carlini, Nicholas, et al. "Extracting training data from large language models." USENIX Security, 2021.
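A hedged sketch of this two-step pipeline, using Hugging Face's GPT-2 as the target. The zlib-compression ratio is one of the membership signals from the paper; the sample count and decoding settings here are illustrative only.

```python
import zlib
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean next-token NLL
    return torch.exp(loss).item()

# Step 1: sample many generations from the model.
out = model.generate(do_sample=True, top_k=40, max_length=64,
                     num_return_sequences=100, pad_token_id=tok.eos_token_id)
samples = [tok.decode(s, skip_special_tokens=True) for s in out]

# Step 2: rank by perplexity relative to zlib entropy; memorized text tends to
# have unusually low model perplexity for its compressed size.
ranked = sorted(samples, key=lambda s: perplexity(s) / len(zlib.compress(s.encode())))
print(ranked[:5])   # most suspicious candidates, to be verified externally
```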

Memorization of Diffusion Models
A joint study by the University of Maryland and New York University found that generative diffusion models memorize their original training data and, under specific text prompts, leak it (generated images closely matching original training images).

Definition of replication: "We say that a generated image has replicated content if it contains an object (either in the foreground or background) that appears identically in a training image, neglecting minor variations in appearance that could result from data augmentation."
Somepalli, Gowthami, et al. "Diffusion art or digital forgery? Investigating data replication in diffusion models." CVPR, 2023.

Memorization of Diffusion Models
Create synthetic and real datasets for detecting replication: Original, Segmix, Diagonal Outpainting, and Patch Outpainting variants, plus existing image retrieval datasets: Oxford, Paris, INSTRE, GPR1200. Image retrieval models are then trained on these datasets.
Somepalli, Gowthami, et al. "Diffusion art or digital forgery? Investigating data replication in diffusion models." CVPR, 2023.

Memorization of Diffusion Models
Similarity metric: inner product and token-wise inner product of retrieval features. Diffusion model: DDPM. Dataset: Celeb-A.
Results: the top-2 matches of diffusion models trained on 300, 3000, and 30000 images (the full set is 30000). Green: exact copies; blue: close but not exact copies; all others: similar but not the same.
Somepalli, Gowthami, et al. "Diffusion art or digital forgery? Investigating data replication in diffusion models." CVPR, 2023.
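A minimal sketch of the two similarity scores, assuming `feat_a` and `feat_b` are per-token descriptor matrices produced by a trained retrieval model (the paper's exact feature extractor and pooling may differ):

```python
import numpy as np

# feat_a, feat_b: arrays of shape (N, D); one L2-normalized descriptor per token.
def inner_product(feat_a, feat_b):
    # Global similarity: cosine of mean-pooled descriptors.
    a, b = feat_a.mean(axis=0), feat_b.mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def tokenwise_inner_product(feat_a, feat_b):
    # Localized similarity: for each token of image A, take its best-matching
    # token in image B, then average; sensitive to copied objects or regions.
    sim = feat_a @ feat_b.T                  # (N, N) all-pairs dot products
    return float(sim.max(axis=1).mean())
```

Generated-vs-training pairs scoring above a threshold (around 0.65 below) are flagged as potential copies.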

Memorization of Diffusion Models
Gen-train vs. train-train similarity score distributions: the less training data, the more copying.
Somepalli, Gowthami, et al. "Diffusion art or digital forgery? Investigating data replication in diffusion models." CVPR, 2023.

Memorization of Diffusion Models
Case study: ImageNet LDM. Many close copies, but no exact match (similarity score < 0.65). Most similar classes: theater curtain, peacock, and bananas; least similar: sea lion, bee, and swing.
Somepalli, Gowthami, et al. "Diffusion art or digital forgery? Investigating data replication in diffusion models." CVPR, 2023.

Memorization of Diffusion Models
Case study: Stable Diffusion. LAION Aesthetics v2 6+: 12M images. Randomly select 9000 images as sources and use their captions as prompts.
Somepalli, Gowthami, et al. "Diffusion art or digital forgery? Investigating data replication in diffusion models." CVPR, 2023.

Memorization of Diffusion Models
Case study: Stable Diffusion. Some keywords (those in red) are associated with certain fixed output patterns.
Style copying with the text prompt "<Name of the painting> by <name of the artist>".
Somepalli, Gowthami, et al. "Diffusion art or digital forgery? Investigating data replication in diffusion models." CVPR, 2023.

Memorization of Large Language Models (LLMs)
Pretraining data detection with MIN-K% PROB: score a text by the average log-probability of its k% lowest-probability tokens. Text seen during pretraining rarely contains extreme low-probability outlier tokens, so a high score indicates membership.
Detection is evaluated on WIKIMIA, a dynamic benchmark.
Shi, Weijia, et al. "Detecting Pretraining Data from Large Language Models." arXiv preprint arXiv:2310.16789 (2023).
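A minimal sketch of MIN-K% PROB, using GPT-2 as a stand-in scorer (the paper targets much larger models, and the membership threshold is calibrated per model):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def min_k_prob(text, k=0.2):
    """Average log-probability of the k% least-likely tokens of `text`;
    higher scores suggest the text appeared in the pretraining data."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)            # next-token dists
    token_lp = logprobs[torch.arange(ids.size(1) - 1), ids[0, 1:]]  # realized tokens
    n = max(1, int(token_lp.numel() * k))
    return token_lp.sort().values[:n].mean().item()                 # bottom-k% average
```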

White-Box Extraction
White-box extraction exploits gradient information and is therefore also called a gradient inversion attack. It targets training schemes in which gradients are shared: distributed training, federated learning, parallel training, and decentralized training (the two distributed-training paradigms).
The attacks fall into two families: iterative inversion (approximate the true gradient layer-wise by optimization) and recursive inversion (analytically derive the input back through the layers).

White-box extraction: iterative inversion. Construct dummy data whose gradients approach the true gradients, which are assumed known. Each step takes one forward pass and two backward passes, comparing the gradient produced by the generated data against the true one; a sketch follows below.
Zhang et al. "A Survey on Gradient Inversion: Attacks, Defenses and Future Directions." IJCAI 2022.
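A minimal PyTorch sketch of iterative inversion in the spirit of DLG ("Deep Leakage from Gradients", Zhu et al.), assuming `model` and the shared `true_grads` are given. Note the structure the slide describes: one forward pass, then two backward passes (one through the model, one through the gradient-matching loss). Soft-label cross entropy requires PyTorch 1.10+.

```python
import torch
import torch.nn.functional as F

def iterative_inversion(model, true_grads, shape=(1, 3, 32, 32), n_cls=10, steps=300):
    dummy_x = torch.randn(shape, requires_grad=True)      # dummy image
    dummy_y = torch.randn(1, n_cls, requires_grad=True)   # dummy soft label
    opt = torch.optim.LBFGS([dummy_x, dummy_y])

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(model(dummy_x), F.softmax(dummy_y, dim=-1))  # forward
        grads = torch.autograd.grad(loss, model.parameters(),
                                    create_graph=True)     # backward pass 1
        grad_diff = sum(((g - t) ** 2).sum() for g, t in zip(grads, true_grads))
        grad_diff.backward()                               # backward pass 2
        return grad_diff

    for _ in range(steps):
        opt.step(closure)
    return dummy_x.detach(), dummy_y.detach()              # reconstructed data
```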

White-box extraction: iterative inversion, summary of existing work (comparison table in the survey).

White-box extraction: recursive inversion. Starting from the true gradients, which are known, recursively derive the input back layer by layer. Key limiting factors: image size (32x32), batch size (mostly 1), and model size.
Zhang et al. "A Survey on Gradient Inversion: Attacks, Defenses and Future Directions." IJCAI 2022.

White-box extraction: recursive inversion, summary of existing work (comparison table in the survey).

White-box defenses: summary of existing work (comparison table in the survey).
Zhang et al. "A Survey on Gradient Inversion: Attacks, Defenses and Future Directions." IJCAI 2022.

This Week
- Data Extraction Attack & Defense
- Model Stealing Attack
- Future Research

AI Models Are Expensive to Train
Training BERT cost Google about $1.6 million. Training large-scale, high-performance AI models consumes enormous data resources, compute resources, and human resources.

Motivation for Model Stealing
A valuable AI model carries huge commercial value. The thief wants to steal the model for their own use, preserve its performance as much as possible, and avoid being detected.

Ways to Steal a Model
- Input-output queries (stealing attacks)
- Model fine-tuning
- Model pruning
Stealing machine learning models via prediction APIs, USENIX Security, 2016; Practical black-box attacks against machine learning, ASIACCS, 2017; Knockoff nets: Stealing functionality of black-box models, CVPR, 2019; MAZE: Data-free model stealing attack using zeroth-order gradient estimation, CVPR, 2021.

Equation-Solving Attacks
Attack idea and an example.

Equation-Solving Attacks
Tramèr et al. extracted several commercial models with 100% agreement; the paper reports the number of queries and the time needed for each target.

Equation-solving attacks: stealing parameters. If the model has d parameters, query d+1 inputs and construct d+1 equations of the form

    w · x_i + b = σ⁻¹(f(x_i)),  i = 1, …, d+1,

where f(x_i) is the confidence returned for query x_i, then solve the resulting linear system.
Main characteristics:
- Targets traditional machine learning models: SVM, LR, DT.
- Solves for the parameters exactly, but requires the model to return precise confidence scores.
- The stolen model may in turn leak training data (a data inversion attack).
Tramèr, Florian, et al. "Stealing machine learning models via prediction APIs." USENIX Security, 2016.
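A minimal sketch for a logistic regression target. The black-box API is simulated with hidden parameters so the algebra can be checked end to end; a real attack would query a remote prediction API instead.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_true, b_true = rng.normal(size=d), 0.7
api = lambda X: 1 / (1 + np.exp(-(X @ w_true + b_true)))  # simulated prediction API

# d+1 queries give d+1 linear equations: w . x_i + b = sigma^{-1}(f(x_i)).
X = rng.normal(size=(d + 1, d))
z = np.log(api(X) / (1 - api(X)))           # invert the sigmoid (logit)
A = np.hstack([X, np.ones((d + 1, 1))])     # unknowns: [w, b]
w_b = np.linalg.solve(A, z)
print(np.allclose(w_b[:d], w_true), np.isclose(w_b[-1], b_true))   # True True
```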

Equation-solving attacks: stealing hyperparameters.
Attack idea: a fully trained model should sit at a point where the gradient of the loss is zero. For an objective L(w) = ℓ(w) + λ·r(w), convergence implies ∇ℓ(w*) + λ·∇r(w*) = 0, so the regularization hyperparameter λ can be recovered from the trained weights by linear least squares.
Wang, Binghui, and Neil Zhenqiang Gong. "Stealing hyperparameters in machine learning." S&P, 2018.
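A minimal sketch using ridge regression as the worked case (an assumed setup; the paper covers many learners the same way). The attacker knows the trained weights and the objective's form, and λ drops out of the zero-gradient condition:

```python
import numpy as np

# Objective: L(w) = ||X w - y||^2 + lam * ||w||^2. At the trained w*:
#   grad L = 2 X^T (X w* - y) + 2 lam w* = 0,  i.e. grad_loss = -lam * grad_reg.
def steal_lambda(X, y, w_star):
    g_loss = 2 * X.T @ (X @ w_star - y)   # gradient of the fitting term
    g_reg = 2 * w_star                    # gradient of the regularizer
    return float(-(g_reg @ g_loss) / (g_reg @ g_reg))   # least-squares estimate

# Self-check: train ridge in closed form with a known lam, then recover it.
rng = np.random.default_rng(0)
X, y, lam = rng.normal(size=(50, 5)), rng.normal(size=50), 0.3
w_star = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
print(np.isclose(steal_lambda(X, y, w_star), lam))   # True
```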

Attacks Based on Substitute Models
Attack idea: while querying the target model, train a substitute model that imitates its behavior.
Knockoff Nets: a "counterfeit network".
Knockoff Nets attack flow: sample a large pool of query images; train the substitute model on the target's outputs; use reinforcement learning to learn how to select query samples efficiently. A minimal training loop is sketched below.
Orekondy et al. "Knockoff nets: Stealing functionality of black-box models." CVPR, 2019.
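A minimal sketch of the substitute-training step (Knockoff-style knowledge distillation). The names are assumptions: `victim(x)` returns the black box's softmax probabilities, `student` is the substitute network, and `loader` yields an unlabeled transfer set; the paper's reinforcement-learning sampler is omitted.

```python
import torch
import torch.nn.functional as F

def train_substitute(victim, student, loader, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                p_victim = victim(x)                 # query the black-box API
            logp = F.log_softmax(student(x), dim=-1)
            loss = F.kl_div(logp, p_victim, reduction="batchmean")  # match outputs
            opt.zero_grad(); loss.backward(); opt.step()
    return student
```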

Attacks Based on Substitute Models
High-accuracy vs. high-fidelity extraction. (Figure: blue, the target's decision boundary; orange, a high-accuracy extraction; green, a high-fidelity extraction.)
Pipeline: query images are sent to the target model (a black box); its outputs, either probability vectors or class labels, serve as the labels that supervise the substitute model's training.

Functionally Equivalent Extraction. Attack steps: for a given neuron, find the critical points where its ReLU input equals 0; then probe on both sides of each critical point to determine the corresponding weights. Limitation: only two-layer networks can be stolen this way.
Jagielski, Matthew, et al. "High accuracy and high fidelity extraction of neural networks." USENIX Security, 2020.

Attacks Based on Substitute Models
Cryptanalytic Extraction. Idea: the second derivative of ReLU is 0 everywhere except at the kink (where ReLU = 0), so kinks can be located by finite differences. The attack first steals 0-deep (purely linear) networks, then extends to 1-deep networks; a finite-difference sketch follows below.
Carlini et al. "Cryptanalytic extraction of neural network models." Annual International Cryptology Conference, 2020.
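A minimal sketch of the critical-point search that both of the preceding attacks rely on. Along any line through input space, a ReLU network is piecewise linear, so a discrete second derivative is (numerically) zero except where some neuron's pre-activation crosses 0. The toy 1-deep network is a stand-in for the black-box target.

```python
import numpy as np

def second_difference(f, x0, d, t, eps=1e-2):
    # Discrete second derivative of g(t) = f(x0 + t*d): ~0 on linear pieces,
    # nonzero when a ReLU kink falls inside (t - eps, t + eps).
    return f(x0 + (t + eps) * d) - 2 * f(x0 + t * d) + f(x0 + (t - eps) * d)

# Toy 1-deep network queried only through f (a stand-in for the target).
rng = np.random.default_rng(1)
W1, b1, w2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=4)
f = lambda x: float(w2 @ np.maximum(W1 @ x + b1, 0.0))

x0, d = rng.normal(size=3), rng.normal(size=3)
ts = np.linspace(-3, 3, 601)    # step 0.01 <= 2*eps, so no kink slips through
kinks = [t for t in ts if abs(second_difference(f, x0, d, t)) > 1e-8]
# 'kinks' clusters around the t values where a neuron switches on or off;
# probing the slopes on each side of a kink reveals the neuron's weights.
```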

Attacks Based on Substitute Models
Yuan, Xiaoyong, et al.
