Pattern Recognition: Clustering Analysis
Department of Intelligent Science and Technology, School of Automation
CAOZHI

Chapter outline:
– Basic concepts: proximity measures and clustering definitions
– Sequential clustering algorithms
– k-Means, k-Medoids and Fuzzy k-Means clustering
– Hierarchical clustering algorithms
– Spectral clustering

Classification vs. clustering. Hard clustering: each point belongs to a single cluster.

The clustering task. Given $X=\{x_1,x_2,\ldots,x_N\}$, an $m$-clustering $R$ of $X$ is defined as the partition of $X$ into $m$ sets (clusters) $C_1,C_2,\ldots,C_m$, so that:
– $C_i \neq \emptyset$, $i=1,2,\ldots,m$
– $\bigcup_{i=1}^{m} C_i = X$
– $C_i \cap C_j = \emptyset$, $i \neq j$, $i,j=1,2,\ldots,m$

Steps of the clustering task:
– Proximity measure: quantifies how "similar" or "dissimilar" two vectors are.
– Clustering criterion: consists of a cost function or some other type of rules.
– Clustering algorithm: the set of steps followed to reveal the structure of the data, based on the proximity measure and the adopted criterion.
– Validation of the results.
– Interpretation of the results.

Why different proximity measures and clustering criteria? The same objects may be grouped differently depending on what is measured. For example, the animals {blue shark, sheep, cat, dog, seagull, goldfish, frog, red mullet, sparrow, lizard, viper} can be split into {blue shark, sheep, cat, dog} vs. {seagull, goldfish, frog, red mullet} (how they bear their young, i.e. mammals vs. non-mammals), into {goldfish, red mullet, blue shark} vs. {sheep, sparrow, dog, cat, seagull, lizard, frog, viper} (living environment), and so on.

Why validation of the results? E.g., does the data set contain 2 or 4 clusters?

Why a variety of clustering algorithms? The number of possible clusterings is enormous. Question: in how many ways can N points be assigned to m groups? The answer is the Stirling number of the second kind,
$$S(N,m)=\frac{1}{m!}\sum_{i=0}^{m}(-1)^{m-i}\binom{m}{i}\,i^{N}.$$
For example, $S(15,3)\approx 2\times 10^{6}$ and $S(100,5)\approx 10^{68}$, so exhaustive search is hopeless.

A way out: consider only a small fraction of the clusterings of X and select a "sensible" one among them.
– Question 1: which fraction of the clusterings is considered?
– Question 2: what does "sensible" mean?
The answer depends on the specific clustering algorithm and the specific criterion to be adopted.

Proximity measures between vectors.
A dissimilarity measure (DM) on X is a function $d: X\times X \rightarrow \mathbb{R}$ such that
– $\exists\, d_0: -\infty < d_0 \le d(x,y) < +\infty,\ \forall x,y \in X$
– $d(x,x)=d_0,\ \forall x \in X$
– $d(x,y)=d(y,x),\ \forall x,y \in X$
If in addition
– $d(x,y)=d_0$ if and only if $x=y$
– $d(x,z)\le d(x,y)+d(y,z),\ \forall x,y,z \in X$ (triangular inequality)
then d is called a metric dissimilarity measure.

A similarity measure (SM) on X is a function $s: X\times X \rightarrow \mathbb{R}$ such that
– $\exists\, s_0: -\infty < s(x,y) \le s_0 < +\infty,\ \forall x,y \in X$
– $s(x,x)=s_0,\ \forall x \in X$
– $s(x,y)=s(y,x),\ \forall x,y \in X$
If in addition
– $s(x,y)=s_0$ if and only if $x=y$
– $s(x,y)\,s(y,z)\le [s(x,y)+s(y,z)]\,s(x,z),\ \forall x,y,z \in X$
then s is called a metric similarity measure.

Weighted $l_p$ metric:
$$d_p(x,y)=\Big(\sum_{i=1}^{l} w_i\,|x_i-y_i|^{p}\Big)^{1/p}.$$
Interesting instances are obtained for
– p = 1 (weighted Manhattan norm)
– p = 2 (weighted Euclidean norm)
– p = ∞: $d_\infty(x,y)=\max_{1\le i\le l} w_i\,|x_i-y_i|$
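As a concrete illustration of the weighted $l_p$ metrics above, the following is a minimal sketch; the weight vector and the sample points are made up for the example:

```python
import numpy as np

def weighted_lp(x, y, w, p=2):
    """Weighted l_p dissimilarity d_p(x, y) = (sum_i w_i |x_i - y_i|^p)^(1/p)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    if np.isinf(p):
        # p = infinity: d(x, y) = max_i w_i |x_i - y_i|
        return np.max(w * np.abs(x - y))
    return np.sum(w * np.abs(x - y) ** p) ** (1.0 / p)

# Example (made-up 3-dimensional vectors and weights):
x = [1.0, 2.0, 0.5]
y = [0.0, 3.5, 0.5]
w = [1.0, 1.0, 2.0]                     # w_i = 1 recovers the ordinary l_p metric
print(weighted_lp(x, y, w, p=1))        # weighted Manhattan
print(weighted_lp(x, y, w, p=2))        # weighted Euclidean
print(weighted_lp(x, y, w, p=np.inf))   # weighted max norm
```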

Given $X=\{x_1,x_2,\ldots,x_N\}$ and $x_i,x_j\in X$, another common dissimilarity measure is the Mahalanobis distance
$$d(x_i,x_j)=\big((x_i-x_j)^{T}V^{-1}(x_i-x_j)\big)^{1/2},$$
where $V$ is the covariance matrix of the data.

Common similarity measures between real-valued vectors:
– Inner product: $s_{\text{inner}}(x,y)=x^{T}y$
– Correlation coefficient: $r(x,y)=\dfrac{(x-\bar{x})^{T}(y-\bar{y})}{\big[(x-\bar{x})^{T}(x-\bar{x})\,(y-\bar{y})^{T}(y-\bar{y})\big]^{1/2}}$
– Tanimoto measure: $s_T(x,y)=\dfrac{x^{T}y}{\|x\|^{2}+\|y\|^{2}-x^{T}y}$
A related dissimilarity is, e.g., $d(x,y)=1-\dfrac{x^{T}y}{\|x\|\,\|y\|}$. Analogous measures exist for discrete-valued and mixed-valued vectors.

PROXIMITY FUNCTIONS BETWEEN A VECTOR AND A SET
Let $X=\{x_1,x_2,\ldots,x_N\}$, $C\subset X$, and let $\rho(x,y)$ be a proximity measure between vectors.
(a) All points of C contribute to the definition of $\rho(x,C)$:
– Max proximity: $\rho_{\max}^{ps}(x,C)=\max_{y\in C}\rho(x,y)$
– Min proximity: $\rho_{\min}^{ps}(x,C)=\min_{y\in C}\rho(x,y)$
– Average proximity: $\rho_{\text{avg}}^{ps}(x,C)=\frac{1}{n_C}\sum_{y\in C}\rho(x,y)$, where $n_C$ is the cardinality of C.
(b) A representative $r_C$ of C contributes to the definition, in which case $\rho(x,C)=\rho(x,r_C)$. Typical point representatives are:
– The mean vector: $m_p=\frac{1}{n_C}\sum_{y\in C}y$, where $n_C$ is the cardinality of C.
– The mean center $m_C\in C$: $\sum_{y\in C}d(m_C,y)\le\sum_{y\in C}d(z,y),\ \forall z\in C$, where d is a dissimilarity measure.
– The median center $m_{med}\in C$: $\mathrm{med}\big(d(m_{med},y)\,|\,y\in C\big)\le \mathrm{med}\big(d(z,y)\,|\,y\in C\big),\ \forall z\in C$.
NOTE: Other representatives (e.g., hyperplanes, hyperspheres) are useful in certain applications (e.g., object identification using clustering techniques).

PROXIMITY FUNCTIONS BETWEEN SETS
Let $X=\{x_1,\ldots,x_N\}$, $D_i,D_j\subset X$ and $n_i=|D_i|$, $n_j=|D_j|$.
(a) All points of each set contribute to the definition:
– Max proximity function: $\rho_{\max}^{ss}(D_i,D_j)=\max_{x\in D_i,\,y\in D_j}\rho(x,y)$ (a measure only if $\rho$ is a similarity measure, and even then not a metric)
– Min proximity function: $\rho_{\min}^{ss}(D_i,D_j)=\min_{x\in D_i,\,y\in D_j}\rho(x,y)$ (a measure only if $\rho$ is a dissimilarity measure, and even then not a metric)
– Average proximity function: $\rho_{\text{avg}}^{ss}(D_i,D_j)=\frac{1}{n_i n_j}\sum_{x\in D_i}\sum_{y\in D_j}\rho(x,y)$ (not a measure, even if $\rho$ is a measure)
(b) Each set $D_i$ is represented by its representative vector $m_i$:
– Mean proximity function: $\rho_{\text{mean}}^{ss}(D_i,D_j)=\rho(m_i,m_j)$ (a measure provided that $\rho$ is a measure)
– $\rho_{e}^{ss}(D_i,D_j)=\sqrt{\dfrac{n_i n_j}{n_i+n_j}}\,\rho(m_i,m_j)$
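The following minimal sketch (assuming the Euclidean distance as the underlying dissimilarity) illustrates the max, min, average and mean-vector proximity functions between two sets; the arrays are made-up examples:

```python
import numpy as np

def pairwise_dist(D_i, D_j):
    """All Euclidean distances between the rows of D_i and the rows of D_j."""
    diff = D_i[:, None, :] - D_j[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def set_proximities(D_i, D_j):
    """Max, min, average and mean-vector proximity functions between two sets."""
    d = pairwise_dist(D_i, D_j)
    mean_i, mean_j = D_i.mean(axis=0), D_j.mean(axis=0)
    return {
        "max":  d.max(),                          # rho_max^ss
        "min":  d.min(),                          # rho_min^ss
        "avg":  d.mean(),                         # rho_avg^ss
        "mean": np.linalg.norm(mean_i - mean_j),  # rho_mean^ss
    }

# Made-up 2-D example sets
D_i = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D_j = np.array([[5.0, 5.0], [6.0, 5.0]])
print(set_proximities(D_i, D_j))
```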

NOTE: Proximity functions between a vector x and a set C may be derived from the above functions by setting $D_i=\{x\}$. Different choices of the proximity function between sets may lead to totally different clustering results; so may different proximity measures between vectors used inside the same proximity function between sets. The only way to achieve a proper clustering is by trial and error, taking into account the opinion of an expert in the field of application.

Clustering criteria
A good clustering should have a small intra-class (within-class) distance and a large inter-class (between-class) distance.

Suppose the patterns to be classified, $x_1,x_2,\ldots,x_N$, are partitioned into $c$ classes on the basis of some similarity measure. The within-class distance criterion $J_W$ is defined as
$$J_W=\sum_{j=1}^{c}\sum_{i=1}^{n_j}\big\|x_i^{(j)}-m_j\big\|^{2},$$
where $x_i^{(j)}$ is the i-th sample of class j, $n_j$ is the number of samples in class j, and $m_j=\frac{1}{n_j}\sum_{i=1}^{n_j}x_i^{(j)}$ is its mean vector. The within-class criterion can also be expressed through pairwise distances inside each class,
$$J_W'=\frac{1}{2}\sum_{j=1}^{c} n_j\,\bar{D}_j^{2},\qquad \bar{D}_j^{2}=\frac{1}{n_j(n_j-1)}\sum_{i=1}^{n_j}\sum_{k=1}^{n_j} d^{2}\big(x_i^{(j)},x_k^{(j)}\big),$$
where $d(\cdot,\cdot)$ is the adopted distance between samples.

The between-class distance criterion $J_B$ is defined as
$$J_B=\sum_{j=1}^{c}\frac{n_j}{N}\,(m_j-m)^{T}(m_j-m),$$
where $m=\frac{1}{N}\sum_{j=1}^{c}n_j m_j$ is the total mean vector; for two classes it reduces to $J_{B2}=(m_1-m_2)^{T}(m_1-m_2)$.

Scatter matrices. Define
$$S_W=\frac{1}{N}\sum_{j=1}^{c}\sum_{i=1}^{n_j}\big(x_i^{(j)}-m_j\big)\big(x_i^{(j)}-m_j\big)^{T},\qquad
S_B=\sum_{j=1}^{c}\frac{n_j}{N}\,(m_j-m)(m_j-m)^{T},$$
$$S_T=\frac{1}{N}\sum_{i=1}^{N}(x_i-m)(x_i-m)^{T}.$$
It can be shown that $S_T=S_W+S_B$. Good clusterings minimize $\mathrm{Tr}[S_W]$ and maximize $\mathrm{Tr}[S_B]$. Using results from linear algebra, further criteria can be formed, e.g. $J_1=\mathrm{Tr}[S_W^{-1}S_B]$, $J_2=|S_W^{-1}S_B|$, $J_3=\mathrm{Tr}[S_W^{-1}S_T]$, $J_4=|S_W^{-1}S_T|$.
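A minimal sketch of how these scatter matrices and the trace criteria could be computed for a given hard partition; the labels and data are made up, and the normalisation by N follows the definitions above:

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-class, between-class and total scatter matrices for a hard partition."""
    N, l = X.shape
    m = X.mean(axis=0)                       # total mean vector
    S_W = np.zeros((l, l))
    S_B = np.zeros((l, l))
    for j in np.unique(labels):
        X_j = X[labels == j]
        n_j = len(X_j)
        m_j = X_j.mean(axis=0)
        D = X_j - m_j
        S_W += D.T @ D / N                   # within-class scatter contribution
        diff = (m_j - m).reshape(-1, 1)
        S_B += (n_j / N) * (diff @ diff.T)   # between-class scatter contribution
    return S_W, S_B, S_W + S_B               # S_T = S_W + S_B

# Made-up data: two Gaussian blobs with known labels
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (30, 2))])
labels = np.array([0] * 20 + [1] * 30)
S_W, S_B, S_T = scatter_matrices(X, labels)
print(np.trace(S_W), np.trace(S_B))          # Tr[S_W], Tr[S_B]
print(np.trace(np.linalg.inv(S_W) @ S_B))    # J_1 = Tr[S_W^{-1} S_B]
```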

Sequential clustering algorithms
Common traits shared by these algorithms:
– One or very few passes on the data are required.
– The number of clusters is not known a priori, except (possibly) an upper bound q.
– The clusters are defined with the aid of an appropriately defined distance d(x, C) of a point from a cluster and a threshold Θ associated with this distance.

Basic Sequential Algorithmic Scheme (BSAS):
m = 1   {number of clusters}
C_m = {x_1}
For i = 2 to N
  Find C_k such that d(x_i, C_k) = min_{1≤j≤m} d(x_i, C_j)
  If (d(x_i, C_k) > Θ) AND (m < q) then
    m = m + 1; C_m = {x_i}
  Else
    C_k = C_k ∪ {x_i}
    Where necessary, update the cluster representative (*)
  End If
End For

(*) When the mean vector m_C is used as the representative of cluster C with n_C elements, the update in the light of a new vector x becomes
$$m_C^{new}=\frac{n_C\,m_C+x}{n_C+1}.$$
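A minimal Python sketch of the BSAS scheme just described, using the Euclidean distance to the cluster mean as d(x, C); the threshold theta, the cap q and the data below are made-up choices for illustration:

```python
import numpy as np

def bsas(X, theta, q=np.inf):
    """Basic Sequential Algorithmic Scheme.
    X: (N, l) array; theta: dissimilarity threshold; q: max number of clusters.
    Returns an array of cluster labels (order-dependent, as noted above)."""
    labels = np.empty(len(X), dtype=int)
    means, counts = [X[0].astype(float)], [1]     # one cluster holding x_1
    labels[0] = 0
    for i, x in enumerate(X[1:], start=1):
        d = [np.linalg.norm(x - m) for m in means]
        k = int(np.argmin(d))
        if d[k] > theta and len(means) < q:
            means.append(x.astype(float)); counts.append(1)   # open a new cluster
            labels[i] = len(means) - 1
        else:
            # assign to C_k and update its mean: m_new = (n*m + x) / (n + 1)
            means[k] = (counts[k] * means[k] + x) / (counts[k] + 1)
            counts[k] += 1
            labels[i] = k
    return labels

# Made-up data
X = np.array([[0, 0], [0.5, 0.2], [5, 5], [5.2, 4.9], [0.1, 0.4]])
print(bsas(X, theta=2.0))        # e.g. [0 0 1 1 0]
```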

(Figure: a small two-dimensional example showing how the threshold Θ and the order in which the points X, Y, Z are presented affect the clusters produced by BSAS.)

Remarks:
– The order of presentation of the data plays an important role in the clustering results. Different orders of presentation may lead to totally different clustering results, in terms of the number of clusters as well as the clusters themselves.
– In BSAS the decision for a vector x is reached prior to the final cluster formation.
– BSAS performs a single pass on the data. Its complexity is O(N).
– If clusters are represented by point representatives, compact clusters are favored.

Estimating the number of clusters in the data set:
Let BSAS(Θ) denote the BSAS algorithm with dissimilarity threshold Θ.
– For Θ = a to b step c:
  run BSAS(Θ) s times, each time presenting the data in a different order, and estimate the number of clusters m_Θ as the most frequent number resulting from the s runs.
– Plot m_Θ versus Θ and identify the number of clusters m as the one corresponding to the widest flat region in this graph.

(Figure: plot of the number of clusters m_Θ versus the threshold Θ; the widest flat region indicates the number of clusters.)
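A rough sketch of this estimation procedure, reusing the bsas function from the previous sketch; the threshold grid and the number of shuffles s are arbitrary choices for illustration:

```python
import numpy as np
from collections import Counter

def estimate_num_clusters(X, thetas, s=10, seed=0):
    """For each threshold, run BSAS on s random orderings of the data and keep
    the most frequent number of clusters; returns {theta: m_theta}."""
    rng = np.random.default_rng(seed)
    m_per_theta = {}
    for theta in thetas:
        counts = []
        for _ in range(s):
            order = rng.permutation(len(X))
            labels = bsas(X[order], theta)     # bsas defined in the sketch above
            counts.append(len(np.unique(labels)))
        m_per_theta[theta] = Counter(counts).most_common(1)[0][0]
    return m_per_theta
    # Plot m_theta versus theta and pick the value spanning the widest flat region.

print(estimate_num_clusters(X, thetas=np.linspace(0.5, 8.0, 16)))
```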

MBSAS, a modification of BSAS.

Phase 1 (cluster determination):
m = 1   {number of clusters}
C_m = {x_1}
For i = 2 to N
  Find C_k such that d(x_i, C_k) = min_{1≤j≤m} d(x_i, C_j)
  If (d(x_i, C_k) > Θ) AND (m < q) then
    m = m + 1; C_m = {x_i}
  End If
End For

Phase 2 (pattern classification):
For i = 1 to N
  If x_i has not been classified to any cluster then
    Find C_k such that d(x_i, C_k) = min_{1≤j≤m} d(x_i, C_j)
    C_k = C_k ∪ {x_i}
    Where necessary, update the cluster representative
  End If
End For

MBSAS consists of:
– A cluster determination phase (first pass on the data), which is the same as BSAS except that no vector is assigned to an already formed cluster. At the end of this phase each cluster consists of a single element.
– A pattern classification phase (second pass on the data), where each of the unassigned vectors is assigned to its closest cluster.

3. k-means clustering algorithm
Suppose N samples are to be partitioned into K clusters. The k-means criterion function is
$$J=\sum_{n=1}^{N}\sum_{k=1}^{K} r_{nk}\,\|x_n-m_k\|^{2},$$
where $r_{nk}=1$ when sample $x_n$ is assigned to cluster k and $r_{nk}=0$ otherwise. With the assignments $r_{nk}$ fixed, setting the derivative of J with respect to $m_k$ to zero gives the center that minimizes J,
$$m_k=\frac{\sum_{n} r_{nk}\,x_n}{\sum_{n} r_{nk}},$$
i.e. the mean of the samples currently assigned to cluster k.

Algorithm:
(1) Arbitrarily choose k initial cluster centers $Z_1(0),Z_2(0),\ldots,Z_k(0)$; set t = 0.
(2) Assign each feature vector of the set $\{X_i\}$ to one of the k classes by the minimum-distance rule: if $d_{ij}(t)=\min_{1\le r\le k} d_{ir}(t)$, where $d_{ir}(t)=\|X_i-Z_r(t)\|$, then $X_i\in\omega_j(t+1)$, $i=1,2,\ldots,N$.
(3) Recompute the cluster centers after the reassignment: $Z_j(t+1)=\frac{1}{n_j}\sum_{X_i\in\omega_j(t+1)}X_i$, where $n_j$ is the number of samples currently in $\omega_j(t+1)$.
(4) If $Z_j(t+1)=Z_j(t)$ for $j=1,2,\ldots,k$, stop; otherwise set t = t + 1 and go to (2).

Worked example. Twenty two-dimensional samples $x_1,\ldots,x_{20}$ (features listed in the table of the original slides) are to be grouped into k = 2 clusters.
Step 1: let k = 2 and choose the initial cluster centers $Z_1(1)=(0,0)^T$, $Z_2(1)=(1,0)^T$.
Step 2: assign every sample to the nearest center. For instance, $\|x_1-Z_1(1)\|<\|x_1-Z_2(1)\|$, so $x_1$ goes to cluster 1, while $x_2,x_4,x_5,\ldots,x_{20}$ are closer to $Z_2(1)$. This yields $G_1(1)=\{x_1,x_3\}$ and $G_2(1)=\{x_2,x_4,x_5,\ldots,x_{20}\}$.
Step 3: update the centers as the means of the new groups, giving $Z_1(2)$ and $Z_2(2)$. Reassigning with these centers gives $G_1(2)=\{x_1,\ldots,x_8\}$, $N_1=8$, and $G_2(2)=\{x_9,\ldots,x_{20}\}$, $N_2=12$; the centers are then updated to $Z_1(3)$ and $Z_2(3)$.
Step 4: since $Z_j(3)\neq Z_j(2)$, return to Step 2. Recomputing the distances of $x_1,\ldots,x_{20}$ to $Z_1(3),Z_2(3)$ reproduces the same partition, $G_1(4)=\{x_1,\ldots,x_8\}$ and $G_2(4)=\{x_9,\ldots,x_{20}\}$, so the centers no longer change and the algorithm stops with
$$Z_1=(1.25,\,1.13)^T,\qquad Z_2=(7.67,\,7.33)^T.$$

Convergence: each of the two steps (reassignment with the centers fixed, and center update with the assignments fixed) can only decrease or leave unchanged the criterion J, and the number of possible partitions is finite, so the algorithm converges to a local minimum of J.

(Figures: k-means iterations on an example data set: initialization, after one iteration, after further iterations, and the final converged state; a second initialization is shown that converges to an unsatisfactory state.)

Starting points for improving the algorithm: the performance of k-means depends on the number of clusters k and on the initial cluster centers. Typical initializations: (1) randomly split the patterns into k classes and use their means; (2) use the k feature points that are farthest apart from each other as the initial centers. (3) Choosing k: (a) determine k from prior knowledge; (b) run the algorithm for a range of values of k and examine how the criterion J varies with k.

4. k-medoids clustering algorithm
In k-means the center of a cluster is the mean of the samples assigned to it, $m_k=\sum_n r_{nk}x_n\,/\sum_n r_{nk}$, which in general is not one of the samples. In k-medoids the center of a cluster is a sample selected from the current cluster: the one whose total dissimilarity to the other samples of the cluster is smallest. This is the essential difference between the two algorithms.

k-medoids criterion: let M denote the set of "medoids" of all clusters and let $I_X$ denote the set of samples of X that are not medoids. The cost function is
$$J=\sum_{x_i\in I_X}\;\min_{m_r\in M} d(m_r,x_i),$$
i.e. each non-medoid sample $x_i$ is assigned to the medoid $m_r$ for which $d(x_i,m_r)=\min_{m_q\in M} d(x_i,m_q)$.
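A minimal sketch of the k-means iteration described above (alternating the nearest-center assignment and the mean update); swapping the mean update for a search over the cluster's own samples would give the k-medoids variant. The data and k are made up for the example:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: alternate nearest-center assignment and mean update."""
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # initial centers
    for _ in range(n_iter):
        # Step 2: assign every sample to its nearest center
        d = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 3: recompute each center as the mean of its samples
        Z_new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else Z[j]
                          for j in range(k)])
        if np.allclose(Z_new, Z):        # Step 4: stop when the centers are unchanged
            break
        Z = Z_new
    return labels, Z

# Made-up data: two well-separated groups, as in the worked example above
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1, 0.7, (8, 2)), rng.normal(7.5, 0.7, (12, 2))])
labels, Z = kmeans(X, k=2)
print(labels)
print(Z)       # the two centers, close to the group means
```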

5. Fuzzy C-means algorithm
In fuzzy clustering a sample may belong to several clusters with different degrees of membership. The fuzzy C-means (FCM) algorithm partitions N n-dimensional samples $X=\{x_1,x_2,\ldots,x_N\}$ into C fuzzy clusters. The membership matrix $U=(u_{ij})_{N\times C}$ collects the degrees $u_{ij}$ to which sample $x_i$ belongs to cluster j, subject to
(a) $u_{ij}\in[0,1]$;  (b) $0<\sum_{i=1}^{N} u_{ij}<N$;  (c) $\sum_{j=1}^{C} u_{ij}=1$ for every i.

The FCM criterion is
$$J(U,Z)=\sum_{j=1}^{C}\sum_{i=1}^{N} u_{ij}^{\,m}\, d_{ij}^{2},\qquad
d_{ij}^{2}=(x_i-z_j)^{T}V(x_i-z_j),$$
where $m>1$ is the fuzziness exponent, V is a norm-inducing matrix, and $Z=\{z_1,z_2,\ldots,z_C\}$ are the cluster centers. Minimizing J under constraint (c) leads to the update formulas used below.

FCM algorithm:
(1) Fix the number of clusters C, the parameter m, the matrix V and a suitably small tolerance ε > 0; initialize the membership matrix $U^{(0)}$ and set k = 0.
(2) Compute the cluster centers for the current memberships:
$$z_j^{(k)}=\frac{\sum_{i=1}^{N}\big(u_{ij}^{(k)}\big)^{m} x_i}{\sum_{i=1}^{N}\big(u_{ij}^{(k)}\big)^{m}},\qquad j=1,\ldots,C.$$
(3) Update $U^{(k)}$ to $U^{(k+1)}$: for i = 1 to N, let $I_i=\{\,j\mid d_{ij}=0\,\}\subset\{1,2,\ldots,C\}$.
  – If $I_i=\emptyset$: $u_{ij}^{(k+1)}=\Big[\sum_{l=1}^{C}\big(d_{ij}/d_{il}\big)^{2/(m-1)}\Big]^{-1}$.
  – Otherwise: $u_{ij}^{(k+1)}=0$ for $j\notin I_i$, and the memberships of the clusters in $I_i$ are chosen so that $\sum_{j\in I_i} u_{ij}^{(k+1)}=1$.
(4) If $\|U^{(k+1)}-U^{(k)}\|<\varepsilon$, stop; otherwise set k = k + 1 and go to (2).
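A compact sketch of the FCM iteration described above, with V taken as the identity (so d_ij is the Euclidean distance); m, C and the data are illustrative choices:

```python
import numpy as np

def fuzzy_c_means(X, C, m=2.0, eps=1e-5, max_iter=200, seed=0):
    """Fuzzy C-means with Euclidean distances (V = I).
    Returns the membership matrix U (N x C) and the cluster centers Z (C x l)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), C))
    U /= U.sum(axis=1, keepdims=True)            # rows sum to 1 (constraint (c))
    for _ in range(max_iter):
        W = U ** m
        Z = (W.T @ X) / W.sum(axis=0)[:, None]   # z_j = sum_i u_ij^m x_i / sum_i u_ij^m
        d = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                 # guard against d_ij = 0
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        if np.linalg.norm(U_new - U) < eps:      # stop when memberships stabilise
            return U_new, Z
        U = U_new
    return U, Z

# Made-up data: two loose groups
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (15, 2)), rng.normal(4, 1, (15, 2))])
U, Z = fuzzy_c_means(X, C=2)
print(U.round(2)[:5])     # soft memberships of the first five samples
print(Z)
```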

6. Hierarchical clustering algorithms

Application background:
1. An international conference receives over a thousand submissions; the papers of each research area must be distributed to reviewers of that area, and doing this by hand is laborious.
2. Documents of different categories (e.g. news articles) need to be grouped so that each group contains a single category; doing this manually is equally impractical.

How can a computer "read" an article? The information of an article is tied to the semantics of its words.

Word segmentation. A Chinese sentence has no explicit word boundaries; e.g. "中国航天…应邀到…与太空…开…" must first be segmented into words: 中国 / 航天… / 应邀 / 到… / 与 / 太空… / 开….
Ambiguity: the same character string may admit several segmentations, e.g. 发展中国家 can be cut as 发展 / 中 / 国家 or 发展 / 中国 / …, and 大学城书店 as 大学城 / 书店 or 大学 / 城 / 书店. Simple rules such as longest matching cannot resolve all such cases.
Statistical language model: for candidate segmentations $A_1,A_2,\ldots,A_k$; $B_1,B_2,\ldots,B_m$; $C_1,C_2,C_3,\ldots$ of the same sentence, choose the segmentation with the highest probability, i.e. prefer the A-segmentation when $P(A_1,A_2,\ldots,A_k)>P(B_1,B_2,\ldots,B_m)$ and $P(A_1,A_2,\ldots,A_k)>P(C_1,C_2,\ldots)$.
Chinese word segmentation technology is by now mature; its main limitations concern the granularity of the segmentation and the remaining ambiguities (a classical example is "此地安能居住，其人好…", whose reading depends on how it is cut).

Measuring the importance of words. For a query such as 原子能 / 的 / …, web pages that contain these words many times are more relevant than pages that contain them rarely.
– Term frequency TF: the number of occurrences of a word divided by the total number of words of the page. For a query of N words the relevance can be scored as TF_1 + TF_2 + … + TF_N.
– Problem with raw term frequency: function words (such as 的) contribute heavily although they carry no content.
– Remedy: weight each term frequency by an inverse document frequency, giving the score TF_1·IDF_1 + TF_2·IDF_2 + … + TF_N·IDF_N.

Digitizing a document: collect the vocabulary, compute the TF-IDF value of every word of the document, and arrange these values into a vector; the similarity between two documents can then be measured through their TF-IDF vectors.
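A toy sketch of this TF-IDF document vectorisation and similarity computation; whitespace tokenisation stands in for real Chinese word segmentation, and the tiny corpus is made up:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Represent each document as a {word: tf*idf} vector over the corpus vocabulary."""
    tokenized = [doc.split() for doc in docs]           # stand-in for word segmentation
    n_docs = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(n_docs / df[w]) for w in df}     # inverse document frequency
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({w: (tf[w] / len(toks)) * idf[w] for w in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse TF-IDF vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["atomic energy application",
        "application of atomic energy in medicine",
        "football match report"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```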

Generalized Agglomerative Scheme (GAS):
– Initialization: choose the initial clustering ℜ_0 = {{x_1}, …, {x_N}}; set t = 0.
– Repeat: t = t + 1; among all pairs of clusters in ℜ_{t-1}, find (C_r, C_s) such that
  g(C_r, C_s) = min_{i,j} g(C_i, C_j) if g is a dissimilarity function, or
  g(C_r, C_s) = max_{i,j} g(C_i, C_j) if g is a similarity function;
  merge C_r and C_s into a single cluster C_q and form ℜ_t accordingly.
– Until all vectors have been merged into a single cluster.

Computational load: at level t the number of cluster pairs that must be examined is $\binom{N-t+1}{2}$, so the total over all levels is $\sum_{t=1}^{N-1}\binom{N-t+1}{2}=\frac{(N-1)N(N+1)}{6}$; the overall complexity of the agglomerative scheme is therefore on the order of $N^{3}$.

Notation used in the generalized agglomerative scheme:
– Samples to be clustered: $X=\{x_1,x_2,\ldots,x_N\}$, with $x_i=[x_{i1},x_{i2},\ldots,x_{il}]^{T}$, i.e. each sample vector has l features (T denotes transposition).
– Proximity matrix (similarity or dissimilarity) P(X): the N×N matrix whose (i, j)-th element is the similarity $s(x_i,x_j)$ or the dissimilarity $d(x_i,x_j)$ between $x_i$ and $x_j$.

Example. Let X = {x_1, x_2, x_3, x_4, x_5} with $x_1=[1,1]^T$, $x_2=[2,1]^T$, $x_3=[5,4]^T$, $x_4=[6,5]^T$, $x_5=[6.5,6]^T$. Using the Euclidean distance, the dissimilarity matrix is
$$P(X)=\begin{bmatrix}0&1&5&6.4&7.4\\ 1&0&4.2&5.7&6.7\\ 5&4.2&0&1.4&2.5\\ 6.4&5.7&1.4&0&1.1\\ 7.4&6.7&2.5&1.1&0\end{bmatrix},$$
and a corresponding similarity matrix P'(X) is obtained with the Tanimoto measure. With g(C_i, C_j) chosen as a set dissimilarity, the agglomerative scheme produces the sequence of clusterings
{{x_1,x_2},{x_3},{x_4},{x_5}} → {{x_1,x_2},{x_3},{x_4,x_5}} → {{x_1,x_2},{x_3,x_4,x_5}} → {{x_1,x_2,x_3,x_4,x_5}}.

(Figure: dendrogram of the five-point example drawn on a similarity scale running from 1 down to 0.)

(Figure: the corresponding dendrogram on a dissimilarity scale running from 0 to 9.)

Agglomerative algorithms fall mainly into two categories; in the following we discuss the algorithms based on matrix theory. They operate on the N×N dissimilarity matrix P_0 = P(X) of the N samples. When, at level t, the clusters C_i and C_j are merged into C_q, the matrix P_t is obtained from P_{t-1} as follows:
(1) delete the two rows and the two columns corresponding to C_i and C_j;
(2) add a new row and column containing the distances d(C_q, C_s) = f(d(C_i, C_s), d(C_j, C_s), d(C_i, C_j)) of the new cluster from every remaining cluster C_s.
A commonly used parametric form is
$$d(C_q,C_s)=a_i\,d(C_i,C_s)+a_j\,d(C_j,C_s)+b\,d(C_i,C_j)+c\,\big|d(C_i,C_s)-d(C_j,C_s)\big|.$$
Different choices of a, b, c give different algorithms, e.g.:
– Single link (a_i = a_j = 1/2, b = 0, c = −1/2): $d(C_q,C_s)=\min\{d(C_i,C_s),\,d(C_j,C_s)\}$
– Complete link (a_i = a_j = 1/2, b = 0, c = 1/2): $d(C_q,C_s)=\max\{d(C_i,C_s),\,d(C_j,C_s)\}$

Matrix Updating Algorithmic Scheme (MUAS):
– Initialization: ℜ_0 = {{x_1}, …, {x_N}}; P_0 = P(X); t = 0.
– Repeat: t = t + 1; find C_i, C_j in ℜ_{t-1} such that d(C_i, C_j) = min_{r,s} d(C_r, C_s); merge C_i and C_j into C_q and set ℜ_t = (ℜ_{t-1} − {C_i, C_j}) ∪ {C_q}; obtain P_t from P_{t-1} by the steps above.
– Until the clustering ℜ_{N-1} is formed, i.e. all vectors belong to a single cluster.

Example: applying MUAS to the five-point example with $d(C_q,C_s)=\min\{d(C_i,C_s),d(C_j,C_s)\}$ (single link) and with $d(C_q,C_s)=\max\{d(C_i,C_s),d(C_j,C_s)\}$ (complete link) yields, in general, different hierarchies.
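A small sketch of MUAS with the single-link and complete-link updates above, working directly on a dissimilarity matrix; the matrix below is the five-point example, and the code is illustrative rather than optimised:

```python
import numpy as np

def muas(P, link="single"):
    """Matrix Updating Algorithmic Scheme on a dissimilarity matrix P.
    Returns the sequence of merges as (cluster_a, cluster_b, distance) tuples."""
    P = P.astype(float).copy()
    clusters = [[i] for i in range(len(P))]
    np.fill_diagonal(P, np.inf)
    merges = []
    while len(clusters) > 1:
        i, j = np.unravel_index(np.argmin(P), P.shape)
        i, j = min(i, j), max(i, j)
        merges.append((clusters[i], clusters[j], P[i, j]))
        # d(C_q, C_s) = min or max of the two old distances (single / complete link)
        combine = np.minimum if link == "single" else np.maximum
        new_row = combine(P[i], P[j])
        P[i], P[:, i] = new_row, new_row
        P[i, i] = np.inf
        P = np.delete(np.delete(P, j, axis=0), j, axis=1)   # drop row/column of C_j
        clusters[i] = clusters[i] + clusters[j]
        clusters.pop(j)
    return merges

# Euclidean dissimilarity matrix of the five-point example above
P = np.array([[0, 1, 5, 6.4, 7.4],
              [1, 0, 4.2, 5.7, 6.7],
              [5, 4.2, 0, 1.4, 2.5],
              [6.4, 5.7, 1.4, 0, 1.1],
              [7.4, 6.7, 2.5, 1.1, 0]])
for merge in muas(P, link="single"):
    print(merge)
```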

7. Spectral clustering
Spectral clustering currently receives wide attention and is supported by a substantial body of theory.

An intuitive view of eigenvalues and eigenvectors: if we regard a matrix A as a transformation, an eigenvector v is a direction that the transformation merely stretches, Av = cv, i.e. v is elongated by the factor c. A new vector x can be decomposed along these eigenvectors, x = a_1 v_1 + a_2 v_2 + …, so that
$$Ax=A(a_1v_1+a_2v_2+\cdots)=a_1c_1v_1+a_2c_2v_2+\cdots;$$
the action of the matrix is thus captured by its eigenvalues and eigenvectors.

Basic definitions from graph theory (only the notions needed for spectral clustering): the edge connecting vertices $v_i$ and $v_j$ is denoted $e_{ij}=(v_i,v_j)$.

Basic setting of spectral clustering: associate with every sample $x_i$ a vertex $v_i$ and build a graph G on these vertices; G is taken to be undirected and connected. Every edge is assigned a weight W(i, j), where w measures the similarity between the vertices $v_i$ and $v_j$ of G. The collection of weights defines the N×N similarity (affinity) matrix W with elements $w_{ij}$.
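A brief sketch of this construction: build a Gaussian-kernel similarity matrix W, form a normalised graph Laplacian, and cluster the leading eigenvectors with k-means. This particular normalisation and the kernel width sigma are common choices rather than something fixed by the slides, and the sketch reuses the kmeans function given earlier:

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, seed=0):
    """Cluster X into k groups using the eigenvectors of a normalised graph Laplacian."""
    # Similarity (affinity) matrix: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), w_ii = 0
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Normalised Laplacian L_sym = I - D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    # The k eigenvectors of L with the smallest eigenvalues span the embedding
    eigvals, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12   # row-normalise
    labels, _ = kmeans(U, k, seed=seed)   # reuse the k-means sketch given earlier
    return labels

# Made-up data: two compact groups (spectral clustering also handles non-convex shapes)
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
print(spectral_clustering(X, k=2))
```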
