XML代價(jià)估計(jì)方法研究綜述課件_第1頁(yè)
XML代價(jià)估計(jì)方法研究綜述課件_第2頁(yè)
XML代價(jià)估計(jì)方法研究綜述課件_第3頁(yè)
XML代價(jià)估計(jì)方法研究綜述課件_第4頁(yè)
XML代價(jià)估計(jì)方法研究綜述課件_第5頁(yè)
已閱讀5頁(yè),還剩40頁(yè)未讀, 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說(shuō)明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡(jiǎn)介

XML代價(jià)估計(jì)方法研究綜述

XMLGroup

XML代價(jià)估計(jì)方法研究綜述

1Outline研究代價(jià)估計(jì)的必要性xml中代價(jià)估計(jì)研究的難點(diǎn)背景知識(shí)介紹現(xiàn)有方法分類Outline研究代價(jià)估計(jì)的必要性2研究代價(jià)估計(jì)的必要性查詢優(yōu)化的基礎(chǔ)不同分支對(duì)查詢目標(biāo)的選擇率不同,如果選擇性低的分支先于選擇性高的分支執(zhí)行,就會(huì)有效的減少中間結(jié)果從而提供查詢效率結(jié)構(gòu)連接在XML中是一個(gè)很重要的操作,連接的順序的選擇需要計(jì)算不同連接順序的代價(jià)及早給用戶提供反饋信息研究代價(jià)估計(jì)的必要性查詢優(yōu)化的基礎(chǔ)3xml中代價(jià)估計(jì)研究的難點(diǎn)XML數(shù)據(jù)的不規(guī)則性是對(duì)傳統(tǒng)統(tǒng)計(jì)信息方法的重要挑戰(zhàn),具體表現(xiàn)在以下幾個(gè)方面:路徑依賴性,如同為name結(jié)點(diǎn),在person下和在city下出現(xiàn)意義就完全不同結(jié)構(gòu)相關(guān)性嵌套在不同祖先下面的同類結(jié)點(diǎn)的個(gè)數(shù)差別可能很大,如book結(jié)點(diǎn)下的author個(gè)數(shù)是不確定的值相關(guān)性//purchase/Item[type=‘book’][price>100]XML的有序性制約了轉(zhuǎn)換規(guī)則的靈活性所有這些問(wèn)題,都使得在xml中采用傳統(tǒng)的代價(jià)估計(jì)方法不切實(shí)際,會(huì)帶來(lái)很大的誤差。針對(duì)xml數(shù)據(jù)的特點(diǎn),我們應(yīng)該尋求一種新的代價(jià)估計(jì)方法。xml中代價(jià)估計(jì)研究的難點(diǎn)XML數(shù)據(jù)的不規(guī)則性是對(duì)傳統(tǒng)統(tǒng)計(jì)信4BackgroundKnowledgeXMLDataModelAlarge,node-labeledtreeT(V,E)d0a1a2a3p4p5n6t14k15y131999y16t17k18k19n7b9p8y20t21k22y24t25k26n10b12p9t23t26200220012000ExampleXMLDocumentTreeBackgroundKnowledgeXMLData5BackgroundKnowledgeXMLDataModelAlarge,node-labeledtreeT(V,E)d0a1a2a3p4p5n6t14k15y131999y16t17k18k19n7b9p8y20t21k22y24t25k26n10b12p9t23t26200220012000ExampleXMLDocumentBackgroundKnowledgeXMLData6BackgroundKnowledgeXMLQueryModelAnode-labeledquerytreeTQEachnodeofTQislabeledwithavariablenameqiinQEachedge(qi,qj)ofTQisannotatedwithanXPathexpressionpath(qi,qj)thatdescribesthespecificstructuralconstraintsspecifiedinQQueryFragment:for$bindoc(“bib.xml”)/bib/book$ain$b/author$tin$b/titlebibbookauthortitleroot$b$a$tTwigQueryModelBackgroundKnowledgeXMLQuery7現(xiàn)有估計(jì)方法的分類路徑選擇性計(jì)算Twig匹配個(gè)數(shù)統(tǒng)計(jì)值謂詞選擇率估計(jì)結(jié)構(gòu)連接代價(jià)估計(jì)現(xiàn)有估計(jì)方法的分類路徑選擇性計(jì)算8Overviewdatatreedifferentsummarizationmethodsbasedon:pathallm-lengthpathF/B-stabilityLore[McHughetal,VLDB99]pruningPathTree,MarkovTable[Aboulnagaetal,VLDB01]XSketch[Polyzotisetal,VLDB02]typeXSketches[Polyzotisetal,SIGMOD02]+value+twigXSketches[Polyzotisetal,SIGMOD04]RegionCode2DModel1DModelStatiX[Freireetal,SIGMOD02]PHHistogram[WuY]EDBT2002Interva&PositionModel[H.Lu]SIGMOD2003Overviewdatatreedifferentsum9路徑選擇性計(jì)算問(wèn)題描述/a/b[c/d//e][g//e/f]//*/*/e/f路徑選擇性計(jì)算問(wèn)題描述/a/b[c/d//e][g//10PathTree&MarkovTableAshrafAboulnaga,AlaaR.Alameldeen,andJeffryF.Naughton.EstimatingtheselectivityofXMLpathexpressionsforInternetscaleapplications.VLDB2001PathTree&MarkovTableAshra11PathTreeConstructionStartfromanoriginalpathtreeCountofnodesinabsolutepathsAggregatethetreetofitinthelimitedspaceSibling-*EstimationMatchthequeryagainstthepathtree,matching*asthelastresort.Needtodecidewhethertousetotalcountoraveragecount.PathTreeConstruction12AnXMLdocumentanditspathtree<A><B></B><B><D></D></B><C><D></D><E></E><E></E><E></E></C></A>A1B2C1D1D1E3Estimation:count(A/B/D)=1count(A/C/D)=1AnXMLdocumentanditspatht13AggregatePathTreeA1C9B13G10F15H6K12E5D7K11J4I2AggregatePathTreeA1C9B13G10F14AggregatePathTreeA1C9B13G10F15H6K12E5D7K11J4I2紅色結(jié)點(diǎn)表示那些不頻繁出現(xiàn)的結(jié)點(diǎn),需要?jiǎng)h除,從而保證path樹(shù)能夠放入內(nèi)存AggregatePathTreeA1C9B13G10F15Sibling-*SummarizationA1C9B13*F15*K*f=23n=2683把查詢和path樹(shù)匹配需要決定用總數(shù)還是平均數(shù)估計(jì)方法Estimation:count(//C/H/K)=11count(//C/*/K)=23Sibling-*SummarizationA1C9B1316MarkovTable構(gòu)造一個(gè)表,存儲(chǔ)長(zhǎng)度<=m的不同路徑出現(xiàn)的次數(shù),其中m>=2。當(dāng)查詢路徑長(zhǎng)度<=m時(shí),可以直接從表中讀出結(jié)果,否則,用公式計(jì)算。例如,m=3,查詢?yōu)?/A/B/C/D,則當(dāng)表不能裝入內(nèi)存的時(shí)候,進(jìn)行剪枝操作。f(B/C/D)f(B/C)f(A/B/C/D)≈f(A/B/C)MarkovTable構(gòu)造一個(gè)表,存儲(chǔ)長(zhǎng)度<=m的不同路徑17MarkovTable::Exampleabbcddeeefda1a/b2b2a/c1c1b/d2d3c/d1e3c/e3f1c/f1abdde12213cf11m=2b2c/e3d3*3/3e3c/*2/2a/b2*/*1/1b/d2suffix-*Q1:a/c/eMarkovTable::Exampleabbcdde18Twig匹配個(gè)數(shù)統(tǒng)計(jì)問(wèn)題描述TwigQuery(basicbuildingblockofdeclarativequerylanguagesforXML)FOR$fINdocument(“person.xml”)//department/facultyFOR$rin$f/RA,FOR$tin$f/TADepartmentFacultyRATA$f$r$tTwig匹配個(gè)數(shù)統(tǒng)計(jì)問(wèn)題描述FOR$fINdocum19XSketch方法

N.Polyzotis,M.Garofalakis.StatisticalSynopsesforGraph-StructuredXMLDatabases,InProc.ofACMSIGMODConf.2002N.PolyzotisandM.Garofalakis.Structureandvaluesynopsesforxmldatagraphs.InProceedingsof28thVLDBConf.,2002.

N.Polyzotis,M.Garofalakis,andYannisIosnnidis.SelectivityEstimationforXMLTwigs.InProc.ofthe20thIntl.Conf.onDataEngineering,2004N.PolyzotisandM.GarofalakisandY.Ioannidis.ApproximateXMLQueryAnswers.InProc.ACMSIGMOD2004XSketch方法N.Polyzotis,M.Garof20d0a1a2a3p4p5n6t14k15y131999y16t17k18k19n7b9p8y20t21k22y24t25k26n10b12p9t23t26200220012000D(1)A(3)N(3)P(4)B(2)Y(4)K(6)T(6)H(Y)B/FB/FB/FB/FBB/FFFXSketch方法d0a1a2a3p4p5n6t14k15y131999y1621B/Fstability

Definition:LetU,VbesetsofelementsinanXMLdataTreeT.WesaythatVisbackward-stable(B-stale)withrespecttoU,iffforeachv∈Vthereexistsau∈Usuchthatedge(u,v)isinT.Similarly,Uissaidtobeforward-stable(F-stable)withrespecttoV,iffforeachu∈Uthereexistsav∈Vsuchthattheedge(u,v)isinG.B/FstabilityDefinition:22D(1)A(3)N(3)P(4)B(2)Y(4)K(6)T(6)H(Y)B/FB/FB/FB/FBB/FFFXSketch方法HowtoestimatethecardinalityofxpathquerysuchasA/P/KorA/[p]?Utilizingedgestability,Wecangiveanaccuratematchnumber:Count(A/P/K)=6Count(A/[p])=3ButwhataboutA/P/Twhichdoesn’tconformtobackwardstability?2assumptions

Pathindependence

Backward-edgeuniformityCount(A/P/T)=f(A/P/T)*count(T)f(A/P/T)=f(A/P)*f(P/T|A/P)=f(P/T|A/P)=f(P/T)≈count(P)/(count(B)+count(P))D(1)A(3)N(3)P(4)B(2)Y(4)K(6)T(23XSketch方法(處理值謂詞)v1v3v5v2v4BBFFSTN(v5)Dep(v5)={v1,v4}v1v4Histogramatv5H(σ1,σ4)=count(v1{σ1}[v2]/v3[v4{σ4}]/v5)v3v5count(v1{σ1}[v2]/v3{σ3}[v4{σ4}]/v5)=H(σ1,σ4)*count(v3{σ3})/count(v3)A1[Edge-ValueIndependenceAcrossNon-SatbleEdges]f(u/v|v{σ})≈f(u/v)A2[Value-IndependenceOutsideCorrelationScope]XSketch方法(處理值謂詞)v1v3v5v2v4BBFF24XSkecth方法(處理twig)ProblemwiththeformermodelLetusseeasimpleexamplefort0inA,t1int0/B,t2int0/Cra1a2b1c1b2c2ra1a2b1c1b2c2100×10×10010100×10010×10RA(2)B(110)C(110)B/FB/FB/F(a)(b)(c)StructuralXSketch2000bindingtuplesonthefirstdocument10100tuplesontheseconddocumentXSkecth方法(處理twig)Problemwith25XSkecth方法(處理twig)Singe-pathXSketchmodeldosenotstoredetailedenoughinformationtocapturethecorrelationpatternsbetweenmultiplepathexpressions.IfnodeArecordsatwo-dimensionaldistributionfA(b,c)forthefractionofelementsinAthathavebchildreninBandcchildreninC.CBCCElementsfp10010a10.510100a20.52×(0.5×100×10+0.5×10×100)CBCCElementsfp100100a10.51010a20.52×(0.5×100×100+0.5×10×10)XSkecth方法(處理twig)Singe-pathXS26XSkecth方法(處理twig)CKCYElementsfp21p40.25P54×(0.25×2×1×10+0.25×1×1+0.5×1)=5CPCN21210.2521110.511p8,p9AnotherExampleEdge-distribution:fP(CY,CK,CP,CN)PYPKAPANXSkecth方法(處理twig)CKCYElementsf27Statix方法

FreireJ,JayantR,RamanathM,RoyPandSimeonJ.StatiX:MakingXMLCount.In:Proc.ofACMSIGMOD2002,USA.Statix方法FreireJ,JayantR,R28Statix方法Constructioneveryinstancenodeisassignedauniqueid(typeid+sequentialnodeid)duringparsingandvalidation,meanwhilegatheringstatisticsConstructingthehistogramsHistogramBuildhistogramforeachtypeMightcontainhistogramdescribingthedistributionoftheparentsofatypeValuehistogramcanbebuiltfortypesthataredefinedintermsofbasetypes(eg.integer)EstimationConvertthequeryintoSQL(i.e.,relationaljoinonIDs)UsestandardhistogrammultiplicationStatix方法Construction29StatiX::ExampleFOR$sindocument(“imdb.xml”)/show,

$rin$s/reviewWHERE$s/year<1992RETURN$s/title,$rSELECTtitle,reviewFROMShow,ReviewWHEREShow.year<1992AND

Show.ID=Review.ParIDSTATShow{CARD:5ID:1—6}STATShow_Year{VALUE:1990—2001BUCKET_NUM:2BUCKETS:{[1990,1994):3;[1994,2001):2}}STATReview{CARD:8ID:30—38

PARENTHISTOGRAMShow{BUCKET_NUM:3BUCKETS:{[1,2):4;[2,3):1;[3,5):3}}}

STATAka{CARD:6ID:6-12PARENTHISTOGRAMShow_Episode{BUCKET_NUM:3BUCKETS:{[1,2):1;[2,6):0;[6,12):7}}3×1/2×8/5=2.4StatiX::ExampleFOR$sindoc30SummaryQuery//*BranchValueCorrelationSchemaStatiXSimpleTwig+ValueNYYYNYPathTreeSimplePathNYNNNNMarkovTableSimplePathNYNNNNXSketchSimpleTwigNNYNNNXSketchesSimpleTwig+ValueNNYYY(mDmethods)NSummaryQuery//*BranchValueCorr31結(jié)構(gòu)連接代價(jià)估計(jì)問(wèn)題描述給定任意的祖先節(jié)點(diǎn)集合A和后代節(jié)點(diǎn)集合D,計(jì)算A//D結(jié)果集合的大小,估計(jì)匹配的祖先-后代的結(jié)果個(gè)數(shù)當(dāng)同一祖先有多個(gè)后代,或者同一后代有多個(gè)祖先時(shí),節(jié)點(diǎn)對(duì)個(gè)數(shù)可能遠(yuǎn)遠(yuǎn)大于祖先或后代的個(gè)數(shù)。應(yīng)用連接順序的選擇結(jié)構(gòu)連接代價(jià)估計(jì)問(wèn)題描述32結(jié)構(gòu)連接代價(jià)估計(jì)Existedwork2Dmodel:pHhistogram[WuY,PatelJandJagadishH.V.EstimatingAnswerSizesforXMLQueries.In:Proc.oftheEDBT2002.]IntervalandPositionalmodel[W.Wang,H.Jiang,H.Lu,andJ.F.Xu.ContainmentjoinSizeEstimation:ModelsandMethods.In:Proc.OfACMSIGMOD2003]結(jié)構(gòu)連接代價(jià)估計(jì)Existedwork33AbsoluteRegionCode(start,end)<contact><name>blah</name><phone><office>1234</office><home>5678</home><mobile>0000</mobile></phone></contact>contactnamephoneblahofficehomemobile123456780000(1,16)(2,4)(3,3)(5,15)(6,8)(7,7)(9,11)(10,10)(12,14)(13,13)234a.start<d.startandd.end<a.endAbsoluteRegionCode(start,e34pHHistogramForbiddenRegionsForanodeR1和R2是結(jié)點(diǎn)A的ForbiddenRegion兩個(gè)結(jié)點(diǎn)的(start,end)滿足:(1)nooverlap(2)nested不可能有部分重疊的(partiallyoverlap)情況pHHistogramForbiddenRegions35pHHistogramEstP[A]=HistP1[A]×{1/4×HistP2[A]+HistP2[B]+HistP2[C]+HistP2[E]+1/2(HistP2[D]+HistP2[F])}Ancestor-basedJoinEstimationEstP[A]=HistP2[A]×{1/4×HistP1[A]+HistP1[F]+HistP2[G]+HistP1[H]}Descendant-basedJoinEstimationpHHistogramEstP[A]=HistP1[A]×36pHHistogram355137102120300000000est=3*(12+1+2+0.25*5)=48.75AGIHstartendstartstartendendForoff-diagonalcells:pHforApHforDpHHistogram35513710212030000037Interval&PositionModelsMapintonewproblemsunder2newlyproposedmodels.IntervalmodelandPositionmodelHistogrambasedmethodthatisadaptivetothedata:PLhistogramAssumption:1DuniformofthedescendantsetEstimationisrobustwhencorrelationislowSamplingbasedmethodthatcapturesthecorrelation:IMSamplingandPMSamplingAssumption:HeightoftheXMLdatatreeisasmallconstantBothhavetheoreticalboundsontheaccuracyoftheestimationInterval&PositionModelsMap38Inte

溫馨提示

  • 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

最新文檔

評(píng)論

0/150

提交評(píng)論