




版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡(jiǎn)介
XML代價(jià)估計(jì)方法研究綜述
XMLGroup
Outline研究代價(jià)估計(jì)的必要性xml中代價(jià)估計(jì)研究的難點(diǎn)背景知識(shí)介紹現(xiàn)有方法分類研究代價(jià)估計(jì)的必要性查詢優(yōu)化的基礎(chǔ)不同分支對(duì)查詢目標(biāo)的選擇率不同,如果選擇性低的分支先于選擇性高的分支執(zhí)行,就會(huì)有效的減少中間結(jié)果從而提供查詢效率結(jié)構(gòu)連接在XML中是一個(gè)很重要的操作,連接的順序的選擇需要計(jì)算不同連接順序的代價(jià)及早給用戶提供反饋信息xml中代價(jià)估計(jì)研究的難點(diǎn)XML數(shù)據(jù)的不規(guī)則性是對(duì)傳統(tǒng)統(tǒng)計(jì)信息方法的重要挑戰(zhàn),具體表現(xiàn)在以下幾個(gè)方面:路徑依賴性,如同為name結(jié)點(diǎn),在person下和在city下出現(xiàn)意義就完全不同結(jié)構(gòu)相關(guān)性嵌套在不同祖先下面的同類結(jié)點(diǎn)的個(gè)數(shù)差別可能很大,如book結(jié)點(diǎn)下的author個(gè)數(shù)是不確定的值相關(guān)性//purchase/Item[type=‘book’][price>100]XML的有序性制約了轉(zhuǎn)換規(guī)則的靈活性所有這些問題,都使得在xml中采用傳統(tǒng)的代價(jià)估計(jì)方法不切實(shí)際,會(huì)帶來很大的誤差。針對(duì)xml數(shù)據(jù)的特點(diǎn),我們應(yīng)該尋求一種新的代價(jià)估計(jì)方法。BackgroundKnowledgeXMLDataModelAlarge,node-labeledtreeT(V,E)d0a1a2a3p4p5n6t14k15y131999y16t17k18k19n7b9p8y20t21k22y24t25k26n10b12p9t23t26200220012000ExampleXMLDocumentTreeBackgroundKnowledgeXMLDataModelAlarge,node-labeledtreeT(V,E)d0a1a2a3p4p5n6t14k15y131999y16t17k18k19n7b9p8y20t21k22y24t25k26n10b12p9t23t26200220012000ExampleXMLDocumentBackgroundKnowledgeXMLQueryModelAnode-labeledquerytreeTQEachnodeofTQislabeledwithavariablenameqiinQEachedge(qi,qj)ofTQisannotatedwithanXPathexpressionpath(qi,qj)thatdescribesthespecificstructuralconstraintsspecifiedinQQueryFragment:for$bindoc(“bib.xml”)/bib/book$ain$b/author$tin$b/titlebibbookauthortitleroot$b$a$tTwigQueryModel現(xiàn)有估計(jì)方法的分類路徑選擇性計(jì)算Twig匹配個(gè)數(shù)統(tǒng)計(jì)值謂詞選擇率估計(jì)結(jié)構(gòu)連接代價(jià)估計(jì)Overviewdatatreedifferentsummarizationmethodsbasedon:pathallm-lengthpathF/B-stabilityLore[McHughetal,VLDB99]pruningPathTree,MarkovTable[Aboulnagaetal,VLDB01]XSketch[Polyzotisetal,VLDB02]typeXSketches[Polyzotisetal,SIGMOD02]+value+twigXSketches[Polyzotisetal,SIGMOD04]RegionCode2DModel1DModelStatiX[Freireetal,SIGMOD02]PHHistogram[WuY]EDBT2002Interva&PositionModel[H.Lu]SIGMOD2003路徑選擇性計(jì)算問題描述/a/b[c/d//e][g//e/f]//*/*/e/fPathTree&MarkovTableAshrafAboulnaga,AlaaR.Alameldeen,andJeffryF.Naughton.EstimatingtheselectivityofXMLpathexpressionsforInternetscaleapplications.VLDB2001PathTreeConstructionStartfromanoriginalpathtreeCountofnodesinabsolutepathsAggregatethetreetofitinthelimitedspaceSibling-*EstimationMatchthequeryagainstthepathtree,matching*asthelastresort.Needtodecidewhethertousetotalcountoraveragecount.AnXMLdocumentanditspathtree<A><B></B><B><D></D></B><C><D></D><E></E><E></E><E></E></C></A>A1B2C1D1D1E3Estimation:count(A/B/D)=1count(A/C/D)=1AggregatePathTreeA1C9B13G10F15H6K12E5D7K11J4I2AggregatePathTreeA1C9B13G10F15H6K12E5D7K11J4I2紅色結(jié)點(diǎn)表示那些不頻繁出現(xiàn)的結(jié)點(diǎn),需要?jiǎng)h除,從而保證path樹能夠放入內(nèi)存Sibling-*SummarizationA1C9B13*F15*K*f=23n=2683把查詢和path樹匹配需要決定用總數(shù)還是平均數(shù)估計(jì)方法Estimation:count(//C/H/K)=11count(//C/*/K)=23MarkovTable構(gòu)造一個(gè)表,存儲(chǔ)長(zhǎng)度<=m的不同路徑出現(xiàn)的次數(shù),其中m>=2。當(dāng)查詢路徑長(zhǎng)度<=m時(shí),可以直接從表中讀出結(jié)果,否則,用公式計(jì)算。例如,m=3,查詢?yōu)?/A/B/C/D,則當(dāng)表不能裝入內(nèi)存的時(shí)候,進(jìn)行剪枝操作。f(B/C/D)f(B/C)f(A/B/C/D)≈f(A/B/C)MarkovTable::Exampleabbcddeeefda1a/b2b2a/c1c1b/d2d3c/d1e3c/e3f1c/f1abdde12213cf11m=2b2c/e3d3*3/3e3c/*2/2a/b2*/*1/1b/d2suffix-*Q1:a/c/eTwig匹配個(gè)數(shù)統(tǒng)計(jì)問題描述TwigQuery(basicbuildingblockofdeclarativequerylanguagesforXML)FOR$fINdocument(“person.xml”)//department/facultyFOR$rin$f/RA,FOR$tin$f/TADepartmentFacultyRATA$f$r$tXSketch方法
N.Polyzotis,M.Garofalakis.StatisticalSynopsesforGraph-StructuredXMLDatabases,InProc.ofACMSIGMODConf.2002N.PolyzotisandM.Garofalakis.Structureandvaluesynopsesforxmldatagraphs.InProceedingsof28thVLDBConf.,2002.
N.Polyzotis,M.Garofalakis,andYannisIosnnidis.SelectivityEstimationforXMLTwigs.InProc.ofthe20thIntl.Conf.onDataEngineering,2004N.PolyzotisandM.GarofalakisandY.Ioannidis.ApproximateXMLQueryAnswers.InProc.ACMSIGMOD2004d0a1a2a3p4p5n6t14k15y131999y16t17k18k19n7b9p8y20t21k22y24t25k26n10b12p9t23t26200220012000D(1)A(3)N(3)P(4)B(2)Y(4)K(6)T(6)H(Y)B/FB/FB/FB/FBB/FFFXSketch方法B/Fstability
Definition:LetU,VbesetsofelementsinanXMLdataTreeT.WesaythatVisbackward-stable(B-stale)withrespecttoU,iffforeachv∈Vthereexistsau∈Usuchthatedge(u,v)isinT.Similarly,Uissaidtobeforward-stable(F-stable)withrespecttoV,iffforeachu∈Uthereexistsav∈Vsuchthattheedge(u,v)isinG.D(1)A(3)N(3)P(4)B(2)Y(4)K(6)T(6)H(Y)B/FB/FB/FB/FBB/FFFXSketch方法HowtoestimatethecardinalityofxpathquerysuchasA/P/KorA/[p]?Utilizingedgestability,Wecangiveanaccuratematchnumber:Count(A/P/K)=6Count(A/[p])=3ButwhataboutA/P/Twhichdoesn’tconformtobackwardstability?2assumptions
Pathindependence
Backward-edgeuniformityCount(A/P/T)=f(A/P/T)*count(T)f(A/P/T)=f(A/P)*f(P/T|A/P)=f(P/T|A/P)=f(P/T)≈count(P)/(count(B)+count(P))XSketch方法(處理值謂詞)v1v3v5v2v4BBFFSTN(v5)Dep(v5)={v1,v4}v1v4Histogramatv5H(σ1,σ4)=count(v1{σ1}[v2]/v3[v4{σ4}]/v5)v3v5count(v1{σ1}[v2]/v3{σ3}[v4{σ4}]/v5)=H(σ1,σ4)*count(v3{σ3})/count(v3)A1[Edge-ValueIndependenceAcrossNon-SatbleEdges]f(u/v|v{σ})≈f(u/v)A2[Value-IndependenceOutsideCorrelationScope]XSkecth方法(處理twig)ProblemwiththeformermodelLetusseeasimpleexamplefort0inA,t1int0/B,t2int0/Cra1a2b1c1b2c2ra1a2b1c1b2c2100×10×10010100×10010×10RA(2)B(110)C(110)B/FB/FB/F(a)(b)(c)StructuralXSketch2000bindingtuplesonthefirstdocument10100tuplesontheseconddocumentXSkecth方法(處理twig)Singe-pathXSketchmodeldosenotstoredetailedenoughinformationtocapturethecorrelationpatternsbetweenmultiplepathexpressions.IfnodeArecordsatwo-dimensionaldistributionfA(b,c)forthefractionofelementsinAthathavebchildreninBandcchildreninC.CBCCElementsfp10010a10.510100a20.52×(0.5×100×10+0.5×10×100)CBCCElementsfp100100a10.51010a20.52×(0.5×100×100+0.5×10×10)XSkecth方法(處理twig)CKCYElementsfp21p40.25P54×(0.25×2×1×10+0.25×1×1+0.5×1)=5CPCN21210.2521110.511p8,p9AnotherExampleEdge-distribution:fP(CY,CK,CP,CN)PYPKAPANStatix方法
FreireJ,JayantR,RamanathM,RoyPandSimeonJ.StatiX:MakingXMLCount.In:Proc.ofACMSIGMOD2002,USA.Statix方法Constructioneveryinstancenodeisassignedauniqueid(typeid+sequentialnodeid)duringparsingandvalidation,meanwhilegatheringstatisticsConstructingthehistogramsHistogramBuildhistogramforeachtypeMightcontainhistogramdescribingthedistributionoftheparentsofatypeValuehistogramcanbebuiltfortypesthataredefinedintermsofbasetypes(eg.integer)EstimationConvertthequeryintoSQL(i.e.,relationaljoinonIDs)UsestandardhistogrammultiplicationStatiX::ExampleFOR$sindocument(“imdb.xml”)/show,
$rin$s/reviewWHERE$s/year<1992RETURN$s/title,$rSELECTtitle,reviewFROMShow,ReviewWHEREShow.year<1992AND
Show.ID=Review.ParIDSTATShow{CARD:5ID:1—6}STATShow_Year{VALUE:1990—2001BUCKET_NUM:2BUCKETS:{[1990,1994):3;[1994,2001):2}}STATReview{CARD:8ID:30—38
PARENTHISTOGRAMShow{BUCKET_NUM:3BUCKETS:{[1,2):4;[2,3):1;[3,5):3}}}
STATAka{CARD:6ID:6-12PARENTHISTOGRAMShow_Episode{BUCKET_NUM:3BUCKETS:{[1,2):1;[2,6):0;[6,12):7}}3×1/2×8/5=2.4SummaryQuery//*BranchValueCorrelationSchemaStatiXSimpleTwig+ValueNYYYNYPathTreeSimplePathNYNNNNMarkovTableSimplePathNYNNNNXSketchSimpleTwigNNYNNNXSketchesSimpleTwig+ValueNNYYY(mDmethods)N結(jié)構(gòu)連接代價(jià)估計(jì)問題描述給定任意的祖先節(jié)點(diǎn)集合A和后代節(jié)點(diǎn)集合D,計(jì)算A//D結(jié)果集合的大小,估計(jì)匹配的祖先-后代的結(jié)果個(gè)數(shù)當(dāng)同一祖先有多個(gè)后代,或者同一后代有多個(gè)祖先時(shí),節(jié)點(diǎn)對(duì)個(gè)數(shù)可能遠(yuǎn)遠(yuǎn)大于祖先或后代的個(gè)數(shù)。應(yīng)用連接順序的選擇結(jié)構(gòu)連接代價(jià)估計(jì)Existedwork2Dmodel:pHhistogram[WuY,PatelJandJagadishH.V.EstimatingAnswerSizesforXMLQueries.In:Proc.oftheEDBT2002.]IntervalandPositionalmodel[W.Wang,H.Jiang,H.Lu,andJ.F.Xu.ContainmentjoinSizeEstimation:ModelsandMethods.In:Proc.OfACMSIGMOD2003]AbsoluteRegionCode(start,end)<contact><name>blah</name><phone><office>1234</office><home>5678</home><mobile>0000</mobile></phone></contact>contactnamephoneblahofficehomemobile123456780000(1,16)(2,4)(3,3)(5,15)(6,8)(7,7)(9,11)(10,10)(12,14)(13,13)234a.start<d.startandd.end<a.endpHHistogramForbiddenRegionsForanodeR1和R2是結(jié)點(diǎn)A的ForbiddenRegion兩個(gè)結(jié)點(diǎn)的(start,end)滿足:(1)nooverlap(2)nested不可能有部分重疊的(partiallyoverlap)情況pHHistogramEstP[A]=HistP1[A]×{1/4×HistP2[A]+HistP2[B]+HistP2[C]+HistP2[E]+1/2(HistP2[D]+HistP2[F])}Ancestor-basedJoinEstimationEstP[A]=HistP2[A]×{1/4×HistP1[A]+HistP1[F]+HistP2[G]+HistP1[H]}Descendant-basedJoinEstimationpHHistogram355137102120300000000est=3*(12+1+2+0.25*5)=48.75AGIHstartendstartstartendendForoff-diagonalcells:pHforApHforDInterval&PositionModelsMapintonewproblemsunder2newlyproposedmodels.IntervalmodelandPositionmodelHistogrambasedmethodthatisadaptivetothedata:PLhistogramAssumption:1DuniformofthedescendantsetEstimationisrobustwhencorrelationislowSamplingbasedmethodthatcapturesthecorrelation:IMSamplingandPMSamplingAssumption:HeightoftheXMLdatatreeisasmallco
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。
最新文檔
- 許昌學(xué)院《食品包裝工藝學(xué)》2023-2024學(xué)年第二學(xué)期期末試卷
- 重慶資源與環(huán)境保護(hù)職業(yè)學(xué)院《企業(yè)價(jià)值評(píng)估》2023-2024學(xué)年第二學(xué)期期末試卷
- 廣東碧桂園職業(yè)學(xué)院《對(duì)比語言學(xué)》2023-2024學(xué)年第二學(xué)期期末試卷
- 天津理工大學(xué)《商務(wù)禮儀實(shí)訓(xùn)》2023-2024學(xué)年第二學(xué)期期末試卷
- 天津醫(yī)科大學(xué)臨床醫(yī)學(xué)院《無機(jī)非金屬材料生產(chǎn)設(shè)備》2023-2024學(xué)年第二學(xué)期期末試卷
- 湖南網(wǎng)絡(luò)工程職業(yè)學(xué)院《建筑工程計(jì)量學(xué)》2023-2024學(xué)年第二學(xué)期期末試卷
- 上海農(nóng)林職業(yè)技術(shù)學(xué)院《商務(wù)溝通方法與技能》2023-2024學(xué)年第二學(xué)期期末試卷
- 濱州學(xué)院《投資理財(cái)》2023-2024學(xué)年第二學(xué)期期末試卷
- 懷化師范高等??茖W(xué)?!吨袑W(xué)生物教育技術(shù)》2023-2024學(xué)年第二學(xué)期期末試卷
- 建設(shè)終止合同范本
- 《血透患教》課件
- app 購(gòu)買合同范例
- 高二上學(xué)期物理(理科)期末試題(含答案)
- 2024年房地產(chǎn)經(jīng)紀(jì)人《房地產(chǎn)經(jīng)紀(jì)專業(yè)基礎(chǔ)》考前沖刺必會(huì)試題庫(kù)300題(含詳解)
- 礦山生態(tài)修復(fù)工程不穩(wěn)定斜坡治理工程設(shè)計(jì)
- 躲避球運(yùn)動(dòng)用球項(xiàng)目評(píng)價(jià)分析報(bào)告
- 風(fēng)機(jī)盤管更換施工方案
- 河道整治與生態(tài)修復(fù)工程監(jiān)理規(guī)劃
- 建設(shè)工程招標(biāo)代理合同(GF-2005-0215)(標(biāo)準(zhǔn)版)
- 剪映專業(yè)版教學(xué)課件
- 公司新建電源及大用戶并網(wǎng)管理辦法
評(píng)論
0/150
提交評(píng)論