![XML代價(jià)估計(jì)方法研究綜述課件_第1頁(yè)](http://file4.renrendoc.com/view/0f1f774110a5ce9718c8fb0d6afe3637/0f1f774110a5ce9718c8fb0d6afe36371.gif)
![XML代價(jià)估計(jì)方法研究綜述課件_第2頁(yè)](http://file4.renrendoc.com/view/0f1f774110a5ce9718c8fb0d6afe3637/0f1f774110a5ce9718c8fb0d6afe36372.gif)
![XML代價(jià)估計(jì)方法研究綜述課件_第3頁(yè)](http://file4.renrendoc.com/view/0f1f774110a5ce9718c8fb0d6afe3637/0f1f774110a5ce9718c8fb0d6afe36373.gif)
![XML代價(jià)估計(jì)方法研究綜述課件_第4頁(yè)](http://file4.renrendoc.com/view/0f1f774110a5ce9718c8fb0d6afe3637/0f1f774110a5ce9718c8fb0d6afe36374.gif)
![XML代價(jià)估計(jì)方法研究綜述課件_第5頁(yè)](http://file4.renrendoc.com/view/0f1f774110a5ce9718c8fb0d6afe3637/0f1f774110a5ce9718c8fb0d6afe36375.gif)
版權(quán)說(shuō)明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡(jiǎn)介
XML代價(jià)估計(jì)方法研究綜述
XMLGroup
XML代價(jià)估計(jì)方法研究綜述
1Outline研究代價(jià)估計(jì)的必要性xml中代價(jià)估計(jì)研究的難點(diǎn)背景知識(shí)介紹現(xiàn)有方法分類Outline研究代價(jià)估計(jì)的必要性2研究代價(jià)估計(jì)的必要性查詢優(yōu)化的基礎(chǔ)不同分支對(duì)查詢目標(biāo)的選擇率不同,如果選擇性低的分支先于選擇性高的分支執(zhí)行,就會(huì)有效的減少中間結(jié)果從而提供查詢效率結(jié)構(gòu)連接在XML中是一個(gè)很重要的操作,連接的順序的選擇需要計(jì)算不同連接順序的代價(jià)及早給用戶提供反饋信息研究代價(jià)估計(jì)的必要性查詢優(yōu)化的基礎(chǔ)3xml中代價(jià)估計(jì)研究的難點(diǎn)XML數(shù)據(jù)的不規(guī)則性是對(duì)傳統(tǒng)統(tǒng)計(jì)信息方法的重要挑戰(zhàn),具體表現(xiàn)在以下幾個(gè)方面:路徑依賴性,如同為name結(jié)點(diǎn),在person下和在city下出現(xiàn)意義就完全不同結(jié)構(gòu)相關(guān)性嵌套在不同祖先下面的同類結(jié)點(diǎn)的個(gè)數(shù)差別可能很大,如book結(jié)點(diǎn)下的author個(gè)數(shù)是不確定的值相關(guān)性//purchase/Item[type=‘book’][price>100]XML的有序性制約了轉(zhuǎn)換規(guī)則的靈活性所有這些問(wèn)題,都使得在xml中采用傳統(tǒng)的代價(jià)估計(jì)方法不切實(shí)際,會(huì)帶來(lái)很大的誤差。針對(duì)xml數(shù)據(jù)的特點(diǎn),我們應(yīng)該尋求一種新的代價(jià)估計(jì)方法。xml中代價(jià)估計(jì)研究的難點(diǎn)XML數(shù)據(jù)的不規(guī)則性是對(duì)傳統(tǒng)統(tǒng)計(jì)信4BackgroundKnowledgeXMLDataModelAlarge,node-labeledtreeT(V,E)d0a1a2a3p4p5n6t14k15y131999y16t17k18k19n7b9p8y20t21k22y24t25k26n10b12p9t23t26200220012000ExampleXMLDocumentTreeBackgroundKnowledgeXMLData5BackgroundKnowledgeXMLDataModelAlarge,node-labeledtreeT(V,E)d0a1a2a3p4p5n6t14k15y131999y16t17k18k19n7b9p8y20t21k22y24t25k26n10b12p9t23t26200220012000ExampleXMLDocumentBackgroundKnowledgeXMLData6BackgroundKnowledgeXMLQueryModelAnode-labeledquerytreeTQEachnodeofTQislabeledwithavariablenameqiinQEachedge(qi,qj)ofTQisannotatedwithanXPathexpressionpath(qi,qj)thatdescribesthespecificstructuralconstraintsspecifiedinQQueryFragment:for$bindoc(“bib.xml”)/bib/book$ain$b/author$tin$b/titlebibbookauthortitleroot$b$a$tTwigQueryModelBackgroundKnowledgeXMLQuery7現(xiàn)有估計(jì)方法的分類路徑選擇性計(jì)算Twig匹配個(gè)數(shù)統(tǒng)計(jì)值謂詞選擇率估計(jì)結(jié)構(gòu)連接代價(jià)估計(jì)現(xiàn)有估計(jì)方法的分類路徑選擇性計(jì)算8Overviewdatatreedifferentsummarizationmethodsbasedon:pathallm-lengthpathF/B-stabilityLore[McHughetal,VLDB99]pruningPathTree,MarkovTable[Aboulnagaetal,VLDB01]XSketch[Polyzotisetal,VLDB02]typeXSketches[Polyzotisetal,SIGMOD02]+value+twigXSketches[Polyzotisetal,SIGMOD04]RegionCode2DModel1DModelStatiX[Freireetal,SIGMOD02]PHHistogram[WuY]EDBT2002Interva&PositionModel[H.Lu]SIGMOD2003Overviewdatatreedifferentsum9路徑選擇性計(jì)算問(wèn)題描述/a/b[c/d//e][g//e/f]//*/*/e/f路徑選擇性計(jì)算問(wèn)題描述/a/b[c/d//e][g//10PathTree&MarkovTableAshrafAboulnaga,AlaaR.Alameldeen,andJeffryF.Naughton.EstimatingtheselectivityofXMLpathexpressionsforInternetscaleapplications.VLDB2001PathTree&MarkovTableAshra11PathTreeConstructionStartfromanoriginalpathtreeCountofnodesinabsolutepathsAggregatethetreetofitinthelimitedspaceSibling-*EstimationMatchthequeryagainstthepathtree,matching*asthelastresort.Needtodecidewhethertousetotalcountoraveragecount.PathTreeConstruction12AnXMLdocumentanditspathtree<A><B></B><B><D></D></B><C><D></D><E></E><E></E><E></E></C></A>A1B2C1D1D1E3Estimation:count(A/B/D)=1count(A/C/D)=1AnXMLdocumentanditspatht13AggregatePathTreeA1C9B13G10F15H6K12E5D7K11J4I2AggregatePathTreeA1C9B13G10F14AggregatePathTreeA1C9B13G10F15H6K12E5D7K11J4I2紅色結(jié)點(diǎn)表示那些不頻繁出現(xiàn)的結(jié)點(diǎn),需要?jiǎng)h除,從而保證path樹(shù)能夠放入內(nèi)存AggregatePathTreeA1C9B13G10F15Sibling-*SummarizationA1C9B13*F15*K*f=23n=2683把查詢和path樹(shù)匹配需要決定用總數(shù)還是平均數(shù)估計(jì)方法Estimation:count(//C/H/K)=11count(//C/*/K)=23Sibling-*SummarizationA1C9B1316MarkovTable構(gòu)造一個(gè)表,存儲(chǔ)長(zhǎng)度<=m的不同路徑出現(xiàn)的次數(shù),其中m>=2。當(dāng)查詢路徑長(zhǎng)度<=m時(shí),可以直接從表中讀出結(jié)果,否則,用公式計(jì)算。例如,m=3,查詢?yōu)?/A/B/C/D,則當(dāng)表不能裝入內(nèi)存的時(shí)候,進(jìn)行剪枝操作。f(B/C/D)f(B/C)f(A/B/C/D)≈f(A/B/C)MarkovTable構(gòu)造一個(gè)表,存儲(chǔ)長(zhǎng)度<=m的不同路徑17MarkovTable::Exampleabbcddeeefda1a/b2b2a/c1c1b/d2d3c/d1e3c/e3f1c/f1abdde12213cf11m=2b2c/e3d3*3/3e3c/*2/2a/b2*/*1/1b/d2suffix-*Q1:a/c/eMarkovTable::Exampleabbcdde18Twig匹配個(gè)數(shù)統(tǒng)計(jì)問(wèn)題描述TwigQuery(basicbuildingblockofdeclarativequerylanguagesforXML)FOR$fINdocument(“person.xml”)//department/facultyFOR$rin$f/RA,FOR$tin$f/TADepartmentFacultyRATA$f$r$tTwig匹配個(gè)數(shù)統(tǒng)計(jì)問(wèn)題描述FOR$fINdocum19XSketch方法
N.Polyzotis,M.Garofalakis.StatisticalSynopsesforGraph-StructuredXMLDatabases,InProc.ofACMSIGMODConf.2002N.PolyzotisandM.Garofalakis.Structureandvaluesynopsesforxmldatagraphs.InProceedingsof28thVLDBConf.,2002.
N.Polyzotis,M.Garofalakis,andYannisIosnnidis.SelectivityEstimationforXMLTwigs.InProc.ofthe20thIntl.Conf.onDataEngineering,2004N.PolyzotisandM.GarofalakisandY.Ioannidis.ApproximateXMLQueryAnswers.InProc.ACMSIGMOD2004XSketch方法N.Polyzotis,M.Garof20d0a1a2a3p4p5n6t14k15y131999y16t17k18k19n7b9p8y20t21k22y24t25k26n10b12p9t23t26200220012000D(1)A(3)N(3)P(4)B(2)Y(4)K(6)T(6)H(Y)B/FB/FB/FB/FBB/FFFXSketch方法d0a1a2a3p4p5n6t14k15y131999y1621B/Fstability
Definition:LetU,VbesetsofelementsinanXMLdataTreeT.WesaythatVisbackward-stable(B-stale)withrespecttoU,iffforeachv∈Vthereexistsau∈Usuchthatedge(u,v)isinT.Similarly,Uissaidtobeforward-stable(F-stable)withrespecttoV,iffforeachu∈Uthereexistsav∈Vsuchthattheedge(u,v)isinG.B/FstabilityDefinition:22D(1)A(3)N(3)P(4)B(2)Y(4)K(6)T(6)H(Y)B/FB/FB/FB/FBB/FFFXSketch方法HowtoestimatethecardinalityofxpathquerysuchasA/P/KorA/[p]?Utilizingedgestability,Wecangiveanaccuratematchnumber:Count(A/P/K)=6Count(A/[p])=3ButwhataboutA/P/Twhichdoesn’tconformtobackwardstability?2assumptions
Pathindependence
Backward-edgeuniformityCount(A/P/T)=f(A/P/T)*count(T)f(A/P/T)=f(A/P)*f(P/T|A/P)=f(P/T|A/P)=f(P/T)≈count(P)/(count(B)+count(P))D(1)A(3)N(3)P(4)B(2)Y(4)K(6)T(23XSketch方法(處理值謂詞)v1v3v5v2v4BBFFSTN(v5)Dep(v5)={v1,v4}v1v4Histogramatv5H(σ1,σ4)=count(v1{σ1}[v2]/v3[v4{σ4}]/v5)v3v5count(v1{σ1}[v2]/v3{σ3}[v4{σ4}]/v5)=H(σ1,σ4)*count(v3{σ3})/count(v3)A1[Edge-ValueIndependenceAcrossNon-SatbleEdges]f(u/v|v{σ})≈f(u/v)A2[Value-IndependenceOutsideCorrelationScope]XSketch方法(處理值謂詞)v1v3v5v2v4BBFF24XSkecth方法(處理twig)ProblemwiththeformermodelLetusseeasimpleexamplefort0inA,t1int0/B,t2int0/Cra1a2b1c1b2c2ra1a2b1c1b2c2100×10×10010100×10010×10RA(2)B(110)C(110)B/FB/FB/F(a)(b)(c)StructuralXSketch2000bindingtuplesonthefirstdocument10100tuplesontheseconddocumentXSkecth方法(處理twig)Problemwith25XSkecth方法(處理twig)Singe-pathXSketchmodeldosenotstoredetailedenoughinformationtocapturethecorrelationpatternsbetweenmultiplepathexpressions.IfnodeArecordsatwo-dimensionaldistributionfA(b,c)forthefractionofelementsinAthathavebchildreninBandcchildreninC.CBCCElementsfp10010a10.510100a20.52×(0.5×100×10+0.5×10×100)CBCCElementsfp100100a10.51010a20.52×(0.5×100×100+0.5×10×10)XSkecth方法(處理twig)Singe-pathXS26XSkecth方法(處理twig)CKCYElementsfp21p40.25P54×(0.25×2×1×10+0.25×1×1+0.5×1)=5CPCN21210.2521110.511p8,p9AnotherExampleEdge-distribution:fP(CY,CK,CP,CN)PYPKAPANXSkecth方法(處理twig)CKCYElementsf27Statix方法
FreireJ,JayantR,RamanathM,RoyPandSimeonJ.StatiX:MakingXMLCount.In:Proc.ofACMSIGMOD2002,USA.Statix方法FreireJ,JayantR,R28Statix方法Constructioneveryinstancenodeisassignedauniqueid(typeid+sequentialnodeid)duringparsingandvalidation,meanwhilegatheringstatisticsConstructingthehistogramsHistogramBuildhistogramforeachtypeMightcontainhistogramdescribingthedistributionoftheparentsofatypeValuehistogramcanbebuiltfortypesthataredefinedintermsofbasetypes(eg.integer)EstimationConvertthequeryintoSQL(i.e.,relationaljoinonIDs)UsestandardhistogrammultiplicationStatix方法Construction29StatiX::ExampleFOR$sindocument(“imdb.xml”)/show,
$rin$s/reviewWHERE$s/year<1992RETURN$s/title,$rSELECTtitle,reviewFROMShow,ReviewWHEREShow.year<1992AND
Show.ID=Review.ParIDSTATShow{CARD:5ID:1—6}STATShow_Year{VALUE:1990—2001BUCKET_NUM:2BUCKETS:{[1990,1994):3;[1994,2001):2}}STATReview{CARD:8ID:30—38
PARENTHISTOGRAMShow{BUCKET_NUM:3BUCKETS:{[1,2):4;[2,3):1;[3,5):3}}}
STATAka{CARD:6ID:6-12PARENTHISTOGRAMShow_Episode{BUCKET_NUM:3BUCKETS:{[1,2):1;[2,6):0;[6,12):7}}3×1/2×8/5=2.4StatiX::ExampleFOR$sindoc30SummaryQuery//*BranchValueCorrelationSchemaStatiXSimpleTwig+ValueNYYYNYPathTreeSimplePathNYNNNNMarkovTableSimplePathNYNNNNXSketchSimpleTwigNNYNNNXSketchesSimpleTwig+ValueNNYYY(mDmethods)NSummaryQuery//*BranchValueCorr31結(jié)構(gòu)連接代價(jià)估計(jì)問(wèn)題描述給定任意的祖先節(jié)點(diǎn)集合A和后代節(jié)點(diǎn)集合D,計(jì)算A//D結(jié)果集合的大小,估計(jì)匹配的祖先-后代的結(jié)果個(gè)數(shù)當(dāng)同一祖先有多個(gè)后代,或者同一后代有多個(gè)祖先時(shí),節(jié)點(diǎn)對(duì)個(gè)數(shù)可能遠(yuǎn)遠(yuǎn)大于祖先或后代的個(gè)數(shù)。應(yīng)用連接順序的選擇結(jié)構(gòu)連接代價(jià)估計(jì)問(wèn)題描述32結(jié)構(gòu)連接代價(jià)估計(jì)Existedwork2Dmodel:pHhistogram[WuY,PatelJandJagadishH.V.EstimatingAnswerSizesforXMLQueries.In:Proc.oftheEDBT2002.]IntervalandPositionalmodel[W.Wang,H.Jiang,H.Lu,andJ.F.Xu.ContainmentjoinSizeEstimation:ModelsandMethods.In:Proc.OfACMSIGMOD2003]結(jié)構(gòu)連接代價(jià)估計(jì)Existedwork33AbsoluteRegionCode(start,end)<contact><name>blah</name><phone><office>1234</office><home>5678</home><mobile>0000</mobile></phone></contact>contactnamephoneblahofficehomemobile123456780000(1,16)(2,4)(3,3)(5,15)(6,8)(7,7)(9,11)(10,10)(12,14)(13,13)234a.start<d.startandd.end<a.endAbsoluteRegionCode(start,e34pHHistogramForbiddenRegionsForanodeR1和R2是結(jié)點(diǎn)A的ForbiddenRegion兩個(gè)結(jié)點(diǎn)的(start,end)滿足:(1)nooverlap(2)nested不可能有部分重疊的(partiallyoverlap)情況pHHistogramForbiddenRegions35pHHistogramEstP[A]=HistP1[A]×{1/4×HistP2[A]+HistP2[B]+HistP2[C]+HistP2[E]+1/2(HistP2[D]+HistP2[F])}Ancestor-basedJoinEstimationEstP[A]=HistP2[A]×{1/4×HistP1[A]+HistP1[F]+HistP2[G]+HistP1[H]}Descendant-basedJoinEstimationpHHistogramEstP[A]=HistP1[A]×36pHHistogram355137102120300000000est=3*(12+1+2+0.25*5)=48.75AGIHstartendstartstartendendForoff-diagonalcells:pHforApHforDpHHistogram35513710212030000037Interval&PositionModelsMapintonewproblemsunder2newlyproposedmodels.IntervalmodelandPositionmodelHistogrambasedmethodthatisadaptivetothedata:PLhistogramAssumption:1DuniformofthedescendantsetEstimationisrobustwhencorrelationislowSamplingbasedmethodthatcapturesthecorrelation:IMSamplingandPMSamplingAssumption:HeightoftheXMLdatatreeisasmallconstantBothhavetheoreticalboundsontheaccuracyoftheestimationInterval&PositionModelsMap38Inte
溫馨提示
- 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。
最新文檔
- 中外合資生產(chǎn)合同范本(新能源)
- 互聯(lián)網(wǎng)接入服務(wù)合同:聯(lián)通電信合作
- 中海地產(chǎn)銷售合同范本
- 人事代理委托合同書及細(xì)則
- 合伙人合同協(xié)議書范本
- 個(gè)人獨(dú)資企業(yè)股權(quán)轉(zhuǎn)讓合同范本
- 交通事故賠償協(xié)商合同協(xié)議
- 個(gè)人土地轉(zhuǎn)讓合同書模板
- 專利維權(quán)訴訟代理合同書格式
- 個(gè)人與銀行質(zhì)押合同樣本
- 2024年江蘇省高考政治試卷(含答案逐題解析)
- 執(zhí)業(yè)醫(yī)師資格考試《臨床執(zhí)業(yè)醫(yī)師》 考前 押題試卷(一)絕密1
- 2024七年級(jí)數(shù)學(xué)上冊(cè)第六章幾何圖形初步綜合與實(shí)踐設(shè)計(jì)學(xué)校田徑運(yùn)動(dòng)會(huì)比賽場(chǎng)地課件新版新人教版
- 全國(guó)網(wǎng)約車出租車駕駛員公共題模擬考試題及答案
- 新人教版一年級(jí)數(shù)學(xué)下冊(cè)全冊(cè)教案(表格式)
- 簡(jiǎn)易三方換地協(xié)議書范本
- 2025屆廣東省深圳羅湖區(qū)四校聯(lián)考九上數(shù)學(xué)期末綜合測(cè)試試題含解析
- 飛鼠養(yǎng)殖技術(shù)指導(dǎo)
- 2024年襄陽(yáng)漢江檢測(cè)有限公司招聘筆試參考題庫(kù)附帶答案詳解
- 醫(yī)院檢驗(yàn)科安全風(fēng)險(xiǎn)評(píng)估報(bào)告表單
- 高一北師大版歷史必修一知識(shí)點(diǎn)總結(jié)9篇
評(píng)論
0/150
提交評(píng)論