Multivariate Statistical Analysis: Course Slides

Preface to the 1st Edition

Most of the observable phenomena in the empirical sciences are of a multivariate nature. In financial studies, assets in stock markets are observed simultaneously and their joint development is analyzed to better understand general tendencies and to track indices. The underlying theoretical structure of these and many other quantitative studies of applied sciences is multivariate. This book on Applied Multivariate Statistical Analysis presents the tools and concepts of multivariate data analysis with a strong focus on applications.

The aim of the book is to present multivariate data analysis in a way that is understandable for non-mathematicians and practitioners who are confronted by statistical data analysis. This is achieved by focusing on the practical relevance and through the e-book character of this text. All practical examples may be recalculated and modified by the reader using a standard web browser and without reference to or application of any specific software.

The book is divided into three main parts. The first part is devoted to graphical techniques describing the distributions of the variables involved. The second part deals with multivariate random variables and presents, from a theoretical point of view, distributions, estimators and tests for various practical situations. The last part is on multivariate techniques and introduces the reader to the wide selection of tools available for multivariate data analysis.

All data sets are given in the appendix and can be downloaded. The text contains a wide variety of exercises, the solutions of which are given in a separate textbook. In addition, a full set of transparencies is provided, making it easier for an instructor to present the materials in this book. All transparencies contain hyperlinks to the statistical web service so that students and instructors alike may recompute all examples via a standard web browser.

Weeks 1-2, Unit I: Descriptive Techniques

1 Comparison of Batches
1.1 Boxplots
1.2 Histograms
1.3 Scatterplots
1.4 Data Set: Boston Housing

1 Comparison of Batches

Multivariate statistical analysis is concerned with analyzing and understanding data in high dimensions. We suppose that we are given a set $\{x_i\}_{i=1}^{n}$ of $n$ observations of a variable vector $X$ in $\mathbb{R}^p$. That is, we suppose that each observation $x_i$ has $p$ dimensions, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, and that it is an observed value of a variable vector $X \in \mathbb{R}^p$. Therefore, $X$ is composed of $p$ random variables, $X = (X_1, X_2, \ldots, X_p)$, where $X_j$, for $j = 1, \ldots, p$, is a one-dimensional random variable.
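As a minimal sketch of this notation, assuming NumPy and a small made-up data set (the numbers are illustrative and not taken from the course data), the n observations can be stored as the rows of an n × p array, so that column j holds the observed values of the component X_j:

```python
import numpy as np

# Hypothetical data: n = 4 observations of a p = 3 dimensional vector X.
# Row i is the observation x_i = (x_i1, ..., x_ip); column j collects X_j.
X = np.array([
    [1.0, 10.0, 0.5],
    [2.0, 30.0, 0.4],
    [3.0, 20.0, 0.6],
    [4.0, 40.0, 0.5],
])

n, p = X.shape                           # number of observations, dimension
column_means = X.mean(axis=0)            # one mean per component X_j
column_spreads = X.std(axis=0, ddof=1)   # one sample standard deviation per X_j

print("n =", n, "p =", p)
print("means:  ", column_means)
print("spreads:", column_spreads)
```

Comparing the entries of `column_spreads` is one simple numerical way to see which components of X are more spread out than others.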

How do we begin to analyze this kind of data? Before we investigate questions on what inferences we can reach from the data, we should think about how to look at the data. This involves descriptive techniques. Questions that we could answer by descriptive techniques are:

Are there components of X that are more spread out than others?
Are there some elements of X that indicate subgroups of the data?
Are there outliers in the components of X?
How "normal" is the distribution of the data?

1.1 Boxplots

[Figures: boxplots of the variables X6 and X1; panel label "Genuine".]

Summary

The median and mean bars are measures of location.
The relative location of the median (and the mean) in the box is a measure of skewness.
The length of the box and whiskers is a measure of spread.
The length of the whiskers indicates the tail length of the distribution.
The outlying points are indicated with a "★" or "●" depending on whether they are outside of $F_{UL} \pm 1.5\,d_F$ or $F_{UL} \pm 3\,d_F$, respectively. Here $F_U$ and $F_L$ denote the upper and lower fourths (quartiles) and $d_F = F_U - F_L$ is the F-spread.
The boxplots do not indicate multimodality or clusters.
If we compare the relative size and location of the boxes, we are comparing distributions.

1.2 Histograms

[Figure: histogram of the diagonal, h = 0.4.]

Histograms are density estimates. A density estimate gives a good impression of the distribution of the data. In contrast to boxplots, density estimates show possible multimodality of the data. The idea is to locally represent the data density by counting the number of observations in a sequence of consecutive intervals (bins) with origin $x_0$. Let $B_j(x_0, h)$ denote the bin of length $h$ which is the element of a bin grid starting at $x_0$:

$$B_j(x_0, h) = [x_0 + (j-1)h,\; x_0 + jh), \qquad j \in \mathbb{Z},$$

where $[\cdot,\cdot)$ denotes a left-closed and right-open interval.

If $\{x_i\}_{i=1}^{n}$ is an i.i.d. sample with density $f$, the histogram is defined as follows:

$$\hat{f}_h(x) = n^{-1} h^{-1} \sum_{j \in \mathbb{Z}} \sum_{i=1}^{n} I\{x_i \in B_j(x_0, h)\}\, I\{x \in B_j(x_0, h)\}. \tag{1.7}$$

In sum (1.7) the first indicator function $I\{x_i \in B_j(x_0, h)\}$ counts the number of observations falling into bin $B_j(x_0, h)$. The second indicator function $I\{x \in B_j(x_0, h)\}$ is responsible for "localizing" the counts around $x$. The parameter $h$ is a smoothing or localizing parameter and controls the width of the histogram bins. An $h$ that is too large leads to very big blocks and thus to a very unstructured histogram. On the other hand, an $h$ that is too small gives a very variable estimate with many unimportant peaks.
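The histogram estimate can be sketched in a few lines of Python. The function name `histogram_estimate`, the sample values and the evaluation grid below are assumptions made for illustration, not part of the course material. The sketch finds the bin containing each evaluation point, counts the observations in that bin, and divides by n·h. Points lying exactly on a bin edge may be shifted to a neighbouring bin by floating-point rounding, so the sample is chosen to avoid edges.

```python
import numpy as np


def histogram_estimate(x, data, x0, h):
    """Evaluate the histogram density estimate at the points x.

    Bin j is B_j(x0, h) = [x0 + (j - 1) h, x0 + j h). The estimate at x is
    (number of observations in the bin containing x) / (n * h).
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    n = data.size
    # x lies in B_j  <=>  j = floor((x - x0) / h) + 1
    bin_of_x = np.floor((x - x0) / h).astype(int) + 1
    bin_of_data = np.floor((data - x0) / h).astype(int) + 1
    # For each evaluation point, count the observations sharing its bin.
    counts = np.array([(bin_of_data == j).sum() for j in bin_of_x])
    return counts / (n * h)


# Hypothetical sample, chosen so that no point sits on a bin edge.
sample = [0.2, 0.6, 0.7, 1.6]
grid = [0.1, 0.75, 1.2, 1.9]
print(histogram_estimate(grid, sample, x0=0.0, h=0.5))
```

Calling the function again with a different `h` or `x0` on the same sample shows directly how both parameters change the estimate.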

[Figure 1.6: histograms of the diagonal of the counterfeit bank notes; panels for h = 0.1, h = 0.2, h = 0.3 and h = 0.4.]

The effect of $h$ is given in detail in Figure 1.6. It contains the histogram (upper left) for the diagonal of the counterfeit bank notes for $x_0 = 137.8$ (the minimum of these observations) and $h = 0.1$. Increasing $h$ to $h = 0.2$ and using the same origin, $x_0 = 137.8$, results in the histogram shown in the lower left of the figure. This density histogram is somewhat smoother due to the larger $h$. The bin width is next set to $h = 0.3$ (upper right). From this histogram, one has the impression that the distribution of the diagonal is bimodal with peaks at about 138.5 and 139.9. The detection of modes requires a fine tuning of the bin width. Using methods from smoothing methodology one can find an "optimal" bin width $h$ for $n$ observations:

$$h_{opt} = \left(\frac{24\sqrt{\pi}}{n}\right)^{1/3}.$$

In Figure 1.7, we show histograms with $x_0 = 137.65$ (upper left), $x_0 = 137.75$ (lower left), $x_0 = 137.85$ (upper right), and $x_0 = 137.95$ (lower right). All the graphs have been scaled equally on the y-axis to allow comparison. One sees that, despite the fixed bin width $h$, the interpretation is not facilitated. The shift of the origin $x_0$ (to 4 different locations) created 4 different histograms. This property of histograms strongly contradicts the goal of presenting data features.

Summary

Modes of the density are detected with a histogram.
Modes correspond to strong peaks in the histogram.
Histograms with the same $h$ need not be identical. They also depend on the origin $x_0$ of the grid.
The influence of the origin $x_0$ is drastic. Changing $x_0$ creates different looking histograms.
The consequence of an $h$ that is too large is an unstructured histogram that is too flat.
A bin width $h$ that is too small results in an unstable histogram.
There is an "optimal" $h = (24\sqrt{\pi}/n)^{1/3}$.
It is recommended to use averaged histograms. They are kernel densities.

1.4 Scatterplots

Scatterplots are bivariate or trivariate plots of variables against each other. They help us understand relationships among the variables of a data set. A downward-sloping scatter indicates that as we increase the variable on the horizontal axis, the variable on the vertical axis decreases. An analogous statement can be made for upward-sloping scatters.
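The direction of a scatter can also be checked numerically. The sketch below assumes NumPy and made-up numbers, and the helper name `scatter_direction` is hypothetical. It fits a least-squares line and reports the sign of its slope together with the sample correlation; a negative slope corresponds to a downward-sloping scatter. This numerical check is an added illustration and not a method taken from the slides.

```python
import numpy as np


def scatter_direction(x, y):
    """Classify a scatter as downward, upward or flat via a least-squares line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit with deg=1 returns the coefficients [slope, intercept].
    slope, _intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]      # sample correlation, same sign as slope
    if slope < 0:
        direction = "downward"
    elif slope > 0:
        direction = "upward"
    else:
        direction = "flat"
    return direction, slope, r


# Hypothetical points: y tends to fall as x grows.
direction, slope, r = scatter_direction([1, 2, 3, 4], [8, 6, 5, 1])
print(direction, slope, r)
```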

Figure 1.12 plots the 5th column (upper inner frame) of the bank data against the 6th column (diagonal). The scatter is downward-sloping. As we already know from the previous section on marginal comparison, a good separation between genuine and counterfeit bank notes is visible for the diagonal variable. The sub-cloud in the upper half (circles) of Figure 1.12 corresponds to the true bank notes. As noted before, this separation is not distinct, since the two groups overlap somewhat.

[Figure: draftman plot (scatterplot matrix).]

Summary

Scatterplots in two and three dimensions help in identifying separated points, outliers or sub-clusters.
Scatterplots help us in judging positive or negative dependencies.
Draftman scatterplot matrices help detect structures conditioned on values of other variables.
As the brush of a scatterplot matrix moves through a point cloud, we can study conditional dependence.

1.8 Data Set: Boston Housing

First Step: New Words
Category 1 high-frequency words (160 in total)

1. absolute deviation (絕對離差)
2. absolute residuals (絕對殘差)
3. among groups (組間)
4. analysis of correlation (相關分析)
5. analysis of covariance (協方差分析)
6. analysis of regression (回歸分析)
7. Bayesian estimation (Bayes 估計)
8. bivariate (雙變量的)
9. bivariate Correlate (二變量相關)
10. boxplot (箱線圖)
11. canonical correlation (典型相關)
12. categorical variable (分類變量)
13. central tendency (集中趨勢)
14. chance statistics (隨機統計量)
15. chance variable (隨機變量)
16. classified variable (分類變量)
17. coefficient of skewness (偏度系數)
18. confidence limit (置信限)
19. cumulative probability (累計概率)
20. curvature (曲率)

21. data capacity (數據容量)
22. data handling (數據處理)
23. data reduction (數據縮減分析)
24. data transformation (數據變換)
25. density function (密度函數)
26. description (描述)
27. descriptive (描述性的)
28. deviation from average (離均差)
29. Df.Fit (擬合差值)
30. df. (degree of freedom) (自由度)
31. distribution shape (分布形狀)
32. double logarithmic (雙對數)
33. eigenvector (特征向量)
34. error of estimate (估計誤差)
35. estimation (估計量)
36. Euclidean distance (歐式距離)
37. expected value (期望值)
38. experimental sampling (實驗抽樣)
39. explanatory variable (說明變量)
40. explore Summarize (探索-摘要)
41. extreme value (極值)
42. factor score (因子得分)
43. factorial designs (因子設計)
44. factorial experiment (因子實驗)
45. finite population (有限總體)
46. finite-sample (有限樣本)
47. F-test (F 檢驗)
48. function (函數)
49. function relationship (函數關系)
50. gamma distribution (伽馬分布)
51. geometric mean (幾何均值)
52. goodness-of-fit (擬合優度)
53. group averages (分組平均)
54. grouped data (分組資料)
55. grouped median (組中值)
56. hypothesis (假設)
57. hypothesis test (假設檢驗)
58. hypothetical universe (假設總體)
59. impossible event (不可能事件)
60. independent samples (獨立樣本)
61. independent variable (自變量)
62. infinitely great (無窮大)
63. interclass correlation (組內相關)
64. inter-item correlation (樣本內相關)
65. item means (樣本均值)

