R Linear Regression Case Study: Data Analysis and Visualization Report (with Code and Data)
This report uses data on the 30 Major League Baseball teams to examine which variable, if any, best helps us predict the runs a team scores in a season.

Data

    load("more/mlb11.RData")

Plot this relationship using at_bats as the predictor. Does the relationship look linear? If you knew a team's at_bats, would you be comfortable using a linear model to predict its number of runs?

Scatterplot

    plot(mlb11$at_bats, mlb11$runs, xlab = "at_bat", ylab = "runs",
         main = "at_bat", frame.plot = TRUE, col = "red")
    abline(lm(mlb11$runs ~ mlb11$at_bats))

If the relationship looks linear, we can quantify its strength with the correlation coefficient.

    cor(mlb11$runs, mlb11$at_bats)
    ## [1] 0.610627

Sum of squared residuals

Think back to the way we described the distribution of a single variable. It is also useful to be able to describe the relationship between two numerical variables, such as runs and at_bats above, noting its strength and any unusual observations.

    plot_ss(x = mlb11$at_bats, y = mlb11$runs)
    ## Click two points to make a line.
    ## Call:
    ## lm(formula = y ~ x, data = pts)
    ##
    ## Coefficients:
    ## (Intercept)            x
    ##  -2789.2429       0.6305
    ##
    ## Sum of Squares:  123721.9

After running this command, you'll be prompted to click two points on the plot to define a line. Once you've done that, the line you specified will be shown in black and the residuals in blue. Note that there are 30 residuals, one for each of the 30 observations. Recall that the residuals are the difference between the observed values and the values predicted by the line:

$e_i = y_i - \hat{y}_i$

The most common way to do linear regression is to select the line that minimizes the sum of squared residuals. To visualize the squared residuals, you can rerun the plot command and add the argument showSquares = TRUE.

    plot_ss(x = mlb11$at_bats, y = mlb11$runs, showSquares = TRUE)
    ## Click two points to make a line.
    ## Call:
    ## lm(formula = y ~ x, data = pts)
    ##
    ## Coefficients:
    ## (Intercept)            x
    ##  -2789.2429       0.6305
    ##
    ## Sum of Squares:  123721.9

Note that the output from the plot_ss function provides you with the slope and intercept of your line as well as the sum of squares.

3. Using plot_ss, choose a line that does a good job of minimizing the sum of squares. Run the function several times. What was the smallest sum of squares that you got? How does it compare to your neighbors?

Answer: The smallest sum of squares obtained was 123721.9. It measures the total squared deviation of the observed runs from the chosen line, not from the mean.

The linear model

It is rather cumbersome to try to get the correct least squares line, i.e. the line that minimizes the sum of squared residuals, through trial and error. Instead we can use the lm function in R to fit the linear model (a.k.a. regression line).

    m1 <- lm(runs ~ at_bats, data = mlb11)

The first argument in the function lm is a formula that takes the form y ~ x. Here it can be read that we want to make a linear model of runs as a function of at_bats. The second argument specifies that R should look in the mlb11 data frame to find the runs and at_bats variables.

The output of lm is an object that contains all of the information we need about the linear model that was just fit. We can access this information using the summary function.

    summary(m1)
    ##
    ## Call:
    ## lm(formula = runs ~ at_bats, data = mlb11)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -125.58  -47.05  -16.59   54.40  176.87
    ##
    ## Coefficients:
    ##               Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) -2789.2429   853.6957  -3.267 0.002871 **
    ## at_bats         0.6305     0.1545   4.080 0.000339 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 66.47 on 28 degrees of freedom
    ## Multiple R-squared:  0.3729, Adjusted R-squared:  0.3505
    ## F-statistic: 16.65 on 1 and 28 DF,  p-value: 0.0003388

Let's consider this output piece by piece. First, the formula used to describe the model is shown at the top. After the formula you find the five-number summary of the residuals. The "Coefficients" table shown next is key; its first column displays the linear model's y-intercept and the coefficient of at_bats. With this table, we can write down the least squares regression line for the linear model:

$\hat{y} = -2789.2429 + 0.6305 \times \text{at\_bats}$

One last piece of information we will discuss from the summary output is the Multiple R-squared, or more simply, $R^2$. The $R^2$ value represents the proportion of variability in the response variable that is explained by the explanatory variable. For this model, 37.3% of the variability in runs is explained by at-bats.

4. Fit a new model that uses homeruns to predict runs. Using the estimates from the R output, write the equation of the regression line. What does the slope tell us in the context of the relationship between success of a team and its home runs?

Answer: The fitted line is $\hat{y} = 415.239 + 1.835 \times \text{homeruns}$. The relationship is positive: each additional home run is associated with about 1.835 more runs on average.

    home.runs <- lm(runs ~ homeruns, data = mlb11)
    home.runs
    ##
    ## Call:
    ## lm(formula = runs ~ homeruns, data = mlb11)
    ##
    ## Coefficients:
    ## (Intercept)     homeruns
    ##     415.239        1.835

Prediction and prediction errors

Let's create a scatterplot with the least squares line laid on top.

    plot(mlb11$runs ~ mlb11$at_bats)
    abline(m1)

The function abline plots a line based on its slope and intercept. Here, we used a shortcut by providing the model m1, which contains both parameter estimates.

This line can be used to predict y at any value of x. When predictions are made for values of x that are beyond the range of the observed data, it is referred to as extrapolation and is not usually recommended. However, predictions made within the range of the data are more reliable. They're also used to compute the residuals.

5. If a team manager saw the least squares regression line and not the actual data, how many runs would he or she predict for a team with 5,578 at-bats? Is this an overestimate or an underestimate, and by how much? In other words, what is the residual for this prediction?

    # Predicted runs from the fitted intercept and slope
    pred <- coef(m1)[[1]] + coef(m1)[[2]] * 5578
    pred
    ## [1] 728.1323

    # Slope contribution to the prediction (this is not a residual)
    slope_term <- 0.63058 * 5578
    slope_term
    ## [1] 3517.375

The model predicts about 728 runs. The value 3517.375 is only the slope term of that prediction; the residual is the observed number of runs minus pred, so it requires the actual runs of a team with that many at-bats.

Model diagnostics

To assess whether the linear model is reliable, we need to check for (1) linearity, (2) nearly normal residuals, and (3) constant variability.

Linearity: You already checked if the relationship between runs and at-bats is linear using a scatterplot. We should also verify this condition with a plot of the residuals vs. at-bats. Recall that any code following a # is intended to be a comment that helps understand the code but is ignored by R.

    plot(m1$residuals ~ mlb11$at_bats)
    abline(h = 0, lty = 3)  # adds a horizontal dashed line at y = 0

6. Is there any apparent pattern in the residuals plot? What does this indicate about the linearity of the relationship between runs and at-bats?

Answer: There is no apparent pattern. The residuals scatter around 0, which is consistent with a linear relationship between runs and at-bats.

Nearly normal residuals: To check this condition, we can look at a histogram

    hist(m1$residuals)

or a normal probability plot of the residuals.

    qqnorm(m1$residuals)
    qqline(m1$residuals)  # adds diagonal line to the normal prob plot

7. Based on the histogram and the normal probability plot, does the nearly normal residuals condition appear to be met?

Answer: Yes. The residuals are nearly normal.

Constant variability:

8. Based on the plot in (1), does the constant variability condition appear to be met?

Answer: Yes, the points are spread fairly evenly around the least squares line.

1. Choose another traditional variable from mlb11 that you think might be a good predictor of runs. Produce a scatterplot of the two variables and fit a linear model. At a glance, does there seem to be a linear relationship?

Answer: Yes, the scatterplot shows they have a linear relationship.

2. How does this relationship compare to the relationship between runs and at_bats? Use the $R^2$ values from the two model summaries to compare. Does your variable seem to predict runs better than at_bats? How can you tell?

    plot(mlb11$hits, mlb11$runs, xlab = "hits", ylab = "runs",
         main = "hits vs runs", frame.plot = TRUE, col = "red")
    abline(lm(mlb11$runs ~ mlb11$hits))

    m2 <- lm(runs ~ hits, data = mlb11)
    summary(m2)
    ##
    ## Call:
    ## lm(formula = runs ~ hits, data = mlb11)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -103.718  -27.179   -5.233   19.322  140.693
    ##
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) -375.5600   151.1806  -2.484   0.0192 *
    ## hits           0.7589     0.1071   7.085 1.04e-07 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 50.23 on 28 degrees of freedom
    ## Multiple R-squared:  0.6419, Adjusted R-squared:  0.6292
    ## F-statistic:  50.2 on 1 and 28 DF,  p-value: 1.043e-07

The $R^2$ for hits (0.6419) is higher than the $R^2$ for at_bats (0.3729), so hits explains more of the variability in runs.

3. Now that you can summarize the linear relationship between two variables, investigate the relationships between runs and each of the other five traditional variables. Which variable best predicts runs? Support your conclusion using the graphical and numerical methods we've discussed (for the sake of conciseness, only include output for the best variable, not all five).

Answer: new_obs is the best predictor of runs: its points lie on or very close to the fitted line, and its fit has the smallest residual standard error.

    par(mfrow = c(2, 3))
    plot(mlb11$hits, mlb11$runs, xlab = "hits", ylab = "runs",
         frame.plot = TRUE, col = "blue")
    plot(mlb11$bat_avg, mlb11$runs, xlab = "bat_avg", ylab = "runs",
         frame.plot = TRUE, col = "blue")
    plot(mlb11$new_slug, mlb11$runs, xlab = "new_slug", ylab = "runs",
         frame.plot = TRUE, col = "blue")
    plot(mlb11$new_onbase, mlb11$runs, xlab = "new_onbase", ylab = "runs",
         frame.plot = TRUE, col = "blue")
    plot(mlb11$new_obs, mlb11$runs, xlab = "new_obs", ylab = "runs",
         frame.plot = TRUE, col = "blue")

    ... mlb11$new_obs)
    for (i in 1:5) {
      ...
      print(summary(lms))
    }

The fits below use the formula l[, i] ~ l[, i + 1], which regresses each column of l on the next column, so only the first fit has runs as its response (it reproduces m2). Their residual standard errors are on different scales and cannot be compared directly; comparing $R^2$ from fits of runs on each variable is a sounder basis for the conclusion.

    ##
    ## Call:
    ## lm(formula = l[, i] ~ l[, i + 1], data = l)
    ##
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max
    ## -103.718  -27.179   -5.233   19.322  140.693
    ##
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) -375.5600   151.1806  -2.484   0.0192 *
    ## l[, i + 1]     0.7589     0.1071   7.085 1.04e-07 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 50.23 on 28 degrees of freedom
    ## Multiple R-squared:  0.6419, Adjusted R-squared:  0.6292
    ## F-statistic:  50.2 on 1 and 28 DF,  p-value: 1.043e-07
    ##
    ##
    ## Call:
    ## lm(formula = l[, i] ~ l[, i + 1], data = l)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -27.855  -8.840   1.141  10.086  21.899
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)   -312.1       51.0   -6.12 1.32e-06 ***
    ## l[, i + 1]    6750.9      199.8   33.79  < 2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 13.71 on 28 degrees of freedom
    ## Multiple R-squared:  0.9761, Adjusted R-squared:  0.9752
    ## F-statistic:  1142 on 1 and 28 DF,  p-value: < 2.2e-16
    ##
    ##
    ## Call:
    ## lm(formula = l[, i] ~ l[, i + 1], data = l)
    ##
    ## Residuals:
    ##        Min         1Q     Median         3Q        Max
    ## -0.0120811 -0.0038072 -0.0007623  0.0050569  0.0142072
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  0.11038    0.01851   5.962 2.02e-06 ***
    ## l[, i + 1]   0.36244    0.04630   7.828 1.58e-08 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.007263 on 28 degrees of freedom
    ## Multiple R-squared:  0.6864, Adjusted R-squared:  0.6752
    ## F-statistic: 61.28 on 1 and 28 DF,  p-value: 1.582e-08
    ##
    ##
    ## Call:
    ## lm(formula = l[, i] ~ l[, i + 1], data = l)
    ##
    ## Residuals:
    ##       Min        1Q    Median        3Q       Max
    ## -0.035295 -0.008481  0.000156       ...  0.010515
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) -0.20671    0.06434  -3.213   0.0033 **
    ## l[, i + 1]   1.88957    0.20239   9.420 3.54e-10 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.01452 on 28 degrees of freedom
    ## Multiple R-squared:  0.7601, Adjusted R-squared:  0.7516
    ## F-statistic: 88.74 on 1 and 28 DF,  p-value: 3.538e-10
    ##
    ##
    ## Call:
    ## lm(formula = l[, i] ~ l[, i + 1], data = l)
    ##
    ## Residuals:
    ##        Min         1Q     Median         3Q        Max
    ## -0.0074684 -0.0042599  0.0009995        ...  0.0127444
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  0.10243    0.01535   6.674 3.05e-07 ***
    ## l[, i + 1]   0.30321    0.02131  14.229 2.42e-14 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.004768 on 28 degrees of freedom
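The last exercise asks which variable best predicts runs. One way to make that comparison, sketched here under stated assumptions, is to fit runs on each candidate predictor in turn and collect the $R^2$ values, so that runs is the response in every fit. The data frame `teams` below is synthetic stand-in data (hypothetical values, not the real mlb11 data), and the helper names `predictors` and `r_squared` are illustrative only.

```r
# Synthetic stand-in for mlb11: hypothetical values, NOT the real data.
set.seed(1)
n <- 30
teams <- data.frame(hits = rnorm(n, mean = 1400, sd = 60))
teams$bat_avg <- teams$hits / 5500 + rnorm(n, mean = 0, sd = 0.003)
teams$runs <- -375 + 0.76 * teams$hits + rnorm(n, mean = 0, sd = 50)

# Candidate predictors to compare (column names of the data frame).
predictors <- c("hits", "bat_avg")

# Fit runs ~ <predictor> for each candidate; runs is always the response.
r_squared <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "runs"), data = teams)
  summary(fit)$r.squared
})

# Larger R-squared means more of the variability in runs is explained.
print(sort(r_squared, decreasing = TRUE))
```

Because $R^2$ is a proportion, it can be compared across predictors measured on different scales, which is not true of residual standard errors from fits with different response variables. With the real data, `teams` would be replaced by mlb11 and `predictors` by the variables of interest.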

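Exercise 5 and the plot_ss output both rest on the residual $e_i = y_i - \hat{y}_i$. The following is a minimal sketch of computing a prediction, its residual, and the sum of squared residuals by hand. It assumes synthetic data in a data frame named `teams` (hypothetical values, not the real mlb11 data); `fit`, `observed`, `predicted`, and `ssr` are illustrative names.

```r
# Synthetic stand-in for mlb11: hypothetical values, NOT the real data.
set.seed(42)
n <- 30
teams <- data.frame(at_bats = round(runif(n, min = 5400, max = 5700)))
teams$runs <- -2789 + 0.63 * teams$at_bats + rnorm(n, mean = 0, sd = 65)

fit <- lm(runs ~ at_bats, data = teams)

# Prediction for the first team, using its own at_bats value.
observed  <- teams$runs[1]
predicted <- predict(fit, newdata = data.frame(at_bats = teams$at_bats[1]))

# Residual: observed minus predicted. A positive value means the line
# underestimates; a negative value means it overestimates.
residual <- observed - predicted
print(residual)

# Sum of squared residuals, the quantity least squares minimizes.
ssr <- sum(resid(fit)^2)
print(ssr)

# The residual standard error in summary() is sqrt(SSR / residual df).
print(sqrt(ssr / df.residual(fit)))
print(summary(fit)$sigma)
```

The last two printed values agree, which shows how a sum of squares and a residual standard error reported on 28 degrees of freedom describe the same fit.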