These notes cover the key theory from Ng's machine learning course (/materials.html) and give the code and result analysis for every course exercise. You only learn to swim by swimming: the hope is that this series leads to a deep understanding of the algorithms, and to writing efficient, working machine-learning code and running it on real data sets, so that theory and practice go hand in hand.

Part 1: Linear Regression

1. Supervised Learning

In the supervised learning setting we are given training examples $(x^{(i)}, y^{(i)})$, $i = 1, \dots, m$, where $i$ indexes the training examples. The task of supervised learning is to find a function (also called a model or hypothesis) $h: X \to Y$ such that $h(x)$ is a good prediction of the corresponding value $y$. The whole process can be pictured as:

[Figure: training set → learning algorithm → hypothesis h; a new input x fed to h produces a predicted y]

When the target variable to be predicted is continuous we call it a regression problem; when the target variable is discrete we call it a classification problem. Regression and classification are thus the two canonical supervised-learning problems, for continuous-valued and discrete-valued prediction respectively.

2. Linear Regression

In general each training example is described by a feature vector; we write $x_j^{(i)}$, where $j$ indexes the feature and $i$ indexes the training example. Supervised learning asks for the best prediction function $h(x)$; for example, we can take a linear combination of the features,

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2.$$

The problem then becomes finding the parameters $\theta$ that make the prediction error as small as possible. In vector form,

$$h(x) = \sum_{j=0}^{n} \theta_j x_j = \theta^T x.$$

In machine learning, vectors are column vectors by convention, so $\theta^T$ is the transpose of the parameter vector. We also add an intercept feature $x_0 = 1$ so that the hypothesis can be written as a single vector product. To find the optimal $\theta$ we minimize the error function, i.e. the cost function

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2.$$

This is the least-squares cost function; minimizing it gives the optimal parameters.

3. The LMS Algorithm

To find the optimal parameters we can initialize $\theta$ randomly and then move it slowly along the gradient (all components of $\theta$ must change), watching how the cost function evolves; this is the idea behind gradient descent. Suppose we have a single training example $(x, y)$. Taking the partial derivative of $J$ with respect to $\theta_j$,

$$\frac{\partial}{\partial \theta_j} J(\theta)
= \frac{\partial}{\partial \theta_j}\,\frac{1}{2}\left( h_\theta(x) - y \right)^2
= \left( h_\theta(x) - y \right)\,\frac{\partial}{\partial \theta_j}\left( \sum_{k=0}^{n}\theta_k x_k - y \right)
= \left( h_\theta(x) - y \right) x_j,$$

which gives the parameter update rule

$$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}.$$

Here $\alpha$ is the learning rate, which controls how much the parameters change in each iteration; this is the LMS (least mean squares) rule. Intuitively, if a training example already satisfies $y^{(i)} - h_\theta(x^{(i)}) = 0$, the parameters need no update for it; conversely, if the prediction error is large, the parameter change will also be large.
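To make the single-example LMS step concrete, here is a minimal MATLAB sketch of one update; the variable names and numbers are purely illustrative and not part of the exercise code.

```matlab
% One LMS update for a single training example (illustrative sketch).
% x is an (n+1)x1 column vector with x(1) = 1 as the intercept feature,
% theta is (n+1)x1, y is a scalar target, alpha is the learning rate.
x = [1; 2.0; 3.0];               % example input with intercept term
y = 5.0;                         % example target value
theta = zeros(3, 1);             % current parameters
alpha = 0.1;                     % learning rate

err = y - theta' * x;            % prediction error y - h_theta(x)
theta = theta + alpha * err * x; % move each theta_j by alpha * err * x_j
```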

If we have more than one training example, say $m$ examples each described by $n$ features, the gradient descent update has to touch the parameters of all $n$ features. There are two update schemes: batch gradient descent and stochastic/incremental gradient descent. For the former, one full update of the parameters $\theta_j$ (note that all parameters must be updated simultaneously to count as one round) uses all $m$ training examples:

Repeat until convergence {
$$\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)} \qquad \text{(for every } j\text{)}$$
}

That is, each update of a $\theta_j$ requires computing and summing the prediction errors over all $m$ training examples. The latter updates the parameters using one training example at a time, stepping through the examples one by one (hence "incremental"):

Loop {
&nbsp;&nbsp;for $i = 1$ to $m$ {
$$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)} \qquad \text{(for every } j\text{)}$$
&nbsp;&nbsp;}
}

When the training set size $m$ is very large, stochastic/incremental gradient descent clearly has the advantage, because each parameter update does not need to scan the whole training set.

We can also write the cost function in matrix form. Let

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \qquad
\vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix},$$

so that

$$X\theta - \vec{y} = \begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ \vdots \\ h_\theta(x^{(m)}) - y^{(m)} \end{bmatrix}.$$

The cost function $J$ can therefore be written as

$$J(\theta) = \frac{1}{2}\,(X\theta - \vec{y})^T (X\theta - \vec{y}) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2.$$

Taking the gradient of $J(\theta)$ with respect to the vector $\theta$ (differentiating with respect to a vector gives a gradient; this requires matrix calculus and is a bit more involved than the scalar case — see Ng's course notes for the derivation) and setting the gradient to zero gives the stationary point directly. Since linear regression has a unique global optimum, this is the minimizer, known as the normal equations:

$$\theta = (X^T X)^{-1} X^T \vec{y}.$$

This avoids iterative optimization and yields the optimal $\theta$ in closed form.

4. Programming Exercises

(Note: all programming exercises in this part come from Andrew Ng's online machine learning course.)

Linear regression with one variable

In single-variable linear regression, every training example is described by a single feature. For example, consider the relationship between the profit of a truck-rental chain's outlet and the population of the city it is in: given training examples pairing city populations with profits, we fit a line by linear regression and then use it to predict the profit for a new city of a given population. The main program is as follows:

```matlab
%% Initialization
clear; close all; clc

%% ==================== Part 1: Basic Function ====================
% Complete warmUpExercise.m
fprintf('Running warmUpExercise ...\n');
fprintf('5x5 Identity Matrix:\n');
warmUpExercise()

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ======================= Part 2: Plotting =======================
fprintf('Plotting Data ...\n')
data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);
m = length(y); % number of training examples

% Plot Data
% Note: You have to complete the code in plotData.m
plotData(X, y);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% =================== Part 3: Gradient descent ===================
fprintf('Running Gradient Descent ...\n')

X = [ones(m, 1), data(:, 1)]; % Add a column of ones to x
theta = zeros(2, 1);          % initialize fitting parameters

% Some gradient descent settings
iterations = 1500;
alpha = 0.01;

% compute and display initial cost
computeCost(X, y, theta)

% run gradient descent
theta = gradientDescent(X, y, theta, alpha, iterations);

% print theta to screen
fprintf('Theta found by gradient descent: ');
fprintf('%f %f \n', theta(1), theta(2));

% Plot the linear fit
hold on; % keep previous plot visible
plot(X(:, 2), X * theta, '-')
legend('Training data', 'Linear regression')
hold off % don't overlay any more plots on this figure

% Predict values for population sizes of 35,000 and 70,000
predict1 = [1, 3.5] * theta;
fprintf('For population = 35,000, we predict a profit of %f\n', ...
    predict1 * 10000);
predict2 = [1, 7] * theta;
fprintf('For population = 70,000, we predict a profit of %f\n', ...
    predict2 * 10000);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ============= Part 4: Visualizing J(theta_0, theta_1) =============
fprintf('Visualizing J(theta_0, theta_1) ...\n')

% Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);

% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));

% Fill out J_vals
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i, j) = computeCost(X, y, t);
    end
end

% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';
% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals)
xlabel('\theta_0'); ylabel('\theta_1');

% Contour plot
figure;
% Plot J_vals as 20 contours spaced logarithmically between 0.01 and 1000
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);
```

First we load the training data and visualize it.
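The script calls two helper functions that the exercise leaves to the student: warmUpExercise.m, which simply returns a 5x5 identity matrix, and plotData.m, which produces the scatter plot shown below. Minimal sketches of the two files follow; the marker style and axis labels are illustrative choices, not something these notes take from the exercise hand-in.

```matlab
function A = warmUpExercise()
%WARMUPEXERCISE Example function that returns the 5x5 identity matrix
A = eye(5);
end
```

```matlab
function plotData(x, y)
%PLOTDATA Plots the data points x and y into a new figure
figure;                                   % open a new figure window
plot(x, y, 'rx', 'MarkerSize', 10);       % plot the data as red crosses
ylabel('Profit in $10,000s');             % label the y-axis
xlabel('Population of City in 10,000s');  % label the x-axis
end
```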

[Figure: scatter plot of the training data, profit versus city population]

Next we need to implement two functions, computeCost and gradientDescent, which compute the cost function and update the parameters along the gradient direction respectively. Combining the linear-regression cost formula and the parameter update rule above, they can be implemented as follows:

```matlab
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

J = 1 / (2 * m) * (X * theta - y)' * (X * theta - y);

% =========================================================================

end
```

When implementing this, note that X is m-by-2, theta is 2-by-1 and y is m-by-1. Since the * operator in MATLAB performs matrix multiplication by default, make sure the dimensions of the operands being multiplied are compatible. The parameter update function is:

```matlab
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %
    % Batch gradient descent
    Update = 0;
    for i = 1:m
        Update = Update + alpha / m * (y(i) - X(i, :) * theta) * X(i, :)';
    end
    theta = theta + Update;

    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);

end

end
```

This is batch gradient descent: every parameter update scans all m training examples. Update is the change applied to the parameters in each iteration, obtained by summing the training error over all examples. After each update the cost is recomputed, and the whole cost history is stored in J_history.

After 1500 iterations we can print the fitted parameters theta, plot the fitted line, and predict the profit for new city populations:

```
Running Gradient Descent ...
Theta found by gradient descent: -3.630291 1.166362
For population = 35,000, we predict a profit of 4519.767868
For population = 70,000, we predict a profit of 45342.450129
Program paused. Press enter to continue.
```

[Figure: training data with the fitted regression line]

The next figure is the contour plot of the cost over (theta_0, theta_1); the red cross marks the minimum that gradient descent converged to. For linear regression there is only the global optimum, so this is exactly the optimal parameter vector we want.

[Figure: contour plot of J(theta_0, theta_1) with the converged parameters marked by a red cross]

Linear regression with multiple variables

If every training example is described by several features, we have multi-variable linear regression. For example, suppose we want to predict house prices from the floor area and the number of bedrooms; each training example is then described by 2 features. The main program is as follows:
```matlab
%% Initialization

%% ================ Part 1: Feature Normalization ================

%% Clear and Close Figures
clear; close all; clc

fprintf('Loading data ...\n');

%% Load Data
data = load('ex1data2.txt');
X = data(:, 1:2);
y = data(:, 3);
m = length(y);

% Print out some data points
fprintf('First 10 examples from the dataset: \n');
fprintf(' x = [%.0f %.0f], y = %.0f \n', [X(1:10, :) y(1:10, :)]');

fprintf('Program paused. Press enter to continue.\n');
pause;

% Scale features and set them to zero mean
fprintf('Normalizing Features ...\n');

[X mu sigma] = featureNormalize(X);

% Add intercept term to X
X = [ones(m, 1) X];


%% ================ Part 2: Gradient Descent ================

% ====================== YOUR CODE HERE ======================
% Instructions: We have provided you with the following starter
%               code that runs gradient descent with a particular
%               learning rate (alpha).
%
%               Your task is to first make sure that your functions -
%               computeCost and gradientDescent already work with
%               this starter code and support multiple variables.
%
%               After that, try running gradient descent with
%               different values of alpha and see which one gives
%               you the best result.
%
%               Finally, you should complete the code at the end
%               to predict the price of a 1650 sq-ft, 3 br house.
%
% Hint: By using the 'hold on' command, you can plot multiple
%       graphs on the same figure.
%
% Hint: At prediction, make sure you do the same feature normalization.
%

fprintf('Running gradient descent ...\n');

% Choose some alpha value
alpha = 0.01;
num_iters = 1000;

% Init Theta and Run Gradient Descent
theta = zeros(3, 1);
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);

% Plot the convergence graph
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');

% Display gradient descent's result
fprintf('Theta computed from gradient descent: \n');
fprintf(' %f \n', theta);
fprintf('\n');

% Estimate the price of a 1650 sq-ft, 3 br house
% ====================== YOUR CODE HERE ======================
% Recall that the first column of X is all-ones. Thus, it does
% not need to be normalized.
x_predict = [1 1650 3];
for i = 2:3
    x_predict(i) = (x_predict(i) - mu(i - 1)) / sigma(i - 1);
end
price = x_predict * theta;

% ============================================================

fprintf(['Predicted price of a 1650 sq-ft, 3 br house ' ...
         '(using gradient descent):\n $%f\n'], price);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ================ Part 3: Normal Equations ================

fprintf('Solving with normal equations...\n');

% ====================== YOUR CODE HERE ======================
% Instructions: The following code computes the closed form
%               solution for linear regression using the normal
%               equations. You should complete the code in
%               normalEqn.m
%
%               After doing so, you should complete this code
%               to predict the price of a 1650 sq-ft, 3 br house.
%

%% Load Data
data = csvread('ex1data2.txt');
X = data(:, 1:2);
y = data(:, 3);
m = length(y);

% Add intercept term to X
X = [ones(m, 1) X];

% Calculate the parameters from the normal equation
theta = normalEqn(X, y);

% Display normal equation's result
fprintf('Theta computed from the normal equations: \n');
fprintf(' %f \n', theta);
fprintf('\n');

% Estimate the price of a 1650 sq-ft, 3 br house
% ====================== YOUR CODE HERE ======================
x_predict = [1 1650 3];
price = x_predict * theta;
% ============================================================

fprintf(['Predicted price of a 1650 sq-ft, 3 br house ' ...
         '(using normal equations):\n $%f\n'], price);
```
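The script above calls computeCostMulti and gradientDescentMulti, which the exercise asks you to implement and which are not reproduced in these notes. A minimal vectorized sketch consistent with the cost function and update rule derived earlier might look like this; the function names follow the files the script expects, while the bodies are illustrative rather than the exercise's reference solution.

```matlab
function J = computeCostMulti(X, y, theta)
%COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
m = length(y);                               % number of training examples
J = 1 / (2 * m) * (X * theta - y)' * (X * theta - y);
end
```

```matlab
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
m = length(y);                               % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    % Vectorized batch update: theta := theta + alpha/m * X' * (y - X*theta)
    theta = theta + alpha / m * X' * (y - X * theta);
    J_history(iter) = computeCostMulti(X, y, theta);  % record cost each iteration
end
end
```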

Feature Normalization

Looking at the features, the house areas are numerically about 1000 times larger than the bedroom counts. When the value ranges of different features differ this dramatically, feature normalization should be performed first; it speeds up the convergence of the learning algorithm. To normalize, first compute the mean $\mu$ and the standard deviation $\sigma$ of each feature column; the normalized/scaled feature $x'$ is then related to the original feature $x$ by $x' = (x - \mu)/\sigma$, i.e. subtract the mean from each original feature value and divide by the standard deviation. featureNormalize can be implemented accordingly (a sketch is given at the end of this section).

Gradient Descent

This step again requires functions that compute the cost and update the parameters. For multi-variable linear regression the cost function can also be written in the vectorized form

$$J(\theta) = \frac{1}{2m}\,(X\theta - \vec{y})^T (X\theta - \vec{y}),
\quad \text{where} \quad
X = \begin{bmatrix} (x^{(1)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix},
\qquad
\vec{y} = \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}.$$

The cost function and parameter update rule given above for the single-variable case carry over unchanged to the multi-variable case; the only difference is that X now has more columns (see the computeCostMulti/gradientDescentMulti sketch above). Note that we can no longer visualize the cost J over $(\theta_0, \theta_1, \theta_2)$, since that would require four dimensions, but we can plot the cost J against the number of iterations:

[Figure: convergence curve of cost J versus number of iterations]

Here the learning rate is $\alpha = 0.01$ with 1000 iterations; the cost J has essentially converged and stops changing after roughly 400 iterations. We can also tune the learning rate $\alpha$. Choosing a suitable learning rate matters: too small and convergence is very slow, too large and the algorithm may fail to converge (the parameters change too much per iteration to settle at the minimum). Ng suggests trying $\alpha$ on a log scale, e.g. repeatedly dividing by about 3: ..., 0.3, 0.1, 0.03, 0.01, ...

Normal Equations

Alternatively, we can compute the optimal $\theta$ directly with the formula below, derived by differentiating the cost function with respect to the parameter vector $\theta$ and setting the derivative to zero:

$$\theta = (X^T X)^{-1} X^T \vec{y}.$$

The function simply evaluates this formula; a minimal implementation (using pinv for numerical robustness, a standard choice although the formula itself only calls for an inverse):

```matlab
function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression
%   NORMALEQN(X, y) computes the closed-form solution to linear
%   regression using the normal equations.

% theta = (X' * X)^(-1) * X' * y
theta = pinv(X' * X) * X' * y;

end
```
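Finally, featureNormalize itself: the exercise asks you to write it, and the description above pins down exactly what it must do (column-wise mean subtraction and division by the standard deviation). A minimal sketch along those lines, with the output signature matching how the function is called in the main script; the use of repmat is just one of several equivalent ways to broadcast the row vectors over the rows of X.

```matlab
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   Returns a version of X where each feature column has zero mean and
%   unit standard deviation, together with the mean and std that were used.

mu = mean(X);                                    % 1 x n row of column means
sigma = std(X);                                  % 1 x n row of column stds
X_norm = (X - repmat(mu, size(X, 1), 1)) ./ ...
         repmat(sigma, size(X, 1), 1);           % x' = (x - mu) / sigma

end
```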
