




Chapter 18: Learning From Observations

Outline
- Introduction
- Forms of learning
- Inductive learning
- Learning decision trees
- Ensemble learning
- Why learning works: computational learning theory

Introduction
- An idea: percepts should also be used to improve the agent's ability to act in the future. Learning helps to realize this idea.
- When does learning take place?
  - The agent observes its interactions with the world.
  - The agent observes its own decision-making processes.
- The range of learning runs from trivial memorization of experiences to the creation of entire scientific theories. This chapter describes inductive learning from observations:
  - How to learn simple theories in propositional logic.
  - A theoretical analysis that explains why inductive learning works.

Forms of learning
- A learning agent has two parts:
  - Performance element: decides what actions to take.
  - Learning element: modifies the performance element so as to make better decisions.
- Machine learning researchers have come up with a large variety of learning elements.
- Three major issues in designing a learning element:
  - Which components of the performance element are to be learned.
  - What feedback is available to learn these components.
  - What representation is used for the components.
- Many ways are available to build the performance element of an agent. Its components may include:
  - A direct mapping from conditions on the current state to actions.
  - A means to infer relevant properties of the world from the percept sequence.
  - Information about the way the world evolves and about the results of possible actions the agent can take.
  - Utility information indicating the desirability of world states.
  - Action-value information indicating the desirability of actions.
  - Goals that describe classes of states whose achievement maximizes the agent's utility.
- These components can be learned from appropriate feedback.
- Examples:
  - Example 1: An agent training to become a taxi driver. Hearing "Brake!" → learn a condition-action rule for when to brake.
  - Example 2: An agent learning whether an image contains a bus. Seeing many camera images that it is told contain buses → learn to recognize them.
  - Example 3: An agent trying actions and observing the results. Braking hard on a wet road → learn the effects of its actions.
  - Example 4: An agent learning a utility function. Receiving no tip from passengers who have been thoroughly shaken up during the trip → learn a useful component of its overall utility function.
- The most important factor in determining the nature of the learning problem the agent faces is the type of feedback available for learning. There are three cases in the field of machine learning:
  - Supervised learning: learning a function from examples of its inputs and outputs. In Examples 1 and 2, a teacher provided the correct output value of the examples. In Example 3, the output value was available directly from the agent's percepts. In fully observable environments, an agent can observe the effects of its actions and hence can use supervised methods to learn to predict them. In partially observable environments, the problem is more difficult, because the immediate effects might be invisible.
  - Unsupervised learning: learning patterns in the input when no specific output values are supplied. Example: a taxi agent might gradually develop a concept of "good traffic days" and "bad traffic days" without ever being given labeled examples of each. A purely unsupervised learning agent cannot learn what to do, because it has no information as to what constitutes a correct action or a desirable state.
  - Reinforcement learning: more general than the other two categories; learning from reinforcement without being told what to do by a teacher. Example: the lack of a tip at the end of the journey gives the agent some indication that its behavior is undesirable. It typically includes the subproblem of learning how the environment works.
- The representation of the learned information also plays a very important role in determining how the learning algorithm must work. Examples:
  - Linear weighted polynomials for utility functions in game-playing programs.
  - Propositional and first-order logical sentences for all of the components in a logical agent.
  - Probabilistic descriptions such as Bayesian networks for the inferential components of a decision-theoretic agent.
  - Effective learning algorithms have been devised for all of these.
- Another major factor in the design of learning systems is the availability of prior knowledge. The majority of learning research in AI, computer science, and psychology has studied the case in which the agent begins with no knowledge at all about what it is trying to learn, whereas most human learning takes place in the context of a good deal of background knowledge. Whatever the truth, there is no doubt that prior knowledge can help enormously in learning.
Inductive learning
- In deterministic supervised learning, the algorithm is given the correct value of an unknown function for particular inputs and must try to recover the unknown function or something close to it.
- More formally, an example is a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x.
- Given a collection of examples of f, the task is to return a function h that approximates f. The function h is called a hypothesis.
- Choosing h is the fundamental problem of induction. A good hypothesis will generalize well, but it is difficult to tell whether any particular h is a good approximation of f.
- Example: fitting a function of a single variable to some data points (Figure 18.1; reproduced in the sketch below).
  - Figure 18.1(a) shows some data with an exact fit by a straight line (a polynomial of degree 1).
  - Figure 18.1(b) shows a high-degree polynomial that is also consistent with the same data.
  - Figure 18.1(c) shows a second data set. There is no consistent straight line for this data set; it requires a degree-6 polynomial (with 7 parameters) for an exact fit.
  - Figure 18.1(d) shows that the data in (c) can be fit exactly by a simple function of the form ax + b + c·sin(x).
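The trade-off illustrated by Figure 18.1 is easy to reproduce. The sketch below, which is not from the slides, fits polynomials of degree 1 and degree 6 to a small made-up data set using numpy.polyfit; the data points and degree choices are invented for illustration:

```python
# Sketch only: hypotheses of different complexity fit to the same data.
import numpy as np

# A small made-up data set of (x, f(x)) pairs; seven points, as in Figure 18.1(c).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.1, 1.2, 0.9, 2.8, 3.1, 5.2, 4.9])

line = np.polyfit(x, y, deg=1)   # degree-1 hypothesis: need not fit exactly
exact = np.polyfit(x, y, deg=6)  # degree-6 hypothesis: 7 parameters, exact fit

print("degree-1 residual:", np.sum((np.polyval(line, x) - y) ** 2))
print("degree-6 residual:", np.sum((np.polyval(exact, x) - y) ** 2))  # ~0
```

The degree-6 polynomial matches every point but may generalize badly; this is exactly the choice-of-hypothesis-space issue the slides raise next.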
- The example shows the importance of the choice of hypothesis space. A learning problem is realizable if the hypothesis space contains the true function; otherwise it is unrealizable. Sometimes the true function is not known, so we don't know whether a given learning problem is realizable.
- Two approaches to this problem:
  - Use prior knowledge to derive a hypothesis space in which we know the true function must lie.
  - Use the largest possible hypothesis space.

Learning decision trees
- Decision tree induction is one of the simplest, and yet most successful, forms of learning algorithm. It is a good introduction to the area of inductive learning and is easy to implement.
- The performance element is a decision tree:
  - Input: an object or situation described by a set of attributes. Attributes can be discrete or continuous.
  - Output: a "decision", the predicted output value for the input.
- Learning categories:
  - Classification learning: learning a discrete-valued function.
  - Regression: learning a continuous function.
  - Classification in this chapter is Boolean classification: each example is classified as true (positive) or false (negative).
- A decision tree reaches its decision by performing a sequence of tests:
  - Each internal node in the tree is a test of the value of one of the properties.
  - The branches from the node are labeled with the possible values of the test.
  - Each leaf node specifies the value to be returned if that leaf is reached.
- The decision tree representation seems to be very natural for humans. A small sketch of the data structure follows below.
- A simple example: whether to wait for a table at a restaurant. The aim is to learn a definition for the goal predicate WillWait.
- List of attributes:
  1. Alternate: whether there is a suitable alternative restaurant nearby.
  2. Bar: whether the restaurant has a comfortable bar area to wait in.
  3. Fri/Sat: true on Fridays and Saturdays.
  4. Hungry: whether we are hungry.
  5. Patrons: how many people are in the restaurant (values are None, Some, and Full).
  6. Price: the restaurant's price range ($, $$, $$$).
  7. Raining: whether it is raining outside.
  8. Reservation: whether we made a reservation.
  9. Type: the kind of restaurant (French, Italian, Thai, or burger).
  10. WaitEstimate: the wait estimated by the host (0-10 minutes, 10-30, 30-60, >60).
- Figure 18.2 shows the decision tree usually used by one of us for this domain. Note that the tree does not use the Price and Type attributes.
- Examples are processed by the tree starting at the root and following the appropriate branch until a leaf is reached: an example with Patrons = Full and WaitEstimate = 0-10 will be classified as positive (yes, we will wait for a table).
Expressiveness of decision trees
- Any particular decision tree hypothesis for the WillWait goal predicate is an assertion of the form
  ∀s WillWait(s) ⇔ (P1(s) ∨ P2(s) ∨ … ∨ Pn(s)),
  where each condition Pi(s) is a conjunction of tests corresponding to a path from the root of the tree to a leaf with a positive outcome.
- The decision tree really describes a relationship between WillWait and some logical combination of attribute values. We cannot use decision trees to represent tests that refer to two or more different objects, for example
  ∃r2 Nearby(r2, r) ∧ Price(r, p) ∧ Price(r2, p2) ∧ Cheaper(p2, p),
  which means: is there a cheaper restaurant nearby? Obviously, we could add another Boolean attribute with the name CheaperRestaurantNearby, but it is intractable to add all such attributes.
- Any Boolean function can be written as a decision tree: trivially, let each row in the truth table for the function correspond to a path in the tree. This can yield an exponentially large decision tree representation, because the truth table has exponentially many rows; decision trees can nevertheless represent many functions with much smaller trees.
- Decision trees are good for some kinds of functions and bad for others. For some functions there is a real problem: the parity function, which returns 1 if and only if an even number of inputs are 1, needs an exponentially large decision tree. It is also difficult to use a decision tree to represent a majority function, which returns 1 if more than half of its inputs are 1.
- Is there any kind of representation that is efficient for all kinds of functions? The answer is no. Consider the set of all Boolean functions on n attributes. How many different functions are in this set? This is just the number of different truth tables that we can write down, because a function is defined by its truth table.
- A truth table over n attributes has 2^n rows, so it takes 2^n bits to define a function, and there are 2^(2^n) different functions on n attributes. This is a scary number: with just six Boolean attributes, there are 2^(2^6) = 18,446,744,073,709,551,616 different functions to choose from (verified in the sketch below). No matter what representation we use for functions, almost all of them will require at least 2^n bits to represent. We will need some ingenious algorithms to find consistent hypotheses in such a large space.
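The count is easy to check numerically. A quick sketch (not from the slides) enumerating the closed form for small n:

```python
# Sketch only: count the distinct Boolean functions on n attributes.
for n in range(1, 7):
    rows = 2 ** n          # rows in a truth table over n attributes
    functions = 2 ** rows  # each row independently maps to 0 or 1
    print(n, rows, functions)
# n = 6 prints 18446744073709551616, the number quoted above.
```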
Inducing decision trees from examples
- An example for a Boolean decision tree consists of a vector of input attributes, X, and a single Boolean output value y. A set of examples (X1, y1), …, (X12, y12) is shown in Figure 18.3.
- The positive examples are the ones in which the goal WillWait is true (X1, X3, …); the negative examples are the ones in which it is false (X2, X5, …). The complete set of examples is called the training set.
- There is a trivial way to find a decision tree that agrees with the training set: construct a tree with one path to a leaf for each example, where the path tests each attribute in turn and follows the value for the example, and the leaf has the classification of the example. When given the same example again, the decision tree will come up with the right classification.
- Problem: this tree just memorizes the observations and does not extract any pattern from the examples, so we cannot expect it to be able to extrapolate to examples it has not seen.
- Solution: applying Ockham's razor, find instead the smallest decision tree that is consistent with the examples. Finding the "smallest" is an intractable problem, so we find a "smallish" one instead.
- The basic idea behind the DECISION-TREE-LEARNING algorithm is to test the most important attribute first. The most important attribute is the one that makes the most difference to the classification of an example. We hope to get the correct classification with a small number of tests, so that all paths in the tree will be short and the tree as a whole will be small. Figure 18.4 shows how the algorithm gets started.
- We are given 12 training examples. Figure 18.4(a) shows that Type is a poor attribute: it leaves us with four possible outcomes, each of which has the same number of positive and negative examples. Figure 18.4(b) shows that Patrons is a fairly important attribute: if the value is None or Some, then we are left with example sets for which we can answer definitively (No and Yes, respectively). If the value is Full, we are left with a mixed set of examples.
- In general, after the first attribute test splits up the examples, each outcome is a new decision tree learning problem in itself, with fewer examples and one fewer attribute. There are four cases to consider for these recursive problems (a code sketch follows the list):
  1. If there are some positive and some negative examples, then choose the best attribute to split them.
  2. If all the remaining examples are positive (or all negative), then we are done; we can answer Yes or No.
  3. If there are no examples left, it means that no such example has been observed, and we return a default value calculated from the majority classification at the node's parent.
  4. If there are no attributes left, but both positive and negative examples, we have a problem.
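The four cases map directly onto the recursive structure of DECISION-TREE-LEARNING. Below is a minimal Python sketch, not the book's pseudocode verbatim: examples are assumed to be (attribute-dict, label) pairs, values is assumed to map each attribute to its possible values, choose_attribute stands in for the information-gain selection described later, and the tree format matches the Node/Leaf sketch above. Case 4 is resolved here with a majority vote, one common way of handling it:

```python
# Sketch only: the recursive skeleton of DECISION-TREE-LEARNING.
def majority_value(examples):
    """Most common label among (attributes, label) pairs."""
    labels = [y for _, y in examples]
    return max(set(labels), key=labels.count)

def decision_tree_learning(examples, attributes, values, default, choose_attribute):
    if not examples:                               # case 3: no examples left
        return Leaf(default)
    labels = {y for _, y in examples}
    if len(labels) == 1:                           # case 2: all positive or all negative
        return Leaf(labels.pop())
    if not attributes:                             # case 4: noise or insufficient attributes
        return Leaf(majority_value(examples))
    best = choose_attribute(attributes, examples)  # case 1: split on the best attribute
    rest = [a for a in attributes if a != best]
    branches = {}
    for v in values[best]:
        subset = [(x, y) for x, y in examples if x[best] == v]
        branches[v] = decision_tree_learning(
            subset, rest, values, majority_value(examples), choose_attribute)
    return Node(best, branches)
```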
- From the induced tree (Figure 18.6), we can see that:
  - The tree is clearly different from the original tree shown in Figure 18.2, despite the fact that the data were actually generated from an agent using the original tree.
  - The learning algorithm looks at the examples, not the function, and its hypothesis not only agrees with all the examples but is considerably simpler than the original tree.
  - The learning algorithm has no reason to include tests for Raining and Reservation, for it can classify all the examples without them.
  - It has also detected an interesting and previously unsuspected pattern: the first author will wait for Thai food on weekends.
- Some questions about Figure 18.6:
  - The tree is bound to make a mistake; e.g., it has never seen a case where the wait is 0-10 minutes but the restaurant is full.
  - For a case where Hungry is false, the tree says not to wait, but I (SR) would certainly wait.
  - This raises an obvious question: if the algorithm induces a consistent, but incorrect, tree from the examples, how incorrect will the tree be?

Choosing attribute tests
- A perfect attribute divides the examples into sets that are all positive or all negative.
- One suitable measure of attribute quality is the expected amount of information provided by the attribute, where we use the term in the mathematical sense first defined in Shannon and Weaver (1949).
- To understand the notion of information, think of it as providing the answer to a question. The amount of information contained in the answer depends on one's prior knowledge: the less you know, the more information is provided.
- Information theory measures information content in bits. One bit of information is enough to answer a yes/no question about which one has no idea.
- If the possible answers vi have probabilities P(vi), the information content I of the actual answer is given by
  I(P(v1), …, P(vn)) = Σi −P(vi) log2 P(vi).
- An example of information content I: whether a coin will come up heads. For the tossing of a fair coin, we get I(1/2, 1/2) = 1 bit. If the coin is loaded to give 99% heads, we get I(1/100, 99/100) ≈ 0.08 bits, and as the probability of heads goes to 1, the information of the actual answer goes to 0.
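The formula for I is a one-liner. This sketch (not from the slides) defines it and reproduces the coin numbers just quoted:

```python
# Sketch only: information content (entropy) of an answer, in bits.
from math import log2

def information(probabilities):
    """I(P(v1), ..., P(vn)) = sum of -P(vi) * log2 P(vi), skipping zero terms."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

print(information([0.5, 0.5]))    # fair coin: 1.0 bit
print(information([0.01, 0.99]))  # loaded coin: ~0.08 bits
```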
- For decision tree learning, the question to be answered is: what is the correct classification for a given example? The probabilities of the possible answers can be estimated from the proportions of positive and negative examples in the training set.
- Suppose the training set contains p positive examples and n negative examples. Then an estimate of the information contained in a correct answer is
  I(p/(p+n), n/(p+n)).
  The restaurant training set in Figure 18.3 has p = n = 6, so we need 1 bit of information.
- A test on a single attribute will usually not give us all of this information, but we can measure exactly how much it does give by looking at how much information we still need after the attribute test.
- Any attribute A divides the training set E into subsets E1, …, Ev according to their values for A, where A can have v distinct values. Each subset Ei has pi positive examples and ni negative examples, so along that branch we need an additional I(pi/(pi+ni), ni/(pi+ni)) bits of information to answer the question.
- A randomly chosen example from the training set has the ith value for the attribute with probability (pi+ni)/(p+n), so on average, after testing attribute A, the information we still need is
  Remainder(A) = Σi ((pi+ni)/(p+n)) · I(pi/(pi+ni), ni/(pi+ni)).
- The information gain from the attribute test is the difference between the original information requirement and the new requirement (see the sketch below):
  Gain(A) = I(p/(p+n), n/(p+n)) − Remainder(A).
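Putting the two formulas together gives the attribute-selection heuristic. A sketch (not from the slides) reusing the information function defined above; the (attribute-dict, label) example format is the same assumption as in the earlier learning sketch:

```python
# Sketch only: Remainder(A) and Gain(A) over a training set of (attrs, label) pairs.
def remainder(attribute, examples):
    total = len(examples)
    rem = 0.0
    for v in {x[attribute] for x, _ in examples}:  # each observed value of A
        subset = [y for x, y in examples if x[attribute] == v]
        pos = sum(1 for y in subset if y)
        probs = [pos / len(subset), (len(subset) - pos) / len(subset)]
        rem += (len(subset) / total) * information(probs)
    return rem

def gain(attribute, examples):
    pos = sum(1 for _, y in examples if y)
    base = information([pos / len(examples), 1 - pos / len(examples)])
    return base - remainder(attribute, examples)
```

On the Figure 18.3 training set this gives Gain(Patrons) ≈ 0.541 bits and Gain(Type) = 0 bits, which is why Patrons makes a better root test than Type.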
Assessing the performance of the learning algorithm
- Method (sketched below):
  1. Collect a large set of examples.
  2. Divide it into two disjoint sets: the training set and the test set.
  3. Apply the learning algorithm to the training set, generating a hypothesis h.
  4. Measure the percentage of examples in the test set that are correctly classified by h.
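As a concrete version of this methodology (a sketch, not from the slides; the learn and classify callables are placeholders for any learning algorithm and its prediction function):

```python
# Sketch only: holdout evaluation of a learned hypothesis.
import random

def evaluate(examples, learn, classify, train_fraction=0.8, seed=0):
    """Split examples into disjoint training/test sets; return test accuracy."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    training, test = shuffled[:cut], shuffled[cut:]
    h = learn(training)                        # step 3: induce hypothesis h
    correct = sum(1 for x, y in test if classify(h, x) == y)
    return correct / len(test)                 # step 4: test-set accuracy
```

The key point is that the test set stays disjoint from the training set: reusing test examples during training would inflate the measured accuracy.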