
Chapter 18: Learning From Observations

Outline: Introduction; Forms of learning; Inductive learning; Learning decision trees; Ensemble learning; Why learning works: computational learning theory.

Introduction

An idea: percepts should also be used to improve the agent's ability to act in the future. Learning helps to realize this idea. When does learning take place? The agent observes its interactions with the world, and it observes its own decision-making processes.

The range of learning runs from trivial memorization of experiences to the creation of entire scientific theories. This chapter describes inductive learning from observations: how to learn simple theories in propositional logic, together with a theoretical analysis that explains why inductive learning works.

Forms of learning

A learning agent has a performance element, which decides what actions to take, and a learning element, which modifies the performance element so that it makes better decisions. Machine learning researchers have come up with a large variety of learning elements.

Three major issues arise in designing a learning element: which components of the performance element are to be learned, what feedback is available to learn these components, and what representation is used for the components.

Many ways are available to build the performance element of an agent. The components of these agents include: a direct mapping from conditions on the current state to actions; a means to infer relevant properties of the world from the percept sequence; information about the way the world evolves and about the results of possible actions the agent can take; utility information indicating the desirability of world states; action-value information indicating the desirability of actions; and goals that describe classes of states whose achievement maximizes the agent's utility. These components can be learned from appropriate feedback.

Examples:
Example 1: an agent training to become a taxi driver. "Brake!" - learn a condition-action rule for when to brake.
Example 2: an agent learning whether an image contains a bus. Seeing many camera images that it is told contain buses - learn to recognize them.
Example 3: an agent trying actions and observing the results. Braking hard on a wet road - learn the effects of its actions.
Example 4: an agent learning a function. Receiving no tip from passengers who have been thoroughly shaken up during the trip - learn a useful component of its overall utility function.

The most important factor in determining the nature of the learning problem the agent faces is the type of feedback available for learning. Three cases arise in the field of machine learning:

Supervised learning: learning a function from examples of its inputs and outputs. In Examples 1 and 2, a teacher provided the correct output value of the examples. In Example 3, the output value was available directly from the agent's percepts. In fully observable environments, an agent can observe the effects of its actions and hence can use supervised methods to learn to predict them. In partially observable environments, the problem is more difficult, because the immediate effects might be invisible.

Unsupervised learning: learning patterns in the input when no specific output values are supplied. Example: a taxi agent might gradually develop a concept of "good traffic days" and "bad traffic days" without ever being given labeled examples of each. A purely unsupervised learning agent cannot learn what to do, because it has no information about what constitutes a correct action or a desirable state.

Reinforcement learning: more general than the other two categories; learning from reinforcement, without being told what to do by a teacher. Example: the lack of a tip at the end of the journey gives the taxi agent some indication that its behavior is undesirable. Reinforcement learning typically includes the subproblem of learning how the environment works.

The representation of the learned information also plays a very important role in determining how the learning algorithm must work. Examples: linear weighted polynomials for utility functions in game-playing programs; propositional and first-order logical sentences for all of the components in a logical agent; probabilistic descriptions such as Bayesian networks for the inferential components of a decision-theoretic agent. Effective learning algorithms have been devised for all of these.

Another major factor in the design of learning systems is the availability of prior knowledge. The majority of learning research in AI, computer science, and psychology has studied the case in which the agent begins with no knowledge at all about what it is trying to learn, whereas most human learning takes place in the context of a good deal of background knowledge. Whatever the truth, there is no doubt that prior knowledge can help enormously in learning.

Inductive learning

In deterministic supervised learning, the algorithm is given the correct value of an unknown function for particular inputs and must try to recover the unknown function, or something close to it. More formally, an example is a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x.

Given a collection of examples of f, the task is to return a function h that approximates f. The function h is called a hypothesis. Choosing h is the fundamental problem of induction: a good hypothesis will generalize well, but it is difficult to tell whether any particular h is a good approximation of f.

Example: fitting a function of a single variable to some data points. Figure 18.1(a) shows some data with an exact fit by a straight line (a polynomial of degree 1). Figure 18.1(b) shows a high-degree polynomial that is also consistent with the same data. Figure 18.1(c) shows a second data set, for which there is no consistent straight line; it requires a degree-6 polynomial (with 7 parameters) for an exact fit. Figure 18.1(d) shows that the data in (c) can be fit exactly by a simple function of the form ax + b + c sin x.

The example shows the importance of the choice of hypothesis space. A learning problem is realizable if the hypothesis space contains the true function; otherwise it is unrealizable. Sometimes the true function is not known, so we cannot tell whether a given learning problem is realizable. Two approaches to this problem: use prior knowledge to derive a hypothesis space in which we know the true function must lie, or use the largest possible hypothesis space.
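To make the hypothesis-space comparison concrete, here is a minimal sketch, not from the slides, that fits a degree-1 and a degree-6 polynomial to a small made-up data set with numpy; the data points are illustrative assumptions, not the data behind Figure 18.1.

```python
# Sketch: comparing hypothesis spaces of different polynomial degrees.
# The data set below is made up for illustration; it is NOT the data of Figure 18.1.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.1, 1.8, 1.2, 3.9, 3.1, 5.6, 4.8])   # noisy, roughly increasing

# Degree-1 hypothesis (a straight line) and a degree-6 hypothesis
# (enough parameters to pass exactly through all 7 points).
h_line = np.polyfit(x, y, deg=1)
h_exact = np.polyfit(x, y, deg=6)

def training_error(coeffs):
    """Sum of squared residuals of the fitted polynomial on the training points."""
    return float(np.sum((np.polyval(coeffs, x) - y) ** 2))

print("degree 1 training error:", round(training_error(h_line), 3))   # > 0: a line cannot fit exactly
print("degree 6 training error:", round(training_error(h_exact), 6))  # ~ 0: consistent with the data
```

Both hypotheses are (nearly) consistent with the training data; preferring the simpler one is the same Ockham's razor argument made later for decision trees.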

Learning decision trees

Decision tree induction is one of the simplest, and yet one of the most successful, forms of learning algorithm. It is a good introduction to the area of inductive learning and is easy to implement.

The performance element is a decision tree. Input: an object or situation described by a set of attributes; attributes can be discrete or continuous. Output: a "decision", the predicted output value for the input. Learning a discrete-valued function is called classification learning; learning a continuous function is called regression. Classification in this chapter is Boolean classification: each example is classified as true (positive) or false (negative).

A decision tree reaches its decision by performing a sequence of tests. Each internal node in the tree is a test of the value of one of the properties, and the branches from the node are labeled with the possible values of the test. Each leaf node in the tree specifies the value to be returned if that leaf is reached. The decision tree representation seems to be very natural for humans.

A simple example: whether to wait for a table at a restaurant. The aim is to learn a definition for the goal predicate WillWait. The list of attributes:
1. Alternate: whether there is a suitable alternative restaurant nearby.
2. Bar: whether the restaurant has a comfortable bar area to wait in.
3. Fri/Sat: true on Fridays and Saturdays.
4. Hungry: whether we are hungry.
5. Patrons: how many people are in the restaurant (values are None, Some, and Full).
6. Price: the restaurant's price range ($, $$, $$$).
7. Raining: whether it is raining outside.
8. Reservation: whether we made a reservation.
9. Type: the kind of restaurant (French, Italian, Thai, or burger).
10. WaitEstimate: the wait estimated by the host (0-10 minutes, 10-30, 30-60, or >60).

Figure 18.2 shows the decision tree usually used by one of us for this domain. Note that the tree does not use the Price and Type attributes. Examples are processed by the tree starting at the root and following the appropriate branch until a leaf is reached: an example with Patrons = Full and WaitEstimate = 0-10 will be classified as positive (yes, we will wait for a table).
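As an illustration of how such a tree can be represented and traversed in code, here is a minimal sketch (not part of the slides). It encodes only the tests explicitly mentioned above, Patrons at the root and WaitEstimate under Full; the branches marked "..." stand in for the parts of Figure 18.2 that this extract does not spell out.

```python
# Sketch: a decision tree as a nested structure plus a traversal routine.
# A tree is either a string leaf ("Yes"/"No") or a dict:
#   {"test": attribute_name, "branches": {value: subtree, ...}}
WILL_WAIT_FRAGMENT = {
    "test": "Patrons",
    "branches": {
        "None": "No",
        "Some": "Yes",
        "Full": {
            "test": "WaitEstimate",
            "branches": {
                "0-10": "Yes",
                "10-30": "...",   # placeholders for subtrees not given in this extract
                "30-60": "...",
                ">60": "...",
            },
        },
    },
}

def classify(tree, example):
    """Follow the branch matching the example's attribute value until a leaf is reached."""
    while isinstance(tree, dict):
        value = example[tree["test"]]
        tree = tree["branches"][value]
    return tree

# The example from the slides: Patrons = Full, WaitEstimate = 0-10  ->  "Yes"
print(classify(WILL_WAIT_FRAGMENT, {"Patrons": "Full", "WaitEstimate": "0-10"}))
```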

Expressiveness of decision trees. Any particular decision tree hypothesis for the WillWait goal predicate can be seen as an assertion of the form

∀s WillWait(s) ⇔ (P1(s) ∨ P2(s) ∨ ... ∨ Pn(s))

where each condition Pi(s) is a conjunction of tests corresponding to a path from the root of the tree to a leaf with a positive outcome. The decision tree is really describing a relationship between WillWait and some logical combination of attribute values.

We cannot use decision trees to represent tests that refer to two or more different objects, for example a sentence such as

∃r2 Nearby(r2, r) ∧ Price(r, p) ∧ Price(r2, p2) ∧ Cheaper(p2, p)

meaning: is there a cheaper restaurant nearby? Obviously, we could add another Boolean attribute named CheaperRestaurantNearby, but it is intractable to add all such attributes.

Any Boolean function can be written as a decision tree. This can be done trivially by having each row in the truth table for the function correspond to a path in the tree, but that yields an exponentially large decision tree representation, because the truth table has exponentially many rows. Decision trees can represent many functions with much smaller trees.

Decision trees are good for some kinds of functions and bad for others. For some kinds of functions there is a real problem. If the function is the parity function, which returns 1 if and only if an even number of inputs are 1, then an exponentially large decision tree is needed. It is also difficult to use a decision tree to represent a majority function, which returns 1 if more than half of its inputs are 1.

Is there any kind of representation that is efficient for all kinds of functions? The answer is no. Consider the set of all Boolean functions on n attributes. How many different functions are in this set? This is just the number of different truth tables that we can write down, because a function is defined by its truth table. No matter what representation we use for functions, almost all of the functions are going to require at least that many bits to represent.

A truth table over n attributes has 2^n rows, so it takes 2^n bits to define a function, and there are 2^(2^n) different functions on n attributes. This is a scary number. For example, with just six Boolean attributes there are 2^(2^6) = 18,446,744,073,709,551,616 different functions to choose from. We will need some ingenious algorithms to find consistent hypotheses in such a large space.
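A quick arithmetic check of that count (a throwaway snippet, not from the slides):

```python
# Number of distinct Boolean functions on n attributes: 2 to the power (2 to the power n).
n = 6
num_functions = 2 ** (2 ** n)
print(num_functions)   # 18446744073709551616, i.e. 2**64
```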

Inducing decision trees from examples. An example for a Boolean decision tree consists of a vector of input attributes, X, and a single Boolean output value y. A set of examples (X1, y1), ..., (X12, y12) is shown in Figure 18.3. The positive examples are the ones in which the goal WillWait is true (X1, X3, ...); the negative examples are the ones in which it is false (X2, X5, ...). The complete set of examples is called the training set.

There is a trivial way to find a decision tree that agrees with the training set: simply construct a tree that has one path to a leaf for each example, where the path tests each attribute in turn and follows the value for that example, and the leaf holds the classification of the example. When given the same example again, the decision tree will come up with the right classification. The problem is that it just memorizes the observations and does not extract any pattern from the examples, so we cannot expect it to be able to extrapolate to examples it has not seen.

The solution: applying Ockham's razor, find instead the smallest decision tree that is consistent with the examples. Finding the "smallest" tree is an intractable problem, so we settle for finding a "smallish" one instead.

The basic idea behind the DECISION-TREE-LEARNING algorithm is to test the most important attribute first, where the most important attribute is the one that makes the most difference to the classification of an example. We hope to get the correct classification with a small number of tests, so that all paths in the tree are short and the tree as a whole is small. Figure 18.4 shows how the algorithm gets started.

We are given 12 training examples. Figure 18.4(a) shows that Type is a poor attribute: it leaves us with four possible outcomes, each of which has the same number of positive and negative examples. Figure 18.4(b) shows that Patrons is a fairly important attribute: if the value is None or Some, then we are left with example sets for which we can answer definitively (No and Yes, respectively); if the value is Full, we are left with a mixed set of examples.

In general, after the first attribute test splits up the examples, each outcome is a new decision tree learning problem in itself, with fewer examples and one fewer attribute. There are four cases to consider for these recursive problems (a code sketch follows the list):
1. If there are some positive and some negative examples, then choose the best attribute to split them.
2. If all the remaining examples are positive (or all negative), then we are done; we can answer Yes or No.
3. If there are no examples left, it means that no such example has been observed, and we return a default value calculated from the majority classification at the node's parent.
4. If there are no attributes left, but both positive and negative examples remain, we have a problem.
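The four cases translate almost directly into code. Below is a minimal, hedged sketch of the recursion, not the slides' own pseudocode; the example dictionaries, the "WillWait" key, and the choose_attribute parameter are assumptions for illustration, with choose_attribute standing in for the information-gain-based selection described later in this section.

```python
from collections import Counter

def majority_value(examples):
    """Most common classification among the examples (used for defaults in cases 3 and 4)."""
    return Counter(e["WillWait"] for e in examples).most_common(1)[0][0]

def decision_tree_learning(examples, attributes, default, choose_attribute):
    """Recursive construction following the four cases on the slide.

    examples   : list of dicts mapping attribute names (and "WillWait") to values
    attributes : dict mapping each unused attribute name to its list of possible values
    default    : classification to return when no examples are left (case 3)
    """
    if not examples:                                   # case 3: no such example observed
        return default
    classes = {e["WillWait"] for e in examples}
    if len(classes) == 1:                              # case 2: all positive or all negative
        return classes.pop()
    if not attributes:                                 # case 4: attributes exhausted, still mixed
        return majority_value(examples)
    best = choose_attribute(attributes, examples)      # case 1: split on the most important attribute
    remaining = {a: v for a, v in attributes.items() if a != best}
    tree = {"test": best, "branches": {}}
    for value in attributes[best]:
        subset = [e for e in examples if e[best] == value]
        tree["branches"][value] = decision_tree_learning(
            subset, remaining, majority_value(examples), choose_attribute)
    return tree
```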

Looking at the tree that the algorithm induces from the 12 examples (Figure 18.6), we can see that the tree is clearly different from the original tree shown in Figure 18.2, despite the fact that the data were actually generated from an agent using the original tree. The learning algorithm looks at the examples, not at the function, and its hypothesis not only agrees with all the examples but is considerably simpler than the original tree. The learning algorithm has no reason to include tests for Raining and Reservation, because it can classify all the examples without them. It has also detected an interesting and previously unsuspected pattern: the first author will wait for Thai food on weekends.

Some questions raised by Figure 18.6: The tree is bound to make some mistakes; for example, it has never seen a case where the wait is 0-10 minutes but the restaurant is full. In such a case, when Hungry is false, the tree says not to wait, but I (SR) would certainly wait. This raises an obvious question: if the algorithm induces a consistent but incorrect tree from the examples, how incorrect will the tree be?

Choosing attribute tests

A perfect attribute divides the examples into sets that are all positive or all negative. One suitable measure of attribute quality is the expected amount of information provided by the attribute, where we use the term "information" in the mathematical sense first defined in Shannon and Weaver (1949). To understand the notion of information, think of it as providing the answer to a question. The amount of information contained in the answer depends on one's prior knowledge: the less you know, the more information is provided.

Information theory measures information content in bits. One bit of information is enough to answer a yes/no question about which one has no idea. If the possible answers vi have probabilities P(vi), the information content I of the actual answer is given by

I(P(v1), ..., P(vn)) = Σi −P(vi) log2 P(vi)

An example of information content: whether a coin will come up heads. For the tossing of a fair coin we get

I(1/2, 1/2) = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1 bit

If the coin is loaded to give 99% heads, we get I(1/100, 99/100) ≈ 0.08 bits, and as the probability of heads goes to 1, the information content of the actual answer goes to 0.

For decision tree learning, the question to be answered is: what is the correct classification for a given example? The probabilities of the positive and negative answers can be estimated from their proportions in the training set. Suppose the training set contains p positive examples and n negative examples. Then an estimate of the information contained in a correct answer is

I(p/(p+n), n/(p+n))

The restaurant training set in Figure 18.3 has p = n = 6, so we need 1 bit of information.

A test on a single attribute will usually give us only part of this information; we can measure exactly how much by looking at how much information we still need after the attribute test. Any attribute A divides the training set E into subsets E1, ..., Ev according to their values for A, where A can have v distinct values. Each subset Ei has pi positive examples and ni negative examples, so along that branch we need an additional I(pi/(pi+ni), ni/(pi+ni)) bits of information to answer the question.

A randomly chosen example from the training set has the ith value for the attribute with probability (pi + ni)/(p + n), so on average, after testing attribute A, the information we still need is

Remainder(A) = Σi=1..v ((pi + ni)/(p + n)) I(pi/(pi+ni), ni/(pi+ni))

The information gain from the attribute test is the difference between the original information requirement and the new requirement:

Gain(A) = I(p/(p+n), n/(p+n)) − Remainder(A)
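As a concrete illustration, here is a small sketch, not from the slides, that computes I, Remainder, and Gain. The per-value positive/negative counts for Patrons and Type are assumptions reproducing the 12-example training set of Figure 18.3, since the table itself is not included in this extract.

```python
from math import log2

def information(probs):
    """I(P(v1), ..., P(vn)) = sum of -P(vi) * log2(P(vi)), in bits."""
    return sum(-p * log2(p) for p in probs if p > 0)

def remainder(splits, p, n):
    """Expected information still needed after splitting into (pi, ni) subsets."""
    total = 0.0
    for pi, ni in splits:
        weight = (pi + ni) / (p + n)
        total += weight * information([pi / (pi + ni), ni / (pi + ni)])
    return total

def gain(splits, p, n):
    """Information gain of an attribute whose values induce the given (pi, ni) splits."""
    return information([p / (p + n), n / (p + n)]) - remainder(splits, p, n)

# Restaurant training set: p = n = 6; per-value counts assumed from Figure 18.3.
p, n = 6, 6
patrons_splits = [(0, 2), (4, 0), (2, 4)]          # None, Some, Full
type_splits = [(1, 1), (1, 1), (2, 2), (2, 2)]     # French, Italian, Thai, Burger

print(round(gain(patrons_splits, p, n), 3))  # about 0.541 bits
print(round(gain(type_splits, p, n), 3))     # 0.0 bits: Type tells us nothing
```

The result, roughly 0.541 bits for Patrons versus 0 bits for Type, matches the earlier observation that Patrons is a fairly important attribute while Type is a poor one.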

Assessing the performance of the learning algorithm:
1. Collect a large set of examples.
2. Divide it into two disjoint sets: the training set and the test set.
3. Apply the learning algorithm to the training set, generating a hypothesis h.
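The extract breaks off at this point. As a hedged illustration of the train/test methodology the slide is describing, the sketch below adds the natural final step, measuring accuracy on the held-out test set; that step is an assumption rather than text from this extract.

```python
import random

def assess(examples, learn, classify, train_fraction=0.7, seed=0):
    """Holdout evaluation: learn on a training split, then measure accuracy on the test split.

    examples : list of (x, y) pairs
    learn    : function taking a training list and returning a hypothesis h
    classify : function taking (h, x) and returning a predicted y
    """
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    training_set, test_set = shuffled[:cut], shuffled[cut:]

    h = learn(training_set)                                   # step 3: induce a hypothesis
    correct = sum(classify(h, x) == y for x, y in test_set)   # assumed step: score on unseen data
    return correct / len(test_set)
```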
