Data Mining and Management Decision: Course Syllabus

Course code: 20157
English title: Data Mining and Management Decision
Course category: Major core course (bilingual)
Prerequisites: Statistics, Linear Algebra, Management
Follow-up course: Enterprise Resource Planning
Credits: 3
Class hours: 51
Textbook: Data Mining: Introductory and Advanced Topics (reprint edition). Margaret H. Dunham. Tsinghua University Press, October 2003.

Course overview: Data mining is a data processing and analysis technology that has developed in recent years alongside the large-scale deployment of database systems and the widespread use of the World Wide Web. It is an emerging technology formed at the intersection of three fields: databases, machine learning, and statistics. This course systematically introduces the basic concepts, methods, and algorithms of data mining, and combines software introductions with management decision case studies so that students learn data mining and its applications in a structured way. The course consists of four parts. Part One is an introduction that covers the background of data mining, related concepts, and the main techniques data mining uses. Part Two covers the core data mining algorithms, with a systematic and in-depth description of the common algorithms for classification, clustering, and association rules. Part Three covers advanced topics in data mining, mainly Web mining, spatial data mining, and temporal and sequence data mining. Data mining techniques uncover useful information hidden in data and, from it, knowledge that has not yet been discovered. This supplies information and knowledge for business competition, enterprise production and management, government decision-making, and scientific exploration, and is of great value in helping managers make sound decisions.

Teaching objectives: After more than a decade of development, data mining technology has produced important results, and its basic concepts, principles, and algorithms in particular have become increasingly clear. The basic technical conditions for offering this course are therefore in place. The course focuses on basic concepts and basic algorithms. As a course on advanced data processing and analysis technology, it aims to give students an understanding of the direction in which information processing technology is developing and of the concepts, principles, and methods of data mining itself. Teaching also draws on management decision cases, supplemented by discussion and exploration of frontier issues, to give students a knowledge base for future research and study and to prepare them for the management demands of the big data era.

Teaching methods: Classroom teaching relies mainly on multimedia lectures, supplemented by case teaching, class discussion, and software practice.

Teaching requirements and key points by chapter

Chapter 1 Introduction
Class hours: 3
Teaching requirements: Through this chapter, students come to understand the basic concepts of data mining and data mining techniques, including classification, regression, time series analysis, prediction, clustering, association rules, and sequence discovery, as well as the relationship between data mining and knowledge discovery in databases and the impact of data mining on future management decision-making and social development.
Teaching content:
1.1 Basic Data Mining Tasks
1.2 Data Mining Versus Knowledge Discovery in Databases
1.3 Data Mining Issues
1.4 Data Mining Metrics
1.5 Social Implications of Data Mining
1.6 Data Mining from a Database Perspective
1.7 The Future
Review questions:
1. Identify and describe the phases in the KDD process. How does KDD differ from data mining?
2. Find at least three examples of data mining applications that have appeared in the business section of your local publication, and describe the data mining application involved in each.

Chapter 2 Related Concepts
Class hours: 4
Teaching requirements: Through this chapter, students come to understand concepts related to data processing and master the concepts behind database/OLTP systems, fuzzy sets and fuzzy logic, information retrieval, decision support systems, dimensional modeling, multidimensional schemas, indexing, data warehousing, Web search engines, machine learning, and pattern matching, together with their applications.
Teaching content:
2.1 Database/OLTP Systems
2.2 Fuzzy Sets and Fuzzy Logic
2.3 Information Retrieval
2.4 Decision Support Systems
2.5 Dimensional Modeling
2.6 Indexing
2.7 Data Warehousing
2.8 OLAP
2.9 Web Search Engines
2.10 Statistics
2.11 Machine Learning
Review questions:
1. Compare and contrast database, information retrieval, and data mining queries. What metrics are used to measure the performance of each type of query?
2. Data warehouses are often viewed as containing relatively static data. Investigate techniques that have been proposed to provide updates to this data from the operational data. How often should these updates occur?

Chapter 3 Data Mining Techniques
Class hours: 4
Teaching requirements: Through this chapter, students come to understand the basic formulas and computational steps of data mining techniques, including statistical methods, Bayes' theorem, regression and correlation, decision trees, similarity measures, neural networks, activation functions, and genetic algorithms.
Teaching content:
3.1 Introduction
3.2 A Statistical Perspective on Data Mining
3.3 Similarity Measures
3.4 Decision Trees
3.5 Neural Networks
3.6 Genetic Algorithms
Review questions:
1. Given the following set of values {1, 3, 9, 15, 20}, determine the jackknife estimate for both the mean and the standard deviation of the mean.
2. Find the similarity between , and using the Dice, Jaccard, and cosine similarity measures.
3. Given the decision tree in Fig. 3.5, classify each of the following students: , and .

Chapter 4 Classification
Class hours: 8
Teaching requirements: Understand the issues in classification and the associated data analysis methods, including statistical-based algorithms (such as regression and Bayesian classification), distance-based algorithms (K nearest neighbors), decision tree-based algorithms, neural networks, rule-based algorithms, and other combining techniques.
Teaching content:
4.1 Introduction
4.2 Statistical-Based Algorithms
4.3 Distance-Based Algorithms
4.4 Decision Tree-Based Algorithms
4.5 Neural Network-Based Algorithms
4.6 Rule-Based Algorithms
4.7 Combining Techniques
Review questions:
1. Apply the method of least squares to determine the division between medium and tall persons using the training data in Table 4.1 and the classification shown in Output1 (see Example 4.3). You may use either the division technique or the prediction technique.
2. Explain the difference between P(ti|Cj) and P(Cj|ti).
3. Compare at least three different guidelines that have been proposed for determining the optimal number of hidden nodes in an NN.
4. Various classification algorithms can be found online. Apply these programs to the height example in Table 4.1 using the training classification shown in the Output2 column.

Chapter 5 Clustering
Class hours: 6
Teaching requirements: Master similarity and distance measures, outliers, hierarchical algorithms, partitional algorithms (minimum spanning tree, squared error clustering, K-means clustering, the nearest neighbor algorithm, and others), clustering of large databases (the BIRCH, DBSCAN, and CURE algorithms), and clustering with categorical attributes.
Teaching content:
5.1 Introduction
5.2 Similarity and Distance Measures
5.3 Outliers
5.4 Hierarchical Algorithms
5.5 Partitional Algorithms
5.6 Clustering Large Databases
5.7 Clustering with Categorical Attributes
5.8 Comparison
Review questions:
1. Show the dendrogram created by the single, complete, and average link clustering algorithms using the following adjacency matrix.

Item  A  B  C  D
A     0  1  4  5
B     1  0  2  6
C     4  3  0  3
D     5  6  3  0

2. A major problem with the single link algorithm is that clusters consisting of long chains may be created. Describe and illustrate this concept.
3. Trace the use of the nearest neighbor algorithm on the data of Exercise 1 assuming a threshold of 3.
4. Perform a survey of recently proposed clustering algorithms. Identify where they fit in the classification tree in Figure 5.2. Try to describe their approach and performance.

Chapter 6 Association Rules
Class hours: 8
Teaching requirements: Through this chapter, students come to understand large itemsets, the basic algorithms (Apriori, sampling, partitioning), parallel and distributed algorithms, comparison of approaches, incremental rules, advanced association rule techniques, correlation rules, and how to measure the quality of rules, and apply them in the analysis of practical cases.
Teaching content:
6.1 Introduction
6.2 Large Itemsets
6.3 Basic Algorithms
6.4 Parallel and Distributed Algorithms
6.5 Comparing Approaches
6.6 Incremental Rules
6.7 Advanced Association Rule Techniques
6.8 Measuring the Quality of Rules
Review questions:
1. Trace the results of using the Apriori algorithm on the grocery store example with s = 20% and a = 40%. Be sure to show the candidate and large itemsets for each database scan. Also indicate the association rules that will be generated.
2. Trace the results of using the sampling algorithm on the clothing store example with s = 20% and a = 40%. Be sure to show the use of the negative border function as well as the candidate and large itemsets for each database scan.
3. Calculate the lift and conviction for the rules shown in Table 6.3. Compare these to the shown support and confidence.
4. Perform a survey of recent research examining techniques to generate rules incrementally.

Chapter 7 Web Mining
Class hours: 6
Teaching requirements: Through this chapter, students come to understand advanced data mining techniques and methods, including Web content mining (crawlers, the Harvest system, virtual Web views), Web structure mining (PageRank, Clever), and Web usage mining (preprocessing, data structures, pattern discovery, pattern analysis).
Teaching content:
7.1 Introduction
7.2 Web Content Mining
7.3 Web Structure Mining
7.4 Web Usage Mining
Review questions:
1. Construct the trie for the string .
2. The use of a Web server through a proxy (such as an ISP) complicates the collection of frequent sequence statistics. Suppose that two users use one proxy and have the following sessions:
User 1:
User 2:
When these are viewed together by the Web server (taking into account the time stamps), one large session is generated:
Identify the maximal frequent sequences assuming a minimum support of 2. What are the maximal frequent sequences if the two users could be separated?
3. Perform a literature survey concerning current research into solutions to the proxy problem identified in Exercise 6.

Chapter 8 Spatial Mining
Class hours: 6
Teaching requirements: Through this chapter, students come to understand the basic concepts of spatial data (spatial queries, spatial data structures, thematic maps, and image databases), spatial data mining primitives, generalization and specialization (progressive refinement, generalization, nearest neighbor, STING), spatial rules (spatial association rules, spatial classification algorithms, ID3 extension, spatial decision trees), and spatial clustering algorithms (CLARANS extensions, SD(CLARANS), DBCLASD, BANG, WaveCluster, and approximation).
Teaching content:
8.1 Introduction
8.2 Spatial Data Overview
8.3 Spatial Data Mining Primitives
8.4 Generalization and Specialization
8.5 Spatial Rules
8.6 Spatial Classification Algorithm
8.7 Spatial Clustering Algorithms
Review questions:
1. Compare the R-tree to the R*-tree.
2. Another commonly used spatial index is the grid file. Define a grid file. Compare it to a k-D tree and a quad tree. Show the grid file that would be used to index the data found in Figure 8.5.

Chapter 9 Temporal Mining
Class hours: 6
Teaching requirements: Through this chapter, students come to understand the modeling of temporal events, time series (time series analysis, trend analysis, transformation, similarity, prediction), pattern detection, sequences (AprioriAll, SPADE, feature extraction), and temporal association rules (intertransaction rules, episode rules, trend dependencies, sequence association rules, calendric association rules). Teaching emphasizes data analysis methods explained through management cases.
Teaching content:
9.1 Introduction
9.2 Modeling Temporal Events
9.3 Time Series
9.4 Pattern Detection
9.5 Sequences
9.6 Temporal Association Rules
Review questions:
1. Assume that you are given the following temperature values, Z_t, taken at 5-minute time intervals: 50, 52, 55, 58, 60, 57, 66, 62, 60. Plot both Z_{t+2} and Z_t. Does there appear to be an autocorrela
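For the Chapter 9 temperature exercise, the following is a minimal sketch of one way to look at the lag-2 relationship numerically before plotting. It pairs each Z_t with Z_{t+2} and computes the Pearson correlation of those pairs. The function names `lagged_pairs` and `pearson` are illustrative choices, and using the Pearson coefficient of the lagged pairs (rather than another autocorrelation estimator) is an assumption, not something the exercise prescribes.

```python
from math import sqrt

# Temperature readings Z_t from the exercise, taken at 5-minute intervals.
z = [50, 52, 55, 58, 60, 57, 66, 62, 60]
LAG = 2  # the exercise compares Z_t with Z_{t+2}


def lagged_pairs(series, lag):
    """Return (Z_t, Z_{t+lag}) pairs: the points of the lag scatter plot."""
    return list(zip(series[:-lag], series[lag:]))


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


pairs = lagged_pairs(z, LAG)
r = pearson(pairs)

for z_t, z_t2 in pairs:
    print(z_t, z_t2)
print(f"lag-{LAG} correlation: {r:.3f}")
```

The printed pairs are the coordinates to plot. A coefficient near +1 or -1 would suggest a strong lag-2 relationship, and a value near 0 a weak one. With only seven pairs, any conclusion is tentative.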
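For Chapter 3, review question 1, the following sketch shows one common reading of the jackknife: recompute the mean with each value left out in turn, average those leave-one-out means, and estimate the standard error from their spread. The formula sqrt((n - 1) / n * sum of squared deviations) is the standard jackknife standard error. Treating it as the "standard deviation of the mean" that the question asks for is an assumption, and the function name `jackknife_mean` is illustrative.

```python
from math import sqrt

# Values given in Chapter 3, review question 1.
values = [1, 3, 9, 15, 20]


def mean(xs):
    return sum(xs) / len(xs)


def jackknife_mean(xs):
    """Leave-one-out means, their average, and the jackknife standard error."""
    n = len(xs)
    # Mean of the sample with the i-th value removed, for each i.
    loo = [mean(xs[:i] + xs[i + 1:]) for i in range(n)]
    estimate = mean(loo)
    # Standard jackknife variance of the estimator (assumed interpretation).
    variance = (n - 1) / n * sum((m - estimate) ** 2 for m in loo)
    return loo, estimate, sqrt(variance)


loo_means, jk_mean, jk_se = jackknife_mean(values)
print("leave-one-out means:", loo_means)
print("jackknife mean:", jk_mean)
print("jackknife standard error:", round(jk_se, 4))
```

For the mean, the jackknife estimate coincides with the ordinary sample mean (9.6 here), and the jackknife standard error equals s / sqrt(n), about 3.57. This makes the exercise a useful check that the procedure has been applied correctly.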
