A Survey of Data Warehousing and Data Mining: Concepts, Architecture, Trends, and Applications
Presenter: Zhu Jianqiu, June 7, 2001

Outline
- Data warehouse concepts
- Data warehouse architecture and components
- Data warehouse design
- Data warehouse technology (differences from database technology)
- Data warehouse performance
- Data warehouse applications
- Overview of data mining applications
- Data mining techniques and trends
- Data mining application platform (project proposal to the Science and Technology Commission)

Data Warehouse Concepts
- Basic concepts
- Some misconceptions about data warehouses

Basic Concepts: Data Warehouse
- "A data warehouse is a subject-oriented, integrated, non-volatile and time-variant collection of data in support of management's decisions." (Inmon, 1996)
- "A data warehouse is a set of methods, techniques, and tools that may be leveraged together to produce a vehicle that delivers data to end-users on an integrated platform." (Ladley, 1997)
- "A data warehouse is a process of creating, maintaining, and using a decision-support infrastructure." (Appleton, 1995; Haley, 1997; Gardner, 1998)
Basic Concepts: Characteristics of a Data Warehouse (Inmon, 1996)
- Subject-oriented
  - The tables of a subject area are sourced from multiple operational applications (e.g., the customer subject draws on order processing, accounts receivable, and accounts payable).
  - Typical subject areas: customer, product, transaction, account.
  - A subject area is implemented as a set of related tables.
  - Related tables are linked through a common key (e.g., the customer identifier CustomerID).
  - Every key carries a time element (a from-date/to-date range, a monthly cumulative period, or a single date).
  - Data within a subject can be stored on different media (summary level, detail level, multiple granularities).
- Integrated: data is extracted, cleansed, transformed, and loaded.
- Non-volatile: data is added in batches; data already in the warehouse does not change.
- Time-variant: the warehouse has a time dimension.
- Supports management decision-making.

Basic Concepts: Data Mart, ODS
- Data mart: a small, department- or workgroup-level data warehouse.
- Operational Data Store (ODS): a collection of data that supports the enterprise's day-to-day, enterprise-wide applications. It is a data environment distinct from the operational database, a hybrid form obtained by extending the DW. Its four basic characteristics: subject-oriented, integrated, volatile (updatable), and current or near-current.

Basic Concepts: ETL, Metadata, Granularity, Partitioning
- ETL (Extract/Transform/Load): tools that extract, transform, and load data, e.g., Microsoft DTS and IBM Visual Warehouse.
- Metadata: data about data, used to build, maintain, manage, and use the data warehouse; it is especially important in a data warehouse.
- Granularity: the level of detail or summarization of the data held in the warehouse's units of data. The more detailed the data, the finer (smaller) the granularity.
- Partitioning: splitting data across separate physical units that can be processed independently.

Some Misconceptions about Data Warehouses
- Data warehouse vs. OLAP: star data model, multidimensional analysis
- A data warehouse is not a virtual concept
- Data warehouses and normalization theory: denormalization is needed
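The ETL, granularity, and partitioning definitions above can be made concrete with a minimal Python sketch. Everything in it is hypothetical: the two source feeds (`SOURCE_A`, `SOURCE_B`), their field names, the default used for a missing amount, and the choice of month as the partition key. It illustrates the ideas only and does not reflect how any ETL product such as DTS works.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational sources that name the same fields differently.
SOURCE_A = [  # e.g. an order-processing system
    {"cust_no": "C1", "order_date": "2001-05-03", "amount": "120.50", "status": "OK"},
    {"cust_no": "C2", "order_date": "2001-05-17", "amount": None, "status": "OK"},
    {"cust_no": "C3", "order_date": "2001-05-20", "amount": "9.99", "status": "CANCELLED"},
]
SOURCE_B = [  # e.g. a web-shop system
    {"CustomerID": "C1", "Date": "2001-06-02", "Total": "80.00"},
]
DEFAULT_AMOUNT = 0.0  # assumed default for a missing amount


def extract():
    """Extract: yield (source name, raw row) pairs from every source."""
    for row in SOURCE_A:
        yield "A", row
    for row in SOURCE_B:
        yield "B", row


def transform(source, row):
    """Transform: map a raw row onto one unified definition, or return None to drop it."""
    if source == "A":
        if row["status"] == "CANCELLED":  # data the warehouse does not need
            return None
        cust, day, amount = row["cust_no"], row["order_date"], row["amount"]
    else:
        cust, day, amount = row["CustomerID"], row["Date"], row["Total"]
    d = date.fromisoformat(day)
    return {
        "CustomerID": cust,            # unified name for the common key
        "order_date": d,               # time element of the key
        "month": d.strftime("%Y-%m"),  # derived field, used as the partition key
        "amount": float(amount) if amount is not None else DEFAULT_AMOUNT,
    }


def load(rows):
    """Load: append detail rows to per-month partitions that can be processed independently."""
    partitions = defaultdict(list)
    for r in rows:
        partitions[r["month"]].append(r)
    return partitions


def summarize(partitions):
    """Roll detail rows up to a coarser grain: total amount per (month, customer)."""
    summary = defaultdict(float)
    for month, rows in partitions.items():
        for r in rows:
            summary[(month, r["CustomerID"])] += r["amount"]
    return dict(summary)


rows = [t for t in (transform(s, r) for s, r in extract()) if t is not None]
partitions = load(rows)
monthly = summarize(partitions)
print(sorted(partitions))  # ['2001-05', '2001-06']
print(monthly)
```

Each monthly partition can be reprocessed or archived on its own, and `monthly` is a coarser-grained view of the same detail rows: one value per customer per month instead of one per order.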
Data Warehouse Architecture and Components
- Architecture
- ETL tools
- Metadata repository and metadata management
- Data access and analysis tools

Architecture (Pieter, 1998)
[Figure: source databases (relational, application packages, legacy, external) -> data extraction, transformation, and load (with a data cleansing tool, a data modeling tool, and warehouse administration tools) -> central data warehouse (RDBMS) with central metadata -> architected mid-tier data marts (RDBMS, MDB) with local metadata and metadata exchange -> end-user DW tools for data access and analysis]

Architecture with an ODS
[Figure: as above, but an extraction, transformation, and load hub feeds both the central data warehouse and an ODS, alongside OLTP tools]

The Real-World Environment: Heterogeneity (Douglas Hackney, 2001)
[Figure: a custom marketing data warehouse, packaged Oracle Financial and i2 Supply Chain data warehouses, a non-architected data mart, and subset data marts, fed by Oracle Financials, i2 Supply Chain, Siebel CRM, and third-party e-commerce systems]

Federated Data Warehouse / Data Mart Architecture
[Figure: Oracle Financials, i2 Supply Chain, Siebel CRM, and third-party sources share a common staging area and a real-time ODS, feeding a federated financial data warehouse, federated packaged i2 Supply Chain data marts, a federated marketing data warehouse, subset data marts, analytical applications, e-commerce, and real-time data mining and analytics (segmentation, classification, qualification, offerings, etc.)]

Closed-Loop Federated BI Architecture
[Figure: front- and back-office OLTP, e-business systems, and external information providers feed ETL tools and DW templates, data profiling and reengineering tools, demand-driven data acquisition and analysis, metadata interchange, federated data warehouse and data mart systems, decision engine models, rules, and metrics, OLAP and data mining tools with analysis templates, and analytic application development tools; analytic applications (CRM, supply chain, EPM, financial, and HR analytics and reporting) and an EKP (Enterprise Knowledge Management Portal) deliver business information and recommendations, leading to informed decisions and actions]

Focal Issues of a Data Warehouse: Acquiring, Storing, and Using Data
[Figure: relational, package, legacy, and external sources -> data cleansing tool -> data staging -> enterprise data warehouse -> data marts (RDBMS with ROLAP, and MDB) -> end-user tools]
- The loading capability of the data warehouse and data marts is critical.
- The query output capability of the data warehouse and data marts is critical.

ETL Tools
- Remove unneeded data from the operational databases
- Convert data names and definitions to a uniform standard
- Compute summary and derived data
- Estimate default values for missing data
- Accommodate changes in source data definitions

[Figure: ETL tool architecture]

Metadata Repository and Metadata Management
Metadata categories: technical metadata, business metadata, and data warehouse operational information (Alex Berson et al., 1999).
- Technical metadata: information about warehouse data that data warehouse designers and administrators use to carry out development and administration tasks. It includes:
  - Data source information
  - Transformation descriptions (the mapping from operational databases to the data warehouse, and the algorithms used to transform the data)
  - Definitions of the warehouse objects and data structures for the target data
  - Rules for data cleansing and data enrichment
  - Data mapping operations
  - Access rights, backup history, archive history, information delivery history, data acquisition history, data access, etc.
- Business metadata: information that is easy for users to understand, including:
  - Subject areas and information object types, including queries, reports, images, audio, video, etc.
  - Internet home pages
  - Other information supporting the data warehouse, e.g., for the information delivery system: subscription information, scheduling information, detailed descriptions of delivery targets, business query objects, etc.
- Data warehouse operational information: e.g., data history (snapshots, versions), ownership, extraction audit trails, data usage.

[Figure: metadata repositories and tools (Martin Stardt, 2000)]

Data Access and Analysis Tools
- Reporting
- OLAP
- Data mining

Data Warehouse Design
- Top-down
- Bottom-up
- Hybrid approach
- Data warehouse modeling

Top-Down Approach
- Build the enterprise data warehouse
  - Common central data model
  - Data re-engineering performed once
  - Minimize redundancy and inconsistency
  - Detailed and history data; global data discovery
- Build data marts from the Enterprise Data Warehouse (EDW)
  - Subset of the EDW relevant to the department
  - Mostly summarized data
  - Direct dependency on EDW data availability
[Figure: operational data and external data -> enterprise warehouse -> local data marts]

Bottom-Up Design Approach
- Build departmental data marts
  - Scope limited to one subject area
  - Fast ROI: local business needs are met
  - Departmental autonomy: flexible design
  - A good guide for other departments' data marts
  - Easy to replicate to other departments
  - Data re-engineering is needed for each department
  - Some level of redundancy and inconsistency
- A practical approach
- Scale up to an enterprise data warehouse
  - Build the EDB as a long-term goal
[Figure: external data and local operational data feed local data marts; all operational data feeds the enterprise data warehouse (EDB)]

Data Warehouse Modeling: Star Schema
Example of a star schema:
- Sales fact table: Date, Product, Store, Customer, with measurements unit_sales, dollar_sales, Yen_sales
- Date dimension: Date, Month, Year
- Customer dimension: CustId, CustName, CustCity, CustCountry
- Product dimension: ProductNo, ProdName, ProdDesc, Category, QOH
- Store dimension: StoreID, City, State, Country, Region

Data Warehouse Modeling: Snowflake Schema
Example of a snowflake schema: the same sales fact table and Customer and Product dimensions, with the Date and Store dimensions normalized into chains of tables:
- Date (Date, Month) -> Month (Month, Year) -> Year (Year)
- Store (StoreID, City) -> City (City, State) -> State (State, Country) -> Country (Country, Region)

[Figures: an operational (OLTP) data source, a sales database; the corresponding star schema with its time dimension and fact table; the multidimensional model with facts, measures (metrics), the time dimension, and its attributes]

Data Warehouse Technology (Inmon, 1996)
- Managing large volumes of data
  - The ability to manage large amounts of data
  - The ability to manage it well
- Managing multiple media (storage hierarchy): main memory, expanded memory, cache, DASD, optical disk, microfiche
- Monitoring data
  - Determine whether the data should be reorganized
  - Determine whether indexes are poorly built
  - Determine whether there is too much data overflow
  - Determine the remaining free space
- Acquiring and transferring data using multiple technologies: batch mode (online mode is not very useful)
- Programmer/designer control over data placement (block/page)
- Parallel storage and management of data
- Metadata management

Data Warehouse Technology (Inmon, 1996), continued
- Data warehouse language interface
  - Can access a set of data at a time
  - Can access one record at a time
  - Supports one or more indexes
  - Has an SQL interface
- Efficient data loading
- Efficient use of indexes: bitmap approaches, multilevel indexes, etc.
- Data compression: I/O resources are much scarcer than CPU resources, so decompressing the data is not a major concern
- Compound keys (because data varies over time)
- Variable-length data
- Lock management (programmers can explicitly control the lock manager)
- Index-only processing (some requests can be served by examining the index alone)
- Fast recovery

Data Warehouse Technology (Inmon, 1996), continued
- Other technical features, in which traditional technology plays only a small role: transaction integrity, caching, row/page-level locking, referential integrity, data views
- Differences between a traditional DBMS and a data warehouse DBMS
  - Designed and optimized for data warehousing and decision support
  - Manages more data: 10 GB / 100 GB / TB
  - A traditional DBMS suits record-level updates and provides locking (Lock), commit (Commit), checkpoints (CheckPoint), log processing (Log), deadlock handling (DeadLock), and rollback (Rollback)
  - Basic data management, e.g., block management: a traditional DBMS must reserve free space
  - Indexes: a traditional DBMS limits the number of indexes, whereas a data warehouse DBMS does not
  - A general-purpose DBMS is physically optimized for transactional access, whereas a data warehouse is optimized for DSS access and analysis
- Changing DBMS technology
- Multidimensional DBMSs and the data warehouse
  - The idea that a multidimensional DBMS is the database technology for the data warehouse is incorrect
  - A multidimensional DBMS (OLAP) is a technology, whereas the data warehouse is an architectural foundation
- Dual levels of granularity (DASD/tape)

Data Warehouse Technology (Inmon, 1996), continued
- Metadata in the data warehouse environment
  - DSS analysts differ from IT professionals and need the help of metadata
  - Mapping between the operational environment and the data warehouse environment requires metadata
  - The data warehouse holds data spanning a long period, so metadata must record the data structures and definitions
- Context and content (the context dimension)
  - Simple contextual information (data structures, encodings, naming conventions, metrics)
  - Complex contextual information (product definitions, market territories, pricing, packaging, organizational structure)
  - External contextual information (economic forecasts such as inflation, finance, and taxation; political information; competitive information; technological developments)
- Refreshing the data warehouse
  - Data replication (triggers)
  - Change data capture (CDC) (logs)
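The two refresh techniques listed above can be sketched with Python's built-in `sqlite3` module. In this minimal sketch, triggers on a hypothetical `customer` source table record every insert and update in a `change_log` table, and a hypothetical `refresh()` job ships only the changes logged since its previous run to the warehouse. All table, column, and function names are invented for illustration. A real log-based CDC tool would read the DBMS's own transaction log rather than a trigger-maintained table, and the source and warehouse would normally live in separate databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical operational (source) table.
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);

-- Change log written by triggers; seq orders the captured changes.
CREATE TABLE change_log (
    seq  INTEGER PRIMARY KEY AUTOINCREMENT,
    op   TEXT,
    id   INTEGER,
    name TEXT,
    city TEXT
);

CREATE TRIGGER customer_ins AFTER INSERT ON customer BEGIN
    INSERT INTO change_log (op, id, name, city) VALUES ('I', NEW.id, NEW.name, NEW.city);
END;

CREATE TRIGGER customer_upd AFTER UPDATE ON customer BEGIN
    INSERT INTO change_log (op, id, name, city) VALUES ('U', NEW.id, NEW.name, NEW.city);
END;

-- Warehouse side: append-only history, one row per captured change.
CREATE TABLE dw_customer_hist (seq INTEGER PRIMARY KEY, id INTEGER, name TEXT, city TEXT);
""")

last_seq = 0  # highest change already shipped to the warehouse


def refresh():
    """Copy changes logged since the previous refresh into the warehouse history."""
    global last_seq
    changes = conn.execute(
        "SELECT seq, id, name, city FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    conn.executemany(
        "INSERT INTO dw_customer_hist (seq, id, name, city) VALUES (?, ?, ?, ?)",
        changes,
    )
    if changes:
        last_seq = changes[-1][0]
    conn.commit()
    return len(changes)


conn.execute("INSERT INTO customer VALUES (1, 'Alice', 'Shanghai')")
conn.execute("INSERT INTO customer VALUES (2, 'Bob', 'Beijing')")
first = refresh()   # ships the two inserts
conn.execute("UPDATE customer SET city = 'Hangzhou' WHERE id = 2")
second = refresh()  # ships only the update

# Current view: the latest version of each customer in the history.
current = conn.execute("""
    SELECT id, name, city FROM dw_customer_hist h
    WHERE seq = (SELECT MAX(seq) FROM dw_customer_hist WHERE id = h.id)
    ORDER BY id
""").fetchall()
print(first, second, current)
```

The warehouse table is append-only, in keeping with the non-volatile, time-variant characteristics described earlier: Bob's move from Beijing to Hangzhou is kept as a second row, and the current view is derived by taking the latest version of each key.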
Data Warehouse Performance (Inmon, 1999)
- Usage
- Data
- Platform
- Service management
Wang Tianyou et al. (trans.), Data Warehouse Management, Publishing House of Electronics Industry, May 2000.

Data Warehouse Applications
- Survey of DW user counts: "DW systems with 100-500 or more users will make up the main share for some time to come." (Meta Group survey over the past year; respondents: 3,000+ users or prospective users)
- Survey of DW data volume (Meta Group survey over the past year; respondents: 3,000+ users or prospective users)

How Much?
- $3-6M for a mid-size company; less if smaller, more if larger
- $10M+ for large organizations with large data sets
- 10-50+% annual maintenance costs
- 33% hardware / 33% software / 33% services

How Long?
- 2-4 years for 80/20 of the full system for a mid-size company
- 6-12 months for the initial iteration
- 3-6 months for subsequent iterations

How Risky?
- For EDW projects, 20% (Meta) to 70% (OTR, DWN) fail
- High failure rate for non-business-driven initiatives
- Very few systems meet the expectations of the business
- Failure is due not to technology but to "soft" issues
- Massive upside for successful projects (100% to 2000+% ROI)
- 99% politics, 1% technology

Data Mining Applications: A Survey
- Overview of data mining applications
- Data mining techniques and trends
- Data mining application platform

Overview of Data Mining Applications
- Application breakdown
- Data mining upsides
- Data mining downsides
- Data mining uses
- Data mining industry and applications
- Data mining costs

[Figure: proportions of data mining applications]

Data Mining Upsides
- Discovery of previously unknown relationships, trends, anomalies, etc.
- Powerful competitive weapon
- Automation of repetitive analysis
- Predictive capabilities

Data Mining Downsides
- Knowledge discovery technology is immature
- Long learning and tuning cycles for some technologies
- "Black box" technology undermines confidence
- VLDB (Very Large DataBase) requirements

Data Mining Uses
- Discover anomalies, outliers, and exceptions in process data
- Discover behavior and predict outcomes of customer relationships
- Churn management
- Target marketing (market of one)
- Promotion management
- Fraud detection
- Pattern ID and matching (dark programs, science)

Data Mining Industry and Applications
- From research prototypes to data mining products, languages, and standards
  - IBM Intelligent Miner, SAS Enterprise Miner, SGI MineSet, Clementine, MS SQL Server 2000, DBMiner, BlueMartini, MineIt, DigiMine, etc.
  - A few data mining languages and standards (esp. MS OLE DB for Data Mining)
- Application achievements in many domains
  - Market analysis, trend analysis, fraud detection, outlier analysis, Web mining, etc.

Data Mining Costs
- Desktop tools: $500 and up (Microsoft is coming in at a low price point)
- Server/mainframe-based: $20,000 to $700,000+
- Must also add the cost of extensive consulting for high-end tools
- Don't forget long training and learning-curve time
- An ongoing process, not task-automation software

Data Mining Trends
- Historical review
- Confluence of multiple disciplines
- Classifying data mining from multiple perspectives
- Research progress in the last decade
- Trends in data mining
- Data mining and the standardization process

Historical Review
- 1989 IJCAI Workshop on Knowledge Discovery in Databases
  - Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
- 1991-1994 Workshops on Knowledge Discovery in Databases
  - Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
- 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD'95-98)
  - Journal of Data Mining and Knowledge Discovery (1997)
- 1998 ACM SIGKDD; SIGKDD'1999-2001 conferences; SIGKDD Explorations
- More conferences on data mining
  - PAKDD, PKDD, SIAM-Data Mining, (IEEE) ICDM, DaWaK, SPIE-DM, etc.

Data Mining: Confluence of Multiple Disciplines
[Figure: data mining at the center, surrounded by database technology, statistics, machine learning (AI), information science, visualization, and other disciplines]

[Figure: a multi-dimensional view of data mining]

Research Progress in the Last Decade
- Multi-dimensional data analysis: data warehouse and OLAP (on-line analytical processing)
- Association, correlation, and causality analysis
- Classification: scalability and new approaches
- Clustering and outlier analysis
- Sequential patterns and time-series analysis
- Similarity analysis: curves, trends, images, texts, etc.
- Text mining, Web mining, and Weblog analysis
- Spatial, multimedia, and scientific data analysis
- Data preprocessing and database compression
- Data visualization and visual data mining
- Many others, e.g., collaborative filtering

Research Directions (Han J.W., 2001)
- Web mining
- Towards integrated data mining environments and tools
- "Vertical" (or application-specific) data mining
- Invisible data mining
- Towards intelligent, efficient, and scalable data mining methods

Towards Integrated Data Mining Environments and Tools
- OLAP mining: integration of data warehousing and data mining
- Querying and mining: an integrated information analysis environment
- Basic mining operations and mining query optimization
- "Vertical" (or application-specific) data mining
- Invisible data mining

Querying and Mining: An Integrated Information Analysis Environment
- Data mining as a component of a DBMS, data warehouse, or Web information system
- Integrated information processing environment
  - MS SQL Server 2000 (Analysis Services)
  - IBM Intelligent Miner on DB2
  - SAS Enterprise Miner: data warehousing + mining
- Query-based mining
  - Querying database/DW/Web knowledge
  - Efficiency and flexibility: preprocessing, on-line processing, optimization, integration, etc.

"Vertical" Data Mining
- Generic data mining tools? Too simple to match domain-specific, sophisticated applications
- Expert knowledge and business logic represent many years of work in their own fields!
- Data mining + business logic + domain experts
- A multi-dimensional view of data miners
  - Complexity of data: Web, sequence, spatial, multimedia
  - Complexity of domains: DNA, astronomy, market, telecom
- Domain-specific data mining tools
  - Provide concrete, killer solutions to specific problems
  - Feedback to build more powerful tools

Invisible Data Mining
- Build mining functions into daily information services
  - Web search engines (link analysis, authoritative pages, user profiles), adaptive Web sites, etc.
- Improvement of query processing: history + data
- Making services smart and efficient
- Benefits from/to data mining research
  - Data mining research has produced many scalable, efficient, novel mining solutions
  - Applications feed new challenge problems to research

Towards Intelligent Tools for Data Mining
- Integration paves the way to intelligent mining
- A smart interface brings intelligence
  - Easy to use, understand, and manipulate
  - One picture may be worth 1,000 words
  - Visual and audio data mining
  - Human-centered data mining
- Towards self-tuning, self-managing, self-triggering data mining

Integrated Mining: A Booster for Intelligent Mining
- Integration paves the way to intelligent mining
- Data mining integrates with DBMS, DW, WebDB, etc.
- Integration inherits the power of up-to-date information technology: querying, MD analysis, similarity search, etc.
- Mining can be viewed as querying database knowledge
- Integration leads to standard interface/language, function/process standardization, utility, and reacha…
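The view of mining as querying database knowledge, together with the association analysis listed under research progress, can be sketched with Python's built-in `sqlite3` module. Here support and confidence for single-item association rules (lhs -> rhs) are computed by one SQL query over a transactions table. The `basket` table, its contents, and both thresholds are hypothetical. This is a toy illustration of query-based association mining, not an implementation of Apriori or of any product named above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical market-basket data: one row per (transaction id, item).
conn.execute("CREATE TABLE basket (tid INTEGER, item TEXT, PRIMARY KEY (tid, item))")
conn.executemany(
    "INSERT INTO basket VALUES (?, ?)",
    [
        (1, "bread"), (1, "milk"),
        (2, "bread"), (2, "diapers"), (2, "beer"), (2, "eggs"),
        (3, "milk"), (3, "diapers"), (3, "beer"), (3, "cola"),
        (4, "bread"), (4, "milk"), (4, "diapers"), (4, "beer"),
        (5, "bread"), (5, "milk"), (5, "diapers"), (5, "cola"),
    ],
)

MIN_SUPPORT_COUNT = 3  # a pair must occur in at least 3 transactions
MIN_CONFIDENCE = 0.7   # minimum estimate of P(rhs | lhs)

RULES_SQL = """
WITH n AS (SELECT COUNT(DISTINCT tid) AS total FROM basket),
     single AS (SELECT item, COUNT(*) AS cnt FROM basket GROUP BY item),
     pair AS (
         -- Self-join: every ordered pair of distinct items bought together.
         SELECT a.item AS lhs, b.item AS rhs, COUNT(*) AS cnt
         FROM basket a JOIN basket b ON a.tid = b.tid AND a.item <> b.item
         GROUP BY a.item, b.item
         HAVING COUNT(*) >= ?
     )
SELECT p.lhs, p.rhs,
       CAST(p.cnt AS REAL) / n.total AS support,
       CAST(p.cnt AS REAL) / s.cnt   AS confidence
FROM pair p
JOIN single s ON s.item = p.lhs
CROSS JOIN n
WHERE CAST(p.cnt AS REAL) / s.cnt >= ?
ORDER BY confidence DESC, p.lhs, p.rhs
"""

rules = conn.execute(RULES_SQL, (MIN_SUPPORT_COUNT, MIN_CONFIDENCE)).fetchall()
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With this toy data the query returns eight rules, and the strongest is beer -> diapers (support 0.60, confidence 1.00). Expressing the mining step as an ordinary query is what lets it benefit from the DBMS's optimizer and indexes, which is the integration argument made in the slides above.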