Artificial Intelligence and Data Mining Teaching Slides: 2. Data Warehouse

Data warehouse

Why data warehouse
• The most common issue companies face when looking at data mining is that the information is not in one place.
• The biggest challenge business analysts face in using data mining is how to extract, integrate, cleanse, and prepare data to solve their most pressing business problems.

What is a data warehouse
• The idea of a data warehouse is to put a wide range of operational data from internal and external sources into one place so it can be better utilized by executives, line-of-business managers, and other business analysts.
• Once the information is gathered, OLAP (on-line analytical processing) software comes into play by providing the desktop analysis tools for querying, manipulating, and reporting the data from the data warehouse.

Data warehouse environment
• The source systems from which data is extracted
• The tools used to extract data for loading the data warehouse
• The data warehouse database itself, where the data is stored
• The desktop query and reporting tools used for decision support

Data warehousing process overview

Operational vs. multidimensional view of sales

Creating a data warehouse

The data warehouse
• The data warehouse is an integrated, subject-oriented, time-variant, non-volatile database that provides support for decision making.
• Integrated: the data warehouse is a centralized, consolidated database that integrates data retrieved from the entire organization.
• Subject-oriented: the data warehouse data is arranged and optimized to provide answers to questions coming from diverse functional areas within a company.
• Time-variant: the warehouse data represent the flow of data through time. It can even contain projected data.
• Non-volatile: once data enter the data warehouse, they are never removed. The data warehouse is always growing.

Operational database vs. data warehouse
Operational DB:
• Similar data can have different representations or meanings
• Functional or process orientation
• Current transactions
• Frequent updating
Data warehouse:
• Unified view of all data elements
• Subject orientation for decision support
• Historical information with time dimension
• Data are added without change

Data mart
• A data mart is a small, single-subject data warehouse subset that provides decision support to a small group of people.
• Data marts can serve as a test vehicle for companies exploring the potential benefits of data warehouses.
• Data marts address local or departmental problems, while a data warehouse involves a company-wide effort to support decision making at all levels in the organization.

Enterprise data warehouse (EDW)
• A large-scale data warehouse that is used across the enterprise for decision support
• EDWs are used to provide data for many types of DSS, including CRM, SCM, BPM, BAM, PLM, and KMS.
  • BPM: business performance management
  • BAM: business activity monitoring
  • PLM: product lifecycle management
  • KMS: knowledge management systems

Metadata
• Metadata is data about data.
• In a data warehouse, metadata describe the contents of a data warehouse and the manner of its use.
• Good metadata is essential to the effective operation of a data warehouse, and it is used in data acquisition/collection, data transformation, and data access.

The need for technical metadata
• The use of data warehousing and decision processing often involves a wide range of different products, and creating and maintaining the metadata for these products is time-consuming and error-prone.
• Automating the metadata management process and enabling the sharing of this so-called technical metadata between products can reduce both costs and errors.

The need for business metadata
• Business users need to have a good understanding of what information exists in a data warehouse. They need to understand what the information means from a business viewpoint, how it was derived, from what source systems it comes, when it was created, what pre-built reports and analyses exist for manipulating the information, and so forth.

Metadata in a data warehouse
• Kimball lists the following types of metadata in a data warehouse:
  • Source system metadata
  • Data staging metadata
  • DBMS metadata
• Ralph Kimball, The Data Warehouse Lifecycle Toolkit, Wiley, 1998, ISBN 0-471-25547-5

Source system metadata
• Source specifications, such as repositories and source logical schemas
• Source descriptive information, such as ownership descriptions, update frequencies, and access methods
• Process information, such as job schedules and extraction code

Data staging metadata
• Data acquisition information, such as data transmission scheduling and results, and file usage
• Dimension table management, such as definitions of dimensions and surrogate key assignments
• Transformation and aggregation, such as data enhancement and mapping, DBMS load scripts, and aggregate definitions
• Audit, job logs, and documentation, such as data lineage records and data transform logs

Star schema
• The star schema is a data modeling technique used to map multidimensional decision support into a relational database.
• Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structure of the operational database.
• Four components:
  • Facts
  • Dimensions
  • Attributes
  • Attribute hierarchies

[Figure 13.14: A three-dimensional view of sales]
[Figure 13.17: Attribute hierarchies in multidimensional analysis]

Facts
• Numeric measurements that represent a specific business aspect or activity
• Normally stored in a fact table that is the center of the star schema
• The fact table contains facts linked through their dimensions
• Metrics are facts computed at run time

Dimensions
• Qualifying characteristics that provide additional perspectives to a given fact
• Decision support data are almost always viewed in relation to other data
• Study facts via dimensions
• Dimensions are stored in dimension tables

Attributes
• Dimensions provide descriptions of facts through their attributes
• No mathematical limit to the number of dimensions
• Used to search, filter, and classify facts
• Slice and dice: focus on slices of the data cube for more detailed analysis

Attribute hierarchies
• Provide top-down data organization
• Two purposes:
  • Aggregation
  • Drill-down/roll-up data analysis
• Determine how the data are extracted and represented
• Stored in the DBMS's data dictionary
• Used by the OLAP tool to access the warehouse properly

Star schema
• A star schema consists of fact tables and dimension tables.
• Fact tables contain the quantitative or factual data about a business: the information being queried. This information is often numerical, additive measurements and can consist of many columns and millions or billions of rows.
• Dimension tables are usually smaller and hold descriptive data that reflects the dimensions, or attributes, of a business.

[Figure 13.17: Star schema for sales]

Star schema representation
• Facts and dimensions are normally represented by physical tables in the data warehouse database.
• The fact table is related to each dimension table in a many-to-one (M:1) relationship.
• Fact and dimension tables are related by foreign keys and are subject to the primary/foreign key constraints.

[Figure 13.18: Orders star schema]
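A star schema for sales can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (dim_time, dim_product, dim_location, fact_sales), the columns, and the sample rows are hypothetical and are not taken from the slides or their figures. It shows a fact table related many-to-one to each dimension table through foreign keys, and a query that studies a fact (revenue) through a dimension attribute (region).

```python
import sqlite3


def build_sales_star(conn):
    """Create a small, hypothetical sales star schema and load sample rows."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE dim_time (
            time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_location (
            location_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
        -- Fact table: one row per (time, product, location) combination,
        -- related many-to-one to each dimension table.
        CREATE TABLE fact_sales (
            time_id     INTEGER NOT NULL REFERENCES dim_time(time_id),
            product_id  INTEGER NOT NULL REFERENCES dim_product(product_id),
            location_id INTEGER NOT NULL REFERENCES dim_location(location_id),
            units       INTEGER NOT NULL,
            revenue     REAL NOT NULL,
            PRIMARY KEY (time_id, product_id, location_id)
        );
    """)
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                     [(1, 2024, 1), (2, 2024, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware")])
    conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                     [(1, "Boston", "East"), (2, "Denver", "West")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 1, 10, 100.0), (1, 2, 2, 5, 75.0),
                      (2, 1, 2, 4, 40.0), (2, 2, 1, 2, 30.0)])
    conn.commit()


def revenue_by_region(conn):
    """Aggregate the revenue fact along the region attribute."""
    return conn.execute("""
        SELECT l.region, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_location AS l ON l.location_id = f.location_id
        GROUP BY l.region
        ORDER BY l.region
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_sales_star(conn)
print(revenue_by_region(conn))  # [('East', 130.0), ('West', 115.0)]
```

Because the foreign keys are enforced, inserting a fact row that points at a missing dimension row raises sqlite3.IntegrityError, which is one way the primary/foreign key constraints protect the schema.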

Star schema
• Performance-improving techniques:
  • Normalization of dimensional tables
  • Multiple fact tables representing different aggregation levels
  • Denormalization of fact tables
  • Table partitioning and replication

[Figure 13.19: Normalized dimension tables]

Multiple fact tables

Practice
• How to design a star schema for an auto insurance company to do risk analysis?
  • What is the objective?
  • What are the facts?
  • What are the dimensions?
  • What are the attributes?
  • What are the attribute hierarchies?

Auto insurance DW star schema

Data warehouse design
• Grain: a definition of the highest level of detail that is supported in a data warehouse
• Drill-down: the process of probing beyond a summarized value to investigate each of the detail transactions that comprise the summary

Data warehouse implementation
• The data warehouse as an active decision support network
• A company-wide effort that requires user involvement and commitment at all levels
• Satisfy the trilogy: data, analysis, and users
• Apply database design procedures
• Implementing a data warehouse is generally a massive effort that must be planned and executed according to established methods
• There are many facets to the project lifecycle, and no single person can be an expert in each area

Data warehouse implementation road map

Data integration and the extraction, transformation, and load (ETL) process
• Data integration comprises three major processes:
  • Data access (the ability to access and extract data from any data source)
  • Data federation (the integration of business views across multiple data stores)
  • Change capture (the identification, capture, and delivery of the changes made to enterprise data sources)
• Extraction, transformation, and load (ETL):
  • Extraction: reading data from a database
  • Transformation: converting the extracted data from its previous form into the form that can be placed into a data warehouse
  • Load: putting the data into the data warehouse

Data cleansing
• Data cleansing or data scrubbing is the act of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.
• Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting this dirty data.
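The extraction, transformation, and load steps, together with a simple cleansing pass, can be sketched as three small Python functions. This is a minimal sketch under assumptions, not a description of any particular ETL tool: the source table orders, the target table fact_orders, the column names, and the cleansing rules (drop incomplete rows, drop non-numeric or negative amounts, normalize names, remove duplicates) are all hypothetical choices made for illustration.

```python
import sqlite3


def make_demo_source():
    """Build a hypothetical operational source with some dirty rows."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (customer TEXT, amount TEXT, sold_on TEXT)")
    source.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
        (" alice ", "12.50", "2024-01-05"),
        ("Alice", "12.5", "2024-01-05"),    # duplicate once normalized
        (None, "3", "2024-01-06"),          # incomplete: no customer
        ("bob", "abc", "2024-01-07"),       # incorrect: amount not numeric
        ("bob", "7", "2024-01-08"),
    ])
    source.commit()
    return source


def extract(source):
    """Extraction: read the raw rows from the source database."""
    return source.execute(
        "SELECT customer, amount, sold_on FROM orders").fetchall()


def transform(rows):
    """Transformation and cleansing: convert rows to the warehouse form."""
    cleaned, seen = [], set()
    for customer, amount, sold_on in rows:
        if customer is None or amount is None or sold_on is None:
            continue  # incomplete record: removed
        name = customer.strip().title()
        if not name:
            continue  # empty name after trimming: removed
        try:
            value = round(float(amount), 2)
        except (TypeError, ValueError):
            continue  # incorrect record: removed
        if value < 0:
            continue  # inaccurate record under this sketch's rules
        record = (name, value, sold_on)
        if record in seen:
            continue  # duplicate record: removed
        seen.add(record)
        cleaned.append(record)
    return cleaned


def load(warehouse, rows):
    """Load: append the cleaned rows to the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(customer TEXT, amount REAL, sold_on TEXT)")
    warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return len(rows)


warehouse = sqlite3.connect(":memory:")
loaded = load(warehouse, transform(extract(make_demo_source())))
print(loaded)  # 2
```

Keeping the three steps as separate functions mirrors the E, T, and L stages, so a different source or a stricter cleansing rule can be swapped in without touching the other stages.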

ETL tools
• A good ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization.
• ETL tools have started to migrate into enterprise application integration, or even enterprise service bus, systems that now cover much more than just the extraction, transformation, and loading of data. Many ETL vendors now have data profiling, data quality, and metadata capabilities.

On-line analytical processing
• On-line analytical processing (OLAP) is an advanced data analysis environment that supports decision making, business modeling, and operations research activities.
• Four main characteristics of OLAP:
  • Use multidimensional data analysis techniques.
  • Provide advanced database support.
  • Provide easy-to-use end user interfaces.
  • Support client/server architecture.
• Additional functions of multidimensional data analysis techniques:
  • Advanced data presentation functions
  • Advanced data aggregation, consolidation, and classification functions
  • Advanced computational functions
  • Advanced data modeling functions

Integration of OLAP with a spreadsheet program

[Figure 13.7: OLAP server arrangements]
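The multidimensional operations that an OLAP tool offers (roll-up, drill-down, and slicing) can be illustrated on a tiny in-memory cube. This is a sketch under assumptions only: the dimension names, the year-to-quarter attribute hierarchy, and the sample facts are hypothetical, and a real OLAP server would work against warehouse tables instead of a Python list.

```python
from collections import defaultdict

# Hypothetical dimensions; each fact is (year, quarter, region, product, units).
DIMENSIONS = ("year", "quarter", "region", "product")
FACTS = [
    (2024, "Q1", "East", "Widget", 10),
    (2024, "Q1", "West", "Widget", 4),
    (2024, "Q2", "East", "Gadget", 5),
    (2024, "Q2", "West", "Widget", 6),
    (2025, "Q1", "East", "Gadget", 7),
]


def aggregate(facts, by):
    """Sum the units measure, grouped by the listed dimensions."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[i] for i in positions)
        totals[key] += fact[-1]  # the last field is the numeric fact
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Keep only the facts where one dimension has a fixed value."""
    position = DIMENSIONS.index(dimension)
    return [fact for fact in facts if fact[position] == value]


# Roll-up: summarize at the top of the time hierarchy.
print(aggregate(FACTS, ["year"]))
# {(2024,): 25, (2025,): 7}

# Drill-down: move one level down the hierarchy, from year to quarter.
print(aggregate(FACTS, ["year", "quarter"]))
# {(2024, 'Q1'): 14, (2024, 'Q2'): 11, (2025, 'Q1'): 7}

# Slice: fix region to "East", then analyze the slice by product.
print(aggregate(slice_cube(FACTS, "region", "East"), ["product"]))
# {('Widget',): 10, ('Gadget',): 12}
```

Roll-up and drill-down are the same aggregation run at different levels of the attribute hierarchy, which is why a single aggregate function serves both.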

SAP's Business Information Warehouse: an enterprise-wide information hub
• An end-to-end enterprise-wide information hub to support planning and decision-making.
• A central data repository of SAP, non-SAP, current, and historical business transactions and metadata.
• Timely information to all levels and roles, from analyst to executive.
• Years of SAP financial, logistic, and human resource information systems experience wedded with modern data warehouse methodologies.
SAP AG 1999

BW architecture details
[Diagram labels: R/3 OLTP applications; non-R/3 OLTP applications; OLTP reporting; production data extractor; non-R/3 production data extractor; Business Information Warehouse server; staging; BAPI; Business Explorer; Analyzer (hosted by MS Excel); browser; 3rd-party OLAP clients; data manager; operational data store; metadata manager; staging engine; Administrator Workbench (administration, scheduling, monitor); OLAP processor; metadata repository; InfoCubes; OLE DB for OLAP provider; ODBO; data provider; server; RemoteCube; PSA]

A sample of current data warehousing and data mining vendors
Table 13.10

Success stories at Pepsi
• "Using the data warehouse, we've been able to identify important items, find national suppliers for them, and leverage those relationships to reduce costs."
• Thanks to the warehouse, Pepsi can monitor purchasing compliance at the user level, an ability that has boosted price and product compliance well over 90 percent.
• The warehouse also helps ensure 100 percent sales tax compliance, says Bridgman.
• Since going online in 1995, the warehouse has helped generate procurement savings in excess of $100 million.

Levels of DW support for enterprise decision making

The need for real-time data
• A business often cannot afford to wait a whole day for its operational data to load into the data warehouse for analysis
• Provides incremental real-time data showing every state change and almost analogous patterns over time
• Maintaining metadata in sync is possible
• Less costly to develop, maintain, and secure one huge data warehouse so that data are centralized for BI/BA tools
• An EAI with real-time data collection can reduce or eliminate the nightly batch processes

Real-time / active data warehouse (RDW/ADW)
• Loading and providing data via the data warehouse as they become available.
• Expand traditional data warehouse functions into the realm of tactical decision making
• Empower decision making when interact directly with custo
