Artificial Intelligence and Data Mining Course Slides 2: Data Warehouse

Data Warehouse

Why Data Warehouse
The most common issue companies face when looking at data mining is that the information is not in one place. The biggest challenge business analysts face in using data mining is how to extract, integrate, cleanse, and prepare data to solve their most pressing business problems.

What Is a Data Warehouse
The idea of a data warehouse is to put a wide range of operational data from internal and external sources into one place so that it can be better used by executives, line-of-business managers, and other business analysts. Once the information is gathered, OLAP (on-line analytical processing) software comes into play by providing the desktop analysis tools for querying, manipulating, and reporting the data from the data warehouse.

Data Warehouse Environment
  • The source systems from which data is extracted
  • The tools used to extract data for loading the data warehouse
  • The data warehouse database itself, where the data is stored
  • The desktop query and reporting tools used for decision support

Data Warehousing Process Overview

Operational Vs. Multidimensional View Of Sales

Creating A Data Warehouse

The Data Warehouse
The Data Warehouse is an integrated, subject-oriented, time-variant, non-volatile database that provides support for decision making.
  • Integrated: The Data Warehouse is a centralized, consolidated database that integrates data retrieved from the entire organization.
  • Subject-oriented: The Data Warehouse data is arranged and optimized to provide answers to questions coming from diverse functional areas within a company.
  • Time-variant: The Warehouse data represent the flow of data through time. It can even contain projected data.
  • Non-volatile: Once data enter the Data Warehouse, they are never removed. The Data Warehouse is always growing.

Operational Database vs. Data Warehouse
Operational DB:
  • Similar data can have different representations or meanings
  • Functional or process orientation
  • Current transactions
  • Frequent updating
Data Warehouse:
  • Unified view of all data elements
  • Subject orientation for decision support
  • Historical information with time dimension
  • Data are added without change

Data Mart
A data mart is a small, single-subject data warehouse subset that provides decision support to a small group of people.
Data marts can serve as a test vehicle for companies exploring the potential benefits of data warehouses. Data marts address local or departmental problems, while a data warehouse involves a company-wide effort to support decision making at all levels in the organization.

Enterprise Data Warehouse (EDW)
A large-scale data warehouse that is used across the enterprise for decision support. EDWs are used to provide data for many types of DSS, including CRM, SCM, BPM, BAM, PLM, and KMS.
  • BPM: business performance management
  • BAM: business activity monitoring
  • PLM: product lifecycle management
  • KMS: knowledge management systems

Metadata
Metadata is data about data. In a data warehouse, metadata describe the contents of the data warehouse and the manner of its use. Good metadata is essential to the effective operation of a data warehouse, and it is used in data acquisition/collection, data transformation, and data access.

The Need for Technical Metadata
The use of data warehousing and decision processing often involves a wide range of different products, and creating and maintaining the metadata for these products is time-consuming and error-prone. Automating the metadata management process and enabling the sharing of this so-called technical metadata between products can reduce both costs and errors.

The Need for Business Metadata
Business users need to have a good understanding of what information exists in a data warehouse. They need to understand what the information means from a business viewpoint, how it was derived, from what source systems it comes, when it was created, what pre-built reports and analyses exist for manipulating the information, and so forth.

Metadata in a Data Warehouse
Kimball lists the following types of metadata in a data warehouse:
  • Source system metadata
  • Data staging metadata
  • DBMS metadata
Ralph Kimball, The Data Warehouse Lifecycle Toolkit, Wiley, 1998, ISBN 0-471-25547-5

Source System Metadata
  • Source specifications, such as repositories and source logical schemas
  • Source descriptive information, such as ownership descriptions, update frequencies, and access methods
  • Process information, such as job schedules and extraction code

Data Staging Metadata
  • Data acquisition information, such as data transmission scheduling and results
  • Dimension table management, such as definitions of dimensions and surrogate key assignments
  • Transformation and aggregation, such as data enhancement and mapping, DBMS load scripts, and aggregate definitions
  • Audit, job logs, and documentation, such as data lineage records and data transform logs

Star Schema
The star schema is a data modeling technique used to map multidimensional decision support into a relational database. Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structure of the operational database.
Four components:
  • Facts
  • Dimensions
  • Attributes
  • Attribute hierarchies

Figure 13.14 A Three-Dimensional View of Sales

Figure 13.17 Attribute Hierarchies in Multidimensional Analysis

Facts
  • Numeric measurements that represent a specific business aspect or activity
  • Normally stored in a fact table that is the center of the star schema
  • The fact table contains facts linked through their dimensions
  • Metrics are facts computed at run time

Dimensions
  • Qualifying characteristics that provide additional perspectives to a given fact
  • Decision support data are almost always viewed in relation to other data
  • Facts are studied via dimensions
  • Dimensions are stored in dimension tables

Attributes
  • Dimensions provide descriptions of facts through their attributes
  • No mathematical limit to the number of dimensions
  • Used to search, filter, and classify facts
  • Slice and dice: focus on slices of the data cube for more detailed analysis

Attribute Hierarchies
  • Provide top-down data organization
  • Two purposes: aggregation and drill-down/roll-up data analysis
  • Determine how the data are extracted and represented
  • Stored in the DBMS's data dictionary
  • Used by the OLAP tool to access the warehouse properly

Star Schema
A star schema consists of fact tables and dimension tables. Fact tables contain the quantitative or factual data about a business: the information being queried. This information is often numerical, additive measurements and can consist of many columns and millions or billions of rows. Dimension tables are usually smaller and hold descriptive data that reflects the dimensions, or attributes, of a business.

Figure 13.17 Star Schema For Sales

Star Schema Representation
Facts and dimensions are normally represented by physical tables in the data warehouse database. The fact table is related to each dimension table in a many-to-one (M:1) relationship. Fact and dimension tables are related by foreign keys and are subject to the primary/foreign key constraints.

Figure 13.18 Orders Star Schema

Star Schema: Performance-Improving Techniques
  • Normalization of dimensional tables
  • Multiple fact tables representing different aggregation levels
  • Denormalization of fact tables
  • Table partitioning and replication

Figure 13.19 Normalized Dimension Tables

Multiple Fact Tables

Practice
How would you design a star schema for an auto insurance company to do risk analysis?
  • What is the objective?
  • What are the facts?
  • What are the dimensions?
  • What are the attributes?
  • What are the attribute hierarchies?

Auto Insurance DW Star Schema

Data Warehouse Design
  • Grain: a definition of the highest level of detail that is supported in a data warehouse
  • Drill-down: the process of probing beyond a summarized value to investigate each of the detail transactions that comprise the summary

Data Warehouse Implementation
  • The data warehouse as an active decision support network
  • A company-wide effort that requires user involvement and commitment at all levels
  • Satisfy the trilogy: data, analysis, and users
  • Apply database design procedures
Implementing a data warehouse is generally a massive effort that must be planned and executed according to established methods. There are many facets to the project lifecycle, and no single person can be an expert in each area.

Data Warehouse Implementation Road Map

Data Integration and the Extraction, Transformation, and Load (ETL) Process
Data integration comprises three major processes:
  • Data access: the ability to access and extract data from any data source
  • Data federation: the integration of business views across multiple data stores
  • Change capture: the identification, capture, and delivery of the changes made to enterprise data sources
Extraction, transformation, and load (ETL):
  • Extraction: reading data from a database
  • Transformation: converting the extracted data from its previous form into the form that can be placed into a data warehouse
  • Load: putting the data into the data warehouse

Data Cleanse
Data cleansing, or data scrubbing, is the act of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting this dirty data.

ETL Tools
A good ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization. ETL tools have started to migrate into Enterprise Application Integration, or even Enterprise Service Bus, systems that now cover much more than just the extraction, transformation, and loading of data. Many ETL vendors now have data profiling, data quality, and metadata capabilities.

On-Line Analytical Processing
On-Line Analytical Processing (OLAP) is an advanced data analysis environment that supports decision making, business modeling, and operations research activities.
Four main characteristics of OLAP:
  • Use multidimensional data analysis techniques
  • Provide advanced database support
  • Provide easy-to-use end user interfaces
  • Support client/server architecture
Additional functions of multidimensional data analysis techniques:
  • Advanced data presentation functions
  • Advanced data aggregation, consolidation, and classification functions
  • Advanced computational functions
  • Advanced data modeling functions

Integration Of OLAP With A Spreadsheet Program

Figure 13.7 OLAP Server Arrangement

SAP's Business Information Warehouse: an Enterprise-Wide Information Hub
  • An end-to-end enterprise-wide information hub to support planning and decision making
  • A central data repository of SAP, non-SAP, current, and historical business transactions and metadata
  • Timely information to all levels and roles, from analyst to executive
  • Years of SAP financial, logistic, and human resource information systems experience wedded with modern data warehouse methodologies

A Sample Of Current Data Warehousing And Data Mining Vendors
Table 13.10

Success Stories at Pepsi
"Using the data warehouse, we've been able to identify important items, find national suppliers for them, and leverage those relationships to reduce costs." Thanks to the warehouse, Pepsi can monitor purchasing compliance at the user level, an ability that has boosted price and product compliance well over 90 percent. The warehouse also helps ensure 100 percent sales tax compliance, says Bridgman. Since going online in 1995, the warehouse has helped generate procurement savings in excess of $100 million.

Levels of DW Support for Enterprise Decision Making

The Need for Real-Time Data
  • A business often cannot afford to wait a whole day for its operational data to load into the data warehouse for analysis
  • Provides incremental real-time data showing every state change and almost analogous patterns over time
  • Maintaining metadata in sync is possible
  • Less costly to develop, maintain, and secure one huge data warehouse so that data are centralized for BI/BA tools
  • An EAI with real-time data collection can reduce or eliminate the nightly batch processes

Real-Time / Active Data Warehouse (RDW/ADW)
  • Loading and providing data via the data warehouse as they become available
  • Expand traditional data warehouse functions into the realm of tactical decision making
  • Empower decision making when interacting directly with customers and suppliers

Real-Time Data Ware
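
Worked example: star schema. The star schema slides describe a fact table linked many-to-one to dimension tables, with attribute hierarchies used for roll-up and drill-down. The sketch below is one possible illustration using Python's standard sqlite3 module. All table names, column names, and sample rows (fact_sales, dim_time, dim_product, dim_location) are hypothetical and are not taken from the slides or their figures.

```python
import sqlite3

# Hypothetical sales star schema: one fact table and three dimension tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce primary/foreign key constraints
conn.executescript("""
CREATE TABLE dim_time (
    time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, category TEXT, name TEXT);
CREATE TABLE dim_location (
    location_id INTEGER PRIMARY KEY, region TEXT, city TEXT);
-- Each fact row references exactly one row in every dimension (M:1).
CREATE TABLE fact_sales (
    time_id     INTEGER NOT NULL REFERENCES dim_time(time_id),
    product_id  INTEGER NOT NULL REFERENCES dim_product(product_id),
    location_id INTEGER NOT NULL REFERENCES dim_location(location_id),
    units       INTEGER NOT NULL,
    revenue     REAL NOT NULL);
""")

# Made-up sample data for illustration only.
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)",
                 [(1, 2023, 1, 1), (2, 2023, 1, 2), (3, 2023, 2, 4)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Beverage", "Cola"), (2, "Snack", "Chips")])
conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                 [(1, "East", "Boston"), (2, "East", "New York"), (3, "West", "Seattle")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 10, 15.0),
    (1, 2, 2, 5, 10.0),
    (2, 1, 3, 8, 12.0),
    (3, 1, 1, 4, 6.0),
    (3, 2, 3, 6, 12.0),
])

# Roll-up: aggregate the revenue fact to the region and quarter levels.
rollup = conn.execute("""
    SELECT l.region, t.quarter, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_location l ON l.location_id = f.location_id
    JOIN dim_time t ON t.time_id = f.time_id
    GROUP BY l.region, t.quarter
    ORDER BY l.region, t.quarter
""").fetchall()

# Slice and drill-down: keep only the East region, then descend to city level.
drill = conn.execute("""
    SELECT l.city, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_location l ON l.location_id = f.location_id
    WHERE l.region = 'East'
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()

print(rollup)
print(drill)
```

The region-to-city and year-to-quarter-to-month columns play the role of attribute hierarchies here: grouping by a higher level rolls the facts up, and grouping by a lower level drills down.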

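Worked example: ETL with data cleansing. The ETL and data cleansing slides describe reading data from a source, converting it into warehouse form, rejecting or correcting dirty records, and loading the result. The following is a minimal sketch of that flow under stated assumptions: the source rows, the field names, the cleansing rules, and the sales table are all hypothetical, and a real pipeline would read from actual source systems rather than an in-memory list.

```python
import sqlite3

# Hypothetical rows as they might arrive from an operational source system.
SOURCE_ROWS = [
    {"customer": " Alice ", "amount": "120.50", "date": "2024-01-05"},
    {"customer": "BOB", "amount": "n/a", "date": "2024-01-06"},   # corrupt amount
    {"customer": "", "amount": "30", "date": "2024-01-07"},       # missing customer
    {"customer": "alice", "amount": "79.50", "date": "2024-01-09"},
]


def extract():
    """Extraction: read the raw records from the (simulated) source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transformation and cleansing: normalize names, parse amounts,
    and set aside records that cannot be repaired."""
    clean, rejected = [], []
    for row in rows:
        name = row["customer"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)  # inaccurate record: amount is not numeric
            continue
        if not name:
            rejected.append(row)  # incomplete record: no customer name
            continue
        clean.append((name, amount, row["date"]))
    return clean, rejected


def load(conn, rows):
    """Load: append the cleansed rows to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
clean_rows, rejected_rows = transform(extract())
load(conn, clean_rows)

totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer").fetchall()
print(totals)
print(len(rejected_rows), "rows rejected")
```

Keeping the rejected rows instead of silently dropping them is a design choice in this sketch: it leaves a record that could feed the audit and data lineage metadata mentioned in the data staging slides.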