Database Systems (資料庫系統)
October 31, 2005
Lecture #6

Physical & Digital Space Interaction
- Consider moving a pencil:
  - Physical space movement mapping to digital space
  - Digital space movement mapping to physical space
- Which one is more difficult?
- The Actuated Workbench (MIT Media Lab)

Course Administration
- Assignment #2: due 11/2
- Assignment #3: posted on the course homepage; due 11/8 (next Tuesday)
- Practicum assignment #1: to be posted on the course homepage on 11/7
- Next week's reading: Chapter 9

Reflection
- How to design a database?
  - Conceptual design: ER model
  - Logical design: relational model
- How to ask questions on a database?
  - Relational algebra & SQL
- What's next? How to get fast access to records?
  - File organizations & indexes
- What's further next? How to optimize query processing time?

Overview of Storage & Indexing
Chapter 8

Outline
- Types of external storage devices
- File organizations
  - Questions: How to store table records on an external storage device (e.g., a disk)? How to speed up access to needed records on a disk?
  - Heap file, sorted file
- Indexing data structures
  - Tree-based indexing, hash-based indexing
- Comparison of file organizations
  - Question: which one is better/worse in performance?
- Indexes and performance
  - Question: how to use indexing for better performance?

Data on External Storage
- External storage offers persistent data storage.
- Unlike physical memory, data saved on persistent storage is not lost when the system shuts down or crashes.

Types of External Storage Devices
- Magnetic disks: can retrieve a random page at fixed cost
  - $1 per gigabyte
  - But reading several consecutive pages is much cheaper than reading them in random order.
- Tapes: can only read pages in sequence
  - $0.3 per gigabyte
  - Cheaper than disks; used for archival storage
- Other types of persistent storage devices:
  - Optical storage (CD-R, CD-RW, DVD-R, DVD-RW)
  - Flash memory

Definitions
- A record is a tuple (a row) in a relation table.
  - Fixed-size records or variable-size records
- A file is a collection of records.
  - Store one table per file, or multiple tables in the same file.
- A page is a fixed-length block of data for disk I/O.
  - A file consists of pages.
  - A data page also contains a collection of records.
  - Typical page sizes are 4 KB and 8 KB.

File Organization
- A method of arranging a file of records on external storage.
- A record id (rid) is used to locate a record on a disk.
  - rid = (page id, slot number)
- Indexes are data structures for efficiently finding the rids of records with given values.

DB Storage and Indexing
- Layered architecture
- The disk space manager allocates and deallocates space on disks.
- The buffer manager moves pages between disks and main memory.
- The file and index layers organize records in files and manage the indexing data structures.

Alternative File Organizations
Many alternatives exist, each ideal for some situations and not so good in others:
- Heap files: records are unsorted.
  - Suitable when the typical access is a file scan retrieving all records in no particular order.
  - Fast updates (insertions/deletions)
- Sorted files: records are sorted.
  - Best if records must be retrieved in some order, or only a range of records is needed.
  - Example: employees are sorted by age.
  - Slow updates in comparison to heap files.
- Indexes: data structures that organize records via trees or hashing.
  - For example, create an index on employee age.
  - Like sorted files, they speed up searches for a subset of records that match values in certain ("search key") fields.
  - Updates are much faster than in sorted files.

Indexes
- An index on a file speeds up selections on the search key fields for the index.
  - Any subset of the attributes of a table can be the search key for an index on the relation.
  - A search key does not have to be a candidate key.
  - Example: employee age is not a candidate key.
- An index file contains a collection of data entries (called k*).
  - Quickly search an index to locate a data entry with key value k.
  - Use the data entry to find the data record.
- Multiple indexes can be created on the same data records.
  - Example indexes: age, salary, name

Indexing Example
- Search key value: find employees with age = 25.
- Index data structure: index entries + indexing method
[Figure: the indexing method leads from the search key value to data entries in the index file (small, for efficient search), which point to data records in the large data file]
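The relationship between data entries, rids, and data records can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data file is modeled as a list of in-memory pages, data entries take the form (k, rid) with rid = (page id, slot number), and every record except Paul's age of 25 (taken from the slide's figure) is invented.

```python
from bisect import bisect_left

# Data file: a list of pages; each page is a list of (name, age) records.
# Paul (age 25) and Mary appear in the slide figure; Mary's age and the
# other two records are hypothetical values added for illustration.
data_file = [
    [("Mary", 31), ("Paul", 25)],   # page 0
    [("Ann", 25), ("Bob", 40)],     # page 1
]

def build_index(pages):
    """Return data entries (k, rid) sorted by k, with rid = (page id, slot number)."""
    entries = []
    for page_id, page in enumerate(pages):
        for slot, (_name, age) in enumerate(page):
            entries.append((age, (page_id, slot)))
    entries.sort()
    return entries

def lookup(entries, pages, key):
    """Find all records whose search key equals `key` by going through the index."""
    i = bisect_left(entries, (key,))      # first data entry with k >= key
    result = []
    while i < len(entries) and entries[i][0] == key:
        page_id, slot = entries[i][1]     # follow the rid to the data record
        result.append(pages[page_id][slot])
        i += 1
    return result

age_index = build_index(data_file)
print(lookup(age_index, data_file, 25))
```

The index holds only small (k, rid) pairs, so it can be searched without touching the data pages; a data page is read only when a matching rid is followed.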

[Figure: the data entry (k = 25, Paul's rid), reached through a B+ tree or a hash index, points to a data record in the data file, which holds the records of Paul and Mary]

Alternatives for Data Entry k*
Three alternatives for what to store in a data entry:
- Alternative 1: the data record with key value k (data entry = data record)
- Alternative 2: <k, rid>
- Alternative 3: <k, rid-list>
The choice of alternative for data entries is independent of the indexing method.
- The indexing method takes a search key and finds the data entries matching the search key.
- Examples of indexing methods: B+ trees or hashing.

Alternatives for Data Entries (contd.)
Alternative 1: data record with key value k
- Data entries are also the data records.
- At most one index on a given collection of data records can use Alternative 1.
- If data records are very large, the number of pages containing data entries is high, which may lead to less efficient search.

Alternatives for Data Entries (contd.)
Alternative 2: <k, rid>; Alternative 3: <k, rid-list>
- Data entries are typically much smaller than data records, which may lead to more efficient search than Alternative 1. Why?
- Alternative 3 is more compact than Alternative 2, but leads to variable-sized data entries (the size of the rid-list is not fixed).

Index Classification
Clustered vs. unclustered: if the order of data records is the same as, or close to, the order of data entries, the index is called a clustered index.
- Alternative 1 implies clustered; in practice, clustered also implies Alternative 1.
- A file can have one clustered index and multiple unclustered indexes.
- Why is this important? Consider the cost of a range search query: find all records with 30 < age [...]

[...] bucket number.
- Retrieve the primary page of the bucket.
- Search the records in the primary page. If not found, search the overflow pages.
- Cost of locating rids: # pages in bucket (small)
Insert a record:
- Apply the key value to the hash function -> bucket number.
- If all (primary & overflow) pages in that bucket are full, allocate a new overflow page.
- Cost: similar to search.
Delete a record:
- Cost: similar to search.

B+ Tree Indexes
- Leaf pages contain data entries and are chained (prev & next).
- Non-leaf pages contain index entries and direct searches.
- Index entry layout in a non-leaf page: P0, K1, P1, K2, P2, ..., Km, Pm
[Figure: non-leaf pages above a chained level of leaf pages]

Example B+ Tree
- Find 7*, 29*? Find all entries with 15* < age < 30*?
- Insert/delete: find the data entry in a leaf, then change it. The parent sometimes needs to be adjusted, and the change sometimes bubbles up the tree (to keep the tree balanced).
- More details about tree-based indexes in Chapter 10.
[Figure: example B+ tree, root and non-leaf pages]
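A B+ tree search, as described above, follows index entries from the root to a leaf and then uses the leaf chain for ranges. The sketch below is a minimal illustration, not the slide's exact figure: the class names `Leaf` and `NonLeaf`, the hand-built tree layout, and the choice of open range bounds are all assumptions, and insertion, deletion, and rebalancing are left out.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys      # data entries (only the key values k* here)
        self.next = None      # chain to the next leaf page

class NonLeaf:
    def __init__(self, keys, children):
        # Index entries laid out as P0 K1 P1 ... Km Pm:
        # children[i] covers keys k with keys[i-1] <= k < keys[i].
        self.keys = keys
        self.children = children

def find_leaf(node, key):
    """Follow index entries from the root down to the leaf that may hold key."""
    while isinstance(node, NonLeaf):
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    """Equality search: one root-to-leaf descent."""
    return key in find_leaf(root, key).keys

def range_search(root, low, high):
    """Return all data entries k with low < k < high by walking the leaf chain."""
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        for k in leaf.keys:
            if k >= high:
                return out
            if k > low:
                out.append(k)
        leaf = leaf.next
    return out

# A small hand-built tree (hypothetical layout chosen for this sketch).
leaves = [Leaf([2, 3]), Leaf([5, 7, 8]), Leaf([14, 16]),
          Leaf([22, 24]), Leaf([27, 29]), Leaf([33, 34, 38, 39])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
left = NonLeaf([5, 14], leaves[0:3])
right = NonLeaf([27, 33], leaves[3:6])
root = NonLeaf([17], [left, right])

print(search(root, 7), search(root, 29), range_search(root, 15, 30))
```

The range search descends the tree only once; every further qualifying entry is reached through the `next` pointers, which is why chained leaves make range selections cheap.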

[Figure: leaf-level data entries of the example B+ tree: 2*, 3*, 5*, 7*, 8*, 14*, 16*, 22*, 24*, 27*, 29*, 33*, 34*, 38*, 39*; the root's pointers are labeled by comparison with key 17]

Cost Model for Our Analysis
- Ignore CPU costs, for simplicity.
- Measure disk I/O costs: the number of page I/Os.
  - This ignores the gains of pre-fetching a sequence of pages.
- Cost analysis parameters:
  - B: the number of data pages
  - R: the number of records per page
- Good enough to show the overall trends!

Comparing File Organizations
- Heap files (random order; insert at EOF)
- Sorted files, sorted on the search key
- Clustered B+ tree file, Alternative (1), on the search key
- Heap file with unclustered B+ tree index on the search key
- Heap file with unclustered hash index on the search key

Operations to Compare
- Scan: fetch all records from disk.
- Equality search. Example: find all employees with age = 23 and salary = 5000.
- Range selection. Example: find all employees with age > 35.
- Insert a record: identify the page for inserting the record, fetch it, modify it, and write it back.
- Delete a record: similar to insert.

Assumptions in Our Analysis
- Heap files: equality selection on key; exactly one match.
- Sorted files: files compacted after deletions.
- Indexes: Alt (2), (3): data entry size = 10% of the data record size.

Heap Files
B: the number of data pages; R: the number of records per page
- Scan: B
- Search with equality selection: B in the worst case (0.5 B on average, given exactly one match)
- Search with range selection: B
- Insert: 2. The new record is inserted at the end of the file: read the last page and write it back out.
- Delete a record: search cost + 1 (no compacting)
- Delete a record (given its rid): 2 (search cost = 1)
[Figure: heap file pages holding records in unsorted key order]
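The heap file costs above can be checked with a toy simulation that counts page I/Os. This is a sketch under simplifying assumptions: pages are Python lists, keys are unique, there is no buffer pool, and the class `HeapFile` and helper `cost` are hypothetical names introduced only for this example.

```python
class HeapFile:
    """Toy heap file that counts page I/Os (reads + writes)."""

    def __init__(self, records_per_page):
        self.r = records_per_page
        self.pages = [[]]        # list of pages; each page is a list of keys
        self.io = 0              # running count of page I/Os

    def insert(self, key):
        """Insert at the end of the file: read the last page, write one page."""
        self.io += 1                         # read the last page
        if len(self.pages[-1]) == self.r:    # last page is full: start a new one
            self.pages.append([])
        self.pages[-1].append(key)
        self.io += 1                         # write the modified (or new) page

    def scan(self):
        """Fetch every page: B page reads."""
        self.io += len(self.pages)
        return [k for page in self.pages for k in page]

    def search_eq(self, key):
        """Equality search on a key: read pages in order until the match."""
        for page_id, page in enumerate(self.pages):
            self.io += 1
            if key in page:
                return page_id
        return None

def cost(heap, operation, *args):
    """Return the number of page I/Os that one operation performs."""
    before = heap.io
    operation(*args)
    return heap.io - before

heap = HeapFile(records_per_page=4)
for k in range(40):                          # fills exactly B = 10 pages
    heap.insert(k)

scan_cost = cost(heap, heap.scan)
best_case = cost(heap, heap.search_eq, 0)
worst_case = cost(heap, heap.search_eq, 39)
average = sum(cost(heap, heap.search_eq, k) for k in range(40)) / 40
insert_cost = cost(heap, heap.insert, 40)
print(scan_cost, best_case, worst_case, average, insert_cost)
```

With B = 10, the average equality search reads 5.5 pages, close to the 0.5 B estimate, while a scan always reads all B pages and an insert always costs 2 I/Os.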

Sorted Files
B: the number of data pages; R: the number of records per page
- Scan: B
- Search with equality selection: log2(B) (binary search)
- Search with range selection: log2(B) + # pages of matched records

[...]

- Delete: search cost + 1. Search for the record, delete it, and write back the modified page. No need to shift.
[Figure: clustered file with index entries above the data entries (= data records); an entry marked "14 x"; label: empty data entries in modified page]

Heap File with Unclustered Tree Index
B: the number of data pages; R: the number of records per page; F: the fan-out of the tree
- Scan (in order): 0.1 B + R B
- Unordered scan: B
- Search with equality selection: logF(0.1 B) + 1. Searching for the data entry takes logF(0.1 B) page reads, plus one read of the data record page.
- Data/index entry size = 0.1 x data record size
[Figure: unclustered index; data entries point to data records in a different order]

Heap File with Unclustered Tree Index (contd.)
- Search with range selection: logF(0.1 B) + # pages of matched records. Search for the first matching data entry, then find all the qualifying entries in sequential order. But each data entry may point to a data record on a different data page, so the cost can reach one page read per matching record.
- Insert: search cost + 3 (one read and one write of the heap file page + search + one write of the data entry page)
- Delete: search cost + 3
[Figure: data entries and data records]

Heap File with Unclustered Hash Index
B: the number of data pages; R: the number of records per page
- Scan (in order): 0.125 B + R B = (0.125 + R) B
- Unordered scan: B
- Search with equality selection: 2 (apply the hash function + read the data entry page + read the data record page)
- Search with range selection: B. The hash function is useless here; do an unordered scan.
- Hash assumptions: no overflow buckets; 80% page occupancy => # data entry pages = 0.1 B / 0.8 = 0.125 B
[Figure: unclustered hash index on salary: h(salary) = A selects bucket A (a primary page plus an overflow page), whose data entries point to data records such as (Smith, 44, 3000), (Jones, 40, 6003), and (Tracy, 44, 5004)]
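To get a feel for the magnitudes, the formulas above can be evaluated for concrete parameters. The sketch below only restates the slide formulas; the function names and the sample values B = 1000, R = 100, and fan-out F = 100 are assumptions chosen for illustration, and the range cost of the unclustered tree uses the pessimistic reading of one page read per matching record.

```python
import math

def sorted_file_costs(B, matching_pages):
    """Page I/Os for a sorted file (binary search over B pages)."""
    return {
        "scan": B,
        "equality": math.log2(B),
        "range": math.log2(B) + matching_pages,
    }

def unclustered_tree_costs(B, R, F, matches):
    """Page I/Os for a heap file with an unclustered tree index."""
    entry_pages = 0.1 * B                  # data entries are 10% of record size
    descend = math.log(entry_pages, F)     # log_F(0.1 B)
    return {
        "scan_in_order": entry_pages + R * B,  # one page read per record
        "scan_unordered": B,
        "equality": descend + 1,
        "range": descend + matches,        # assumed: each match on its own page
    }

def unclustered_hash_costs(B, R, occupancy=0.8):
    """Page I/Os for a heap file with an unclustered hash index."""
    entry_pages = 0.1 * B / occupancy      # 0.125 B at 80% occupancy
    return {
        "scan_in_order": entry_pages + R * B,
        "scan_unordered": B,
        "equality": 2,                     # data entry page + data record page
        "range": B,                        # hashing does not help: unordered scan
    }

B, R, F = 1000, 100, 100                   # hypothetical sample parameters
sorted_costs = sorted_file_costs(B, matching_pages=5)
tree_costs = unclustered_tree_costs(B, R, F, matches=500)
hash_costs = unclustered_hash_costs(B, R)
for name, costs in [("sorted", sorted_costs), ("tree", tree_costs), ("hash", hash_costs)]:
    print(name, {op: round(c, 2) for op, c in costs.items()})
```

For these sample values an in-order scan through an unclustered index costs about 100 times more than an unordered scan, because every data entry may lead to a different data page.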

Heap File with Unclustered Hash Index (contd.)
B: the number of data pages; R: the number of records per page
- Insert: 4 (read and write the heap file page + read and write the data entry page)
- Delete: 4
- Hash assumptions: no overflow buckets; 80% page occupancy => # data entry pages = 0.125 B
[Figure: hash index on salary; bucket A with a primary page and an overflow page; data entries point to data records]

Cost of Operations
No one file organization is uniformly superior.

File organization           (a) Scan     (b) Equality      (c) Range                (d) Insert   (e) Delete
(1) Heap                    B            0.5 B             B                        2            Search + 1
(2) Sorted                  B            log2(B)           log2(B) + #matches       Search + B   Search + B
(3) Clustered tree index    1.5 B        logF(1.5 B)       logF(1.5 B) + #matches   Search + 1   Search + 1
(4) Unclustered tree index  B (R + 0.1)  1 + logF(0.1 B)   logF(0.1 B) + #matches   Search + 3   Search + 3

General Guidelines
- An index supports efficient retrieval of data entries satisfying a selection condition.
- Two types of selections: equality and range.
  - Hash-based indexing is optimized only for equality selection; it is useless for range selection.
  - Tree-based indexing supports both.
  - A clustered tree-based index is best for range selection.

General Guidelines (cont.)
- A clustered index can be more expensive to maintain than an unclustered index:
  - When a new record is inserted into a full page, existing records are shifted into other pages, so the data entries for these records must change, which is expensive.
- This is the tradeoff for more efficient range selection.

Understanding the Workload
How to decide the best indexing for a table? We need to understand the workload.
- For each query in the workload:
  - Which tables does it access?
  - Which fields are retrieved?
  - Which fields are involved in selection/join conditions? How selective are these conditions likely to be?
- For each update in the workload:
  - Which fields are involved in selection/join conditions? How selective are these conditions likely to be?
  - The type of update (insert/delete/update), and the fields that are affected.

Choice of Indexes
- What indexes should we create?
  - Which tables should have indexes?
  - What field(s) should be the search key?
  - Should we build several indexes?
- For each index, what kind of an index should it be?
  - Clustered or unclustered?
  - Hash index or tree index?

Choice of Indexes (contd.)
- One approach: consider the most important queries in turn. Consider the best plan using the current indexes, and see if a better plan is possible with an additional index. If so, create it.
  - Obviously, this implies that we must understand how a DBMS evaluates queries and creates query evaluation plans!
  - For now, we discuss simple 1-table queries.
- Before creating an index, we must also consider the impact on updates in the workload!
  - Trade-off: indexes can make queries go faster and updates slower (because the indexes have to be updated as well). Indexes also require disk space.

Index Selection Guidelines
- Attributes in the WHERE clause are candidates for index keys.
  - An exact match condition suggests a hash index.
  - A range query suggests a tree index.
  - Clustering is especially useful for range queries; it can also help on equality queries if there are many duplicates.
- Multi-attribute search keys should be considered when a WHERE clause contains several conditions.
  - The order of attributes is important for range queries.
  - Such indexes can sometimes enable index-only strategies for important queries.
  - For index-only strategies, clustering is not important!
- Try to choose indexes that benefit as many queries as possible. Since only one index can be clustered per relation, choose it based on the important queries that would benefit the most from clustering.

Examples of Clustered Indexes
- A B+ tree index on e.age can be used to get qualifying records.
  - How selective is the condition? (Selectivity means the % of qualifying records.)
  - Is this index useful?
- Consider the GROUP BY query.
  - Is the e.age index good? Why not?
  - A clustered e.dno index may be better.
- Equality queries and duplicates:
  - An unclustered index is bad when many records qualify.
  - Clustering on e.hobby helps!

    SELECT e.dno
    FROM emp e
    WHERE e.age > 40

    SELECT e.dno, COUNT(*)
    FROM emp e
    WHERE e.age > 10
    GROUP BY e.dno

    SELECT e.dno
    FROM emp e
    WHERE e.hobby = 'stamps'

Composite Search Keys
- Search on a combination of fields. Which index can be applied?
  - Equality query: every field value is equal to a constant value.
    - age = 12 and sal = 10
  - Range query: some field value is not a constant.
    - age = 13; or sal = 10 and age > 5
- The order of fields in a composite key is important!
  - <sal, age>: data entries are sorted by sal first, then age.
[Figure: data records (name, age, sal) for bob, cal, joe, and sue, sorted by name]
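The effect of field order on a composite key can be seen by sorting the same data entries in both orders. The snippet is a small sketch: the (age, sal) pairs are those that survive from the slide's figure, but the pairing of bob, cal, and joe with particular values is an assumption, and `data_entries` is a hypothetical helper.

```python
# (name, age, sal) records. The (age, sal) pairs come from the slide's
# figure; which name goes with which pair is assumed, except for sue.
records = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

FIELD_POSITION = {"age": 1, "sal": 2}

def data_entries(records, key_fields):
    """Build composite-key data entries, sorted lexicographically by key_fields."""
    return sorted(
        tuple(rec[FIELD_POSITION[f]] for f in key_fields) for rec in records
    )

age_sal = data_entries(records, ["age", "sal"])   # sorted by age, then sal
sal_age = data_entries(records, ["sal", "age"])   # sorted by sal, then age
print(age_sal)
print(sal_age)
```

Both indexes contain the same information, but only the first field of each key determines the overall order, which is what decides whether a given range condition maps to a contiguous run of data entries.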

[Figure: examples of composite key indexes using lexicographic order. Data records are sorted by name; the data entries in each index are sorted by its search key. <age, sal>: 11,80; 12,10; 12,20; 13,75. <sal, age>: 10,12; 20,12; 75,13; 80,11. <age>: 11, 12, 12, 13. <sal>: 10, 20, 75, 80.]

Composite Search Keys (contd.)
- To retrieve emp records with age = 30 and sal = 4000, an index on <age, sal> would be better than an index on age or an index on sal.
- If the condition is 20 < age < 30 and 3000 < sal < 5000: clustered tree index on <age, sal> or <sal, age>.
- If the condition is age = 30 and 3000 < sal < 5000: a clustered <age, sal> index is much better than a <sal, age> index. Why?
  - For <age, sal>: find the first data entry with (age = 30, sal = 3000); the entries that follow are likely to qualify.
  - For <sal, age>: find the first data entry with (sal = 3000, age = anything); the entries that follow can have any age.
  - The order of fields in a composite key is important!
- Composite indexes are larger and are updated more often.

Index-Only Plans
- A number of queries can be answered without retrieving any records from one or more of the relations involved, if a suitable index is available.

    SELECT e.dno, COUNT(*)
    FROM emp e
    GROUP BY e.dno

  tree inde[...]
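The composite-key comparison above, for a condition of the form age = 30 and 3000 < sal < 5000, can be made concrete by counting how many data entries each index order has to read. This is a sketch on synthetic data: the employee values are invented, the indexes are modeled as sorted Python lists, and `scan_age_sal` and `scan_sal_age` are hypothetical helpers.

```python
from bisect import bisect_left, bisect_right

# Synthetic (age, sal) combinations, invented purely for illustration.
employees = [(age, sal) for age in range(25, 36) for sal in range(1000, 7000, 500)]

age_sal_index = sorted(employees)                              # <age, sal>
sal_age_index = sorted((sal, age) for age, sal in employees)   # <sal, age>

def scan_age_sal(index, age, sal_low, sal_high):
    """Entries read for: age = `age` and sal_low < sal < sal_high."""
    lo = bisect_right(index, (age, sal_low))     # first entry after (age, sal_low)
    hi = bisect_left(index, (age, sal_high))     # first entry at or past (age, sal_high)
    return index[lo:hi]

def scan_sal_age(index, sal_low, sal_high):
    """Entries read for sal_low < sal < sal_high; age must be filtered afterwards."""
    lo = bisect_right(index, (sal_low, float("inf")))
    hi = bisect_left(index, (sal_high, float("-inf")))
    return index[lo:hi]

read_a = scan_age_sal(age_sal_index, 30, 3000, 5000)
read_b = scan_sal_age(sal_age_index, 3000, 5000)
hits_b = [(sal, age) for sal, age in read_b if age == 30]
print(len(read_a), len(read_b), len(hits_b))
```

With the <age, sal> order every entry read qualifies, whereas the <sal, age> order reads the entries of all ages in the salary range and keeps only a fraction of them.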


提交評(píng)論