The Memory Hierarchy
15-213: "The course that gives CMU its Zip!"
Sept 29, 2006
class10.ppt

Topics
  Storage technologies and trends
  Locality of reference
  Caching in the memory hierarchy

Random-Access Memory (RAM)
  Key features
    RAM is traditionally packaged as a chip.
    Basic storage unit is normally a cell (one bit per cell).
    Multiple RAM chips form a memory.
  Static RAM (SRAM)
    Each cell stores a bit with a four- or six-transistor circuit.
    Retains value indefinitely, as long as it is kept powered.
    Relatively insensitive to electrical noise (EMI), radiation, etc.
    Faster and more expensive than DRAM.
  Dynamic RAM (DRAM)
    Each cell stores a bit with a capacitor. One transistor is used for access.
    Value must be refreshed every 10-100 ms.
    More sensitive to disturbances (EMI, radiation, ...) than SRAM.
    Slower and cheaper than SRAM.

SRAM vs DRAM Summary

          Tran.      Access   Needs      Needs
          per bit    time     refresh?   EDC?     Cost    Applications
  SRAM    4 or 6     1X       No         Maybe    100x    Cache memories
  DRAM    1          10X      Yes        Yes      1X      Main memories, frame buffers

Conventional DRAM Organization
  d x w DRAM: dw total bits organized as d supercells of size w bits.
  [Figure: a 16 x 8 DRAM chip drawn as a 4 x 4 array of supercells (rows 0-3, cols 0-3) with an internal row buffer; supercell (2,1) is highlighted. The memory controller (to CPU) is connected to the chip by 2 addr lines and 8 data lines.]

Reading DRAM Supercell (2,1)
  Step 1(a): Row access strobe (RAS) selects row 2.
  Step 1(b): Row 2 is copied from the DRAM array to the row buffer.
  [Figure: 16 x 8 DRAM chip; the memory controller sends RAS = 2 over the 2 addr lines and row 2 is loaded into the internal row buffer.]
  Step 2(a): Column access strobe (CAS) selects column 1.
  Step 2(b): Supercell (2,1) is copied from the buffer to the data lines, and eventually back to the CPU.
  [Figure: 16 x 8 DRAM chip; the memory controller sends CAS = 1 and supercell (2,1) travels from the row buffer over the 8 data lines to the CPU.]

Memory Modules
  [Figure: a 64 MB memory module consisting of eight 8Mx8 DRAMs (DRAM 0 through DRAM 7). The memory controller sends addr (row = i, col = j) to every chip; each chip returns its supercell (i,j), and the eight bytes (bits 0-7, 8-15, 16-23, 24-31, 32-39, 40-47, 48-55, 56-63) form the 64-bit doubleword at main memory address A.]

Enhanced DRAMs
  DRAM cores with better interface logic and faster I/O:
    Synchronous DRAM (SDRAM)
      Uses a conventional clock signal instead of asynchronous control.
    Double data-rate synchronous DRAM (DDR SDRAM)
      Double-edge clocking sends two bits per cycle per pin.
    Rambus DRAM (RDRAM)
      Uses faster signaling over fewer wires (source-directed clocking) with a transaction-oriented interface protocol.
  Obsolete technologies:
    Fast page mode DRAM (FPM DRAM)
      Allowed re-use of row addresses.
    Extended data out DRAM (EDO DRAM)
      Enhanced FPM DRAM with more closely spaced CAS signals.
    Video RAM (VRAM)
      Dual-ported FPM DRAM with a second, concurrent, serial interface.
    Extra-functionality DRAMs (CDRAM, GDRAM)
      Added SRAM (CDRAM) and support for graphics operations (GDRAM).

Nonvolatile Memories
  DRAM and SRAM are volatile memories.
    Lose information if powered off.
  Nonvolatile memories retain value even if powered off.
    Read-only memory (ROM): programmed during production
    Magnetic RAM (MRAM): stores bit magnetically (in development)
    Ferro-electric RAM (FERAM): uses a ferro-electric dielectric
    Programmable ROM (PROM): can be programmed once
    Erasable PROM (EPROM): can be bulk erased (UV, X-ray)
    Electrically erasable PROM (EEPROM): electronic erase capability
    Flash memory: EEPROMs with partial (sector) erase capability

Uses for Nonvolatile Memories
  Firmware programs stored in a ROM (BIOS, controllers for disks, network cards, graphics accelerators, security subsystems, ...)
  Solid state disks (flash cards, memory sticks, etc.)
  Smart cards, embedded systems, appliances
  Disk caches

Traditional Bus Structure Connecting CPU and Memory
  A bus is a collection of parallel wires that carry address, data, and control signals.
  Buses are typically shared by multiple devices.
  [Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory.]

Memory Read Transaction (1)
  CPU places address A on the memory bus.
  [Figure: CPU chip (register file with %eax, ALU, bus interface), I/O bridge, and main memory holding word x at address A; A is on the bus.]
  Load operation: movl A, %eax

Memory Read Transaction (2)
  Main memory reads A from the memory bus, retrieves word x, and places it on the bus.
  [Figure: same components; x is on the bus.]
  Load operation: movl A, %eax

Memory Read Transaction (3)
  CPU reads word x from the bus and copies it into register %eax.
  [Figure: same components; x is now in %eax.]
  Load operation: movl A, %eax

Memory Write Transaction (1)
  CPU places address A on bus. Main memory reads it and waits for the corresponding data word to arrive.
  [Figure: CPU chip (register file with %eax holding y, ALU, bus interface), I/O bridge, and main memory; address A is on the bus.]
  Store operation: movl %eax, A

Memory Write Transaction (2)
  CPU places data word y on the bus.
  [Figure: same components; y is on the bus.]
  Store operation: movl %eax, A

Memory Write Transaction (3)
  Main memory reads data word y from the bus and stores it at address A.
  [Figure: same components; y is stored at address A in main memory.]
  Store operation: movl %eax, A

Memory Subsystem Trends
  Observation: A DRAM chip has an access time of about 50 ns. Traditional systems may need 3x longer to get the data from memory into a CPU register.
  Modern systems integrate the memory controller onto the CPU chip: latency matters!
  DRAM and SRAM densities increase, and so does the soft-error rate:
    Traditional error detection & correction (EDC) is a must-have (64 bits of data plus 8 bits of redundancy allow any 1-bit error to be corrected, and any 2-bit error is guaranteed to be detected).
    EDC is increasingly needed for SRAMs too.
    ChipKill capability (can correct all bits supplied by one failing memory chip) will become standard soon.

Disk Geometry
  Disks consist of platters, each with two surfaces.
  Each surface consists of concentric rings called tracks.
  Each track consists of sectors separated by gaps.
  [Figure: one surface around the spindle, showing tracks, track k, sectors, and gaps.]

Disk Geometry (Multiple-Platter View)
  Aligned tracks form a cylinder.
  [Figure: platters 0-2 on a spindle, with surfaces 0-5 and cylinder k.]

Disk Capacity
  Capacity: maximum number of bits that can be stored.
    Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes (Lawsuit pending! Claims deceptive advertising).
  Capacity is determined by these technology factors:
    Recording density (bits/in): number of bits that can be squeezed into a 1 inch segment of a track.
    Track density (tracks/in): number of tracks that can be squeezed into a 1 inch radial segment.
    Areal density (bits/in^2): product of recording and track density.
  Modern disks partition tracks into disjoint subsets called recording zones.
    Each track in a zone has the same number of sectors, determined by the circumference of the innermost track.
    Each zone has a different number of sectors/track.

Computing Disk Capacity
  Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)
  Example:
    512 bytes/sector
    300 sectors/track (on average)
    20,000 tracks/surface
    2 surfaces/platter
    5 platters/disk
  Capacity = 512 x 300 x 20000 x 2 x 5 = 30,720,000,000 bytes = 30.72 GB

Disk Operation (Single-Platter View)
  The disk surface spins at a fixed rotational rate.
  By moving radially, the arm can position the read/write head over any track.
  The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air.
  [Figure: a platter on its spindle with the arm and read/write head.]

Disk Operation (Multi-Platter View)
  Read/write heads move in unison from cylinder to cylinder.
  [Figure: several platters on one spindle with a shared arm.]

Disk Access Time
  Average time to access some target sector is approximated by:
    T_access = T_avg seek + T_avg rotation + T_avg transfer
  Seek time (T_avg seek)
    Time to position heads over the cylinder containing the target sector.
    Typical T_avg seek = 9 ms
  Rotational latency (T_avg rotation)
    Time waiting for the first bit of the target sector to pass under the r/w head.
    T_avg rotation = 1/2 x (1/RPM) x (60 secs/1 min)
  Transfer time (T_avg transfer)
    Time to read the bits in the target sector.
    T_avg transfer = (1/RPM) x 1/(avg # sectors/track) x (60 secs/1 min)

Disk Access Time Example
  Given:
    Rotational rate = 7,200 RPM
    Average seek time = 9 ms
    Avg # sectors/track = 400
  Derived:
    T_avg rotation = 1/2 x (60 secs/7200 RPM) x 1000 ms/sec ≈ 4 ms
    T_avg transfer = (60 secs/7200 RPM) x (1/400 tracks/sector) x 1000 ms/sec ≈ 0.02 ms
    T_access = 9 ms + 4 ms + 0.02 ms = 13.02 ms
  Important points:
    Access time is dominated by seek time and rotational latency.
    First bit in a sector is the most expensive, the rest are free.
    SRAM access time is about 4 ns/doubleword, DRAM about 60 ns.
    Disk is about 40,000 times slower than SRAM, 2,500 times slower than DRAM.

Logical Disk Blocks
  Modern disks present a simpler abstract view of the complex sector geometry:
    The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...).
  Mapping between logical blocks and actual (physical) sectors:
    Maintained by a hardware/firmware device called the disk controller.
    Converts requests for logical blocks into (surface, track, sector) triples.
  Allows the controller to set aside spare cylinders for each zone.
    Accounts for the difference between "formatted capacity" and "maximum capacity".

I/O Bus
  [Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory and over the I/O bus to a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.]

Reading a Disk Sector (1)
  CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.

Reading a Disk Sector (2)
  Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.

Reading a Disk Sector (3)
  When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special "interrupt" pin on the CPU).

Storage Trends

  DRAM
  metric                 1980     1985     1990     1995     2000     2005   2005:1980
  $/MB                  8,000      880      100       30        1     0.20      40,000
  access (ns)             375      200      100       70       60       50           8
  typical size (MB)     0.064    0.256        4       16       64    1,000      15,000

  SRAM
  metric                 1980     1985     1990     1995     2000     2005   2005:1980
  $/MB                 19,200    2,900      320      256      100       75         256
  access (ns)             300      150       35       15       12       10          30

  Disk
  metric                 1980     1985     1990     1995     2000     2005   2005:1980
  $/MB                    500      100        8     0.30     0.05    0.001      10,000
  access (ms)              87       75       28       10        8        4          22
  typical size (MB)         1       10      160    1,000    9,000  400,000     400,000

CPU Clock Rates

  metric                 1980     1985     1990     1995     2000     2005   2005:1980
  processor              8080      286      386  Pentium    P-III      P-4
  clock rate (MHz)          1        6       20      150      750    3,000       3,000
  cycle time (ns)       1,000      166       50        6      1.3      0.3       3,333

The CPU-Memory Gap
  The gap widens between DRAM, disk, and CPU speeds.

Locality
  Principle of Locality:
    Programs tend to reuse data and instructions near those they have used recently, or that were recently referenced themselves.
    Temporal locality: Recently referenced items are likely to be referenced in the near future.
    Spatial locality: Items with nearby addresses tend to be referenced close together in time.

Locality Example

    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;

  Data
    Reference array elements in succession (stride-1 reference pattern): spatial locality
    Reference sum each iteration: temporal locality
  Instructions
    Reference instructions in sequence: spatial locality
    Cycle through loop repeatedly: temporal locality

Locality Example
  Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.
  Question: Does this function have good locality?

    int sum_array_rows(int a[M][N])
    {
        int i, j, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

Locality Example
  Question: Does this function have good locality?

    int sum_array_cols(int a[M][N])
    {
        int i, j, sum = 0;

        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

Locality Example
  Question: Can you permute the loops so that the function scans the 3-d array a with a stride-1 reference pattern (and thus has good spatial locality)?

    int sum_array_3d(int a[M][N][N])
    {
        int i, j, k, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                for (k = 0; k < N; k++)
                    sum += a[k][i][j];
        return sum;
    }

Memory Hierarchies
  Some fundamental and enduring properties of hardware and software:
    Fast storage technologies cost more per byte, have less capacity, and require more power (heat!).
    The gap between CPU and main memory speed is widening.
    Well-written programs tend to exhibit good locality.
  These fundamental properties complement each other beautifully.
  They suggest an approach for organizing memory and storage systems known as a memory hierarchy.

An Example Memory Hierarchy
  (Toward the top: smaller, faster, and costlier (per byte) storage devices.)
  L0: registers
        CPU registers hold words retrieved from L1 cache.
  L1: on-chip L1 cache (SRAM)
        L1 cache holds cache lines retrieved from the L2 cache memory.
  L2: off-chip L2 cache (SRAM)
        L2 cache holds cache lines retrieved from main memory.
  L3: main memory (DRAM)
        Main memory holds disk blocks retrieved from local disks.
  L4: local secondary storage (local disks)
        Local disks hold files retrieved from disks on remote network servers.
  L5: remote secondary storage (tapes, distributed file systems, Web servers)
  (Toward the bottom: larger, slower, and cheaper (per byte) storage devices.)

Caches
  Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
  Fundamental idea of a memory hierarchy:
    For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
  Why do memory hierarchies work?
    Programs tend to access the data at level k more often than they access the data at level k+1.
    Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
    Net effect: A large pool of memory that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.

Caching in a Memory Hierarchy
  Level k+1: Larger, slower, cheaper storage device at level k+1 is partitioned into blocks.
  Data is copied between levels in block-sized transfer units.
  Level k: Smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1.
  [Figure: level k+1 holds blocks 0-15; level k holds four of them (8, 9, 14, 3); blocks 4 and 10 are shown being copied between the levels.]

General Caching Concepts
  Program needs object d, which is stored in some block b.
  Cache hit
    Program finds b in the cache at level k. E.g., block 14.
  Cache miss
    b is not at level k, so level k cache must fetch it from level k+1. E.g., block 12.
    If level k cache is full, then some current block must be replaced (evicted). Which one is the "victim"?
      Placement policy: where can the new block go? E.g., b mod 4
      Replacement policy: which block should be evicted? E.g., LRU
  [Figure: level k cache above level k+1 blocks 0-15, with a request for block 14 (hit) and a request for block 12 (miss).]
