data:image/s3,"s3://crabby-images/4540c/4540c629716022f884365f7b443dfcac887f2249" alt="ECE259CPS221AdvancedComputerArchitectureII(_第1頁"
data:image/s3,"s3://crabby-images/38b40/38b40ae4d39f3748853971806ec98e39b460c7e7" alt="ECE259CPS221AdvancedComputerArchitectureII(_第2頁"
data:image/s3,"s3://crabby-images/d312a/d312a8271ce5e95388f92e6b6e2ce6d2ab2b0d72" alt="ECE259CPS221AdvancedComputerArchitectureII(_第3頁"
data:image/s3,"s3://crabby-images/c1dd2/c1dd2d8675b5648ad367f071daec8fbb6d8448d5" alt="ECE259CPS221AdvancedComputerArchitectureII(_第4頁"
data:image/s3,"s3://crabby-images/94d75/94d755223dc1fad85f22f2441a44379b0647816b" alt="ECE259CPS221AdvancedComputerArchitectureII(_第5頁"
版權說明:本文檔由用戶提供并上傳,收益歸屬內容提供方,若內容存在侵權,請進行舉報或認領
文檔簡介
ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture)
Shared Memory MPs: COMA & Beyond
Copyright 2004 Daniel J. Sorin, Duke University
Slides are derived from work by Sarita Adve (Illinois), Babak Falsafi (CMU), Mark Hill (Wisconsin), Alvy Lebeck (Duke), Steve Reinhardt (Michigan), and J. P. Singh (Princeton). Thanks!

Outline
- Cache Only Memory Architecture (COMA)
  - Basics
  - Data Diffusion Machine (DDM)
  - Simple COMA (S-COMA)
  - Reactive NUMA
- Hierarchical Coherence
  - Basics
  - Sequent NUMA-Q
  - Chip Multiprocessor (CMP)
  - Sun Wildfire
- Token Coherence

Review
- Basic idea of directories
  - Per-processor cache hierarchies
  - Directory interleaved with memory
- Directory limitations/drawbacks
  - Limited capacity for replication
  - High design & implementation cost
  - Single hard-wired protocol
  - Limitations of shared physical address space

Cache Only Memory Architecture (COMA)
- Make all memory available for migration & replication
- All memory is a DRAM cache called Attraction Memory (AM)
- Examples
  - Data Diffusion Machine (next)
  - Flat COMA (fixed home for directory but not data)
  - KSR-1 (hierarchy of snooping rings)
- But how do you
  - Find data?
  - Deal with replacements?

COMA Example: Data Diffusion Machine (DDM)
- All-hardware COMA
- Attraction Memory: one giant hardware cache
  - Maintains both address tags and state
  - Data addressed, allocated, & kept coherent in blocks
  - Directory info on a per-cache-block basis
- Not home-based:
  - Data is migratory: the AM attracts data
  - Must find a home when replacing the data
  - Must find the directory entry before finding the data

DDM Directory
- Directory is hierarchical, in tree form
- Each directory node is a set-associative cache of directory info
- Tree maintains inclusion: higher levels keep a replica of lower sub-trees

[Figure: tree of directory (D) nodes]

DDM Coherence/Placement Protocol
- Simple write-invalidate protocol
- Cache states: Invalid, Shared, Exclusive
- Must traverse the directory:
  - To find a copy on a read or write miss
  - To invalidate on a write to Shared
- Directory is hierarchical set-associative caches
  - Q1: Is the block in my sub-tree?
  - Q2: Does the block exist outside my sub-tree?
  - Request goes up until Q2 = no, and then down
  - Request goes down until Q1 = no or a leaf is reached
- On a replacement:
  - For an Exclusive copy, must find another home (HARD!)
  - For a Shared copy, must make sure other copies exist; else must find another home

Simple COMA (S-COMA)
- (Pure) COMA
  - Block granularity to find/allocate/replace (complex hardware)
  - Block granularity for coherence/transfers (good: less false sharing)
- Software DSM
  - Page granularity to find/allocate/replace (use VM: good)
  - Page granularity for coherence/transfers (bad: more false sharing)
- Simple COMA
  - Page granularity to find/allocate/replace (use VM: good)
  - Block granularity for coherence/transfers (good: less false sharing)
  - Blocks act like sub-blocks on a page

S-COMA-like Examples
- Wisconsin Typhoon (Reinhardt et al., ISCA 1994)
  - On access, VM system checks if page is present
  - On access, HW/SW checks block state
  - Failure invokes user-level protocol in SW
  - Good flexibility, but SW is slow & users don't want to write protocols
- Sun Wildfire (Hagersten/Koster, HPCA 1999)
  - Begin with up to four SMP nodes
  - Add a pseudo-processor board to each node as a proxy for the rest of the system
  - Can run a CC-NUMA directory protocol
  - Can selectively use S-COMA (called Coherent Memory Replication)
  - Selects between the two with a competitive algorithm (Falsafi/Wood, ISCA 1997)
  - Hierarchical method of building parallel machines
  - WE'LL TALK MORE ABOUT THIS LATER

A Taxonomy of Issues
- Allocation/Replication
  - Cache line vs. page
- Access Control (Coherence)
  - Cache line vs. page
  - HW vs. SW
- Protocol Processing
  - HW vs. SW
- Communication
  - Cache line vs. page
  - HW vs. SW (message passing)

Reactive NUMA (R-NUMA)
- PRESENTATION

Outline
- Cache Only Memory Architecture (COMA)
  - Basics
  - Data Diffusion Machine (DDM)
  - Reactive NUMA (R-NUMA)
- Hierarchical Coherence
  - Basics
  - NUMA-Q
  - Chip multiprocessor (CMP)
  - Sun Wildfire
  - Intel Profusion

Hierarchical Coherence
- Many older systems were flat
  - E.g., a directory that points to 1K processors
- Use hierarchy
  - Intra-node coherence (e.g., snooping in SMP node)
  - Inter-node coherence (e.g., directory between nodes)
- Why?
  - Divide & conquer markets (e.g., sell node)
  - Divide & conquer complexity (but must interface protocols)

Example Two-Level Hierarchies

Advantages of Multiprocessor Nodes
- Amortization of node fixed costs over multiple processors
  - Applies even if processors are simply packaged together but not coherent
- Can use commodity SMPs
- Fewer nodes for directory to keep track of (coarser grain)
- Much communication may be contained within node (cheaper)
- Nodes prefetch data for each other (fewer "remote" misses)
- Combining of requests (like hierarchical, only two-level)
- Can even share caches (overlapping of working sets)
- Benefits depend on sharing pattern (and mapping)
  - Good for widely read-shared data, e.g., tree data in Barnes-Hut
  - Good for nearest-neighbor, if properly mapped
  - Not so good for all-to-all communication

Disadvantages of Coherent MP Nodes
- Bandwidth shared among nodes
  - All-to-all example
  - Applies to coherent or not
- Bus increases latency to local memory
- With coherence, typically wait for local snoop results before sending remote requests
- Snoopy bus at remote node increases delays there, too, increasing latency and reducing bandwidth
- Overall, may hurt performance if sharing patterns don't comply

Sequent NUMA-Q System Overview
- Use of high-volume SMPs as building blocks
- Quad bus is a 532 MB/s split-transaction bus with in-order responses
  - Limited facility for out-of-order responses for off-node accesses
- Cross-node interconnect is a 1 GB/s unidirectional ring
- Larger SCI systems built by bridging multiple rings

NUMA-Q IQ-Link Board
- IQ-Link board plays the role of the Hub chip in the SGI Origin
- Can generate interrupts between quads
- Remote cache (visible to SCI) block size is 64 bytes (32 MB, 4-way)
- Processor caches are not visible to SCI (they are kept snoopy-coherent within the SMP node)
- Remote cache is inclusive with respect to processor caches on the SMP

[Figure: IQ-Link board block diagram]
- Data Pump (GaAs): implements SCI, pulls off relevant packets
- Interface to quad bus; manages remote cache data and bus logic; pseudo-memory controller and pseudo-processor
- Interface to data pump, OBIC, interrupt controller and directory tags; manages SCI protocol using programmable engines

NUMA-Q cont.
- IQ-Link is key
- Local directory: home (I), fresh (S), gone (E) + pointer
- “L3” remote ca
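The DDM search rule above (go up until Q2 = no, then down along branches where Q1 = yes) can be sketched as a small tree walk. This is a minimal software model under stated assumptions. Each directory node keeps a plain set of block addresses standing in for its set-associative directory cache, so inclusion holds trivially. Q2 is computed by checking sibling sub-trees instead of being stored in hardware. The names `DirNode`, `install`, `q1`, `q2`, and `locate` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality: nodes link to parents and children
class DirNode:
    """Hypothetical DDM-style node: inner nodes are directories, leaves are attraction memories."""
    name: str
    parent: DirNode | None = None
    children: list[DirNode] = field(default_factory=list)
    # Blocks present anywhere in this sub-tree (a set stands in for the
    # set-associative directory cache; ancestors replicate children = inclusion).
    blocks: set[int] = field(default_factory=set)

    def add_child(self, name: str) -> DirNode:
        child = DirNode(name, parent=self)
        self.children.append(child)
        return child


def install(leaf: DirNode, block: int) -> None:
    """Record a copy in a leaf AM and in every ancestor directory."""
    node: DirNode | None = leaf
    while node is not None:
        node.blocks.add(block)
        node = node.parent


def q1(node: DirNode, block: int) -> bool:
    """Q1: Is the block in my sub-tree?"""
    return block in node.blocks


def q2(node: DirNode, block: int) -> bool:
    """Q2: Does the block exist outside my sub-tree? (checks siblings at each level up)"""
    child, parent = node, node.parent
    while parent is not None:
        if any(q1(sib, block) for sib in parent.children if sib is not child):
            return True
        child, parent = parent, parent.parent
    return False


def locate(requester: DirNode, block: int) -> list[DirNode]:
    """Go up until Q2 = no, then go down along Q1 = yes to the leaves holding copies."""
    node = requester
    while q2(node, block):  # Q2 = yes implies a parent exists
        node = node.parent
    found: list[DirNode] = []
    stack = [node]
    while stack:
        n = stack.pop()
        if not q1(n, block):
            continue  # Q1 = no: stop descending this branch
        if n.children:
            stack.extend(n.children)
        else:
            found.append(n)  # leaf attraction memory holding a copy
    return found


root = DirNode("root")
d0, d1 = root.add_child("d0"), root.add_child("d1")
am0, am1 = d0.add_child("am0"), d0.add_child("am1")
am2, am3 = d1.add_child("am2"), d1.add_child("am3")

install(am3, 0x40)
install(am1, 0x80)
print([n.name for n in locate(am0, 0x40)])  # ['am3']: request climbs to the root
print([n.name for n in locate(am0, 0x80)])  # ['am1']: request stays inside d0's sub-tree
```

In the second lookup, no copy exists outside `d0`'s sub-tree, so the request never reaches the root. This is the traffic-containment benefit of the hierarchy. Replacement (finding a new home for an Exclusive block) is deliberately left out.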
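The S-COMA split described earlier uses page granularity to find/allocate/replace and block granularity for coherence. That split can be illustrated with a minimal sketch of one node's local page cache. The 4 KB page and 64-byte block sizes are assumptions. A dictionary lookup plays the role of the VM system's page-present check, and an array of per-block states plays the role of the fine-grain access-control check. `SComaNode`, `access`, and `invalidate` are hypothetical names, not Typhoon's or Wildfire's interfaces.

```python
from __future__ import annotations

from enum import IntEnum

PAGE_SIZE = 4096   # assumed page size in bytes
BLOCK_SIZE = 64    # assumed coherence block size in bytes
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE


class State(IntEnum):
    INVALID = 0
    SHARED = 1
    EXCLUSIVE = 2


class SComaNode:
    """Hypothetical model of one node's S-COMA page cache in local DRAM."""

    def __init__(self) -> None:
        self.frames: dict[int, list[State]] = {}  # virtual page -> per-block states
        self.page_faults = 0    # page-granularity allocations (VM system)
        self.block_fetches = 0  # block-granularity coherence misses

    def _split(self, addr: int) -> tuple[int, int]:
        page, offset = divmod(addr, PAGE_SIZE)
        return page, offset // BLOCK_SIZE

    def access(self, addr: int, write: bool = False) -> None:
        page, blk = self._split(addr)
        frame = self.frames.get(page)
        if frame is None:
            # Page not present: allocate a whole local frame; every block starts Invalid.
            self.page_faults += 1
            frame = self.frames[page] = [State.INVALID] * BLOCKS_PER_PAGE
        needed = State.EXCLUSIVE if write else State.SHARED
        if frame[blk] < needed:
            # Block state check fails: fetch (or upgrade) only this block.
            self.block_fetches += 1
            frame[blk] = needed

    def invalidate(self, addr: int) -> None:
        """Remote write: invalidate one block; the rest of the page stays valid."""
        page, blk = self._split(addr)
        if page in self.frames:
            self.frames[page][blk] = State.INVALID


node = SComaNode()
node.access(0x1000)              # page fault + fetch of block 0
node.access(0x1008)              # same block, already Shared: hit
node.access(0x1040)              # same page, block 1: block fetch only
node.access(0x1040, write=True)  # upgrade block 1 to Exclusive
node.invalidate(0x1040)          # another node writes block 1
node.access(0x1000)              # block 0 unaffected: hit
node.access(0x1040)              # only block 1 is refetched
print(node.page_faults, node.block_fetches)  # 1 4
```

A remote write to one block leaves the neighboring blocks of the page valid. This is why block-granularity coherence suffers less false sharing than page-granularity software DSM.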
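The slides say Wildfire picks between CC-NUMA and S-COMA (Coherent Memory Replication) per page with a competitive algorithm (Falsafi/Wood, ISCA 1997). The sketch below shows the general ski-rental shape of such a choice. It assumes a per-page counter of refetches (blocks re-requested after a capacity/conflict eviction) and a threshold derived from two hypothetical cost parameters, `refetch_cost` and `relocation_cost`. A page keeps paying the small per-refetch cost until the total would match the one-time cost of relocating it into a local S-COMA page. `ReactivePagePolicy` is a hypothetical name. The actual counters and thresholds in R-NUMA or Wildfire may differ.

```python
from __future__ import annotations

import math
from collections import defaultdict


class ReactivePagePolicy:
    """Hypothetical per-node policy choosing CC-NUMA or S-COMA mode for each remote page."""

    def __init__(self, refetch_cost: float, relocation_cost: float) -> None:
        # Ski-rental threshold: switch once accumulated refetch cost would
        # reach the one-time cost of allocating and filling a local page.
        self.threshold = max(1, math.ceil(relocation_cost / refetch_cost))
        self.refetches: defaultdict[int, int] = defaultdict(int)
        self.scoma_pages: set[int] = set()

    def mode(self, page: int) -> str:
        return "S-COMA" if page in self.scoma_pages else "CC-NUMA"

    def on_refetch(self, page: int) -> str:
        """Call when a block of `page` is refetched after a capacity/conflict eviction."""
        if page not in self.scoma_pages:
            self.refetches[page] += 1
            if self.refetches[page] >= self.threshold:
                self.scoma_pages.add(page)  # relocate: page now replicated locally
                del self.refetches[page]
        return self.mode(page)


policy = ReactivePagePolicy(refetch_cost=1.0, relocation_cost=3.0)
print([policy.on_refetch(7) for _ in range(4)])  # ['CC-NUMA', 'CC-NUMA', 'S-COMA', 'S-COMA']
print(policy.mode(8))                            # CC-NUMA
```

Setting the threshold to the cost ratio bounds the wasted refetch cost of a page that stays in CC-NUMA mode to roughly one relocation's worth. That bound is what makes this kind of choice competitive.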