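Computer Organization
Content drawn mainly from CS61C lectures 11 and 12
Summer 2012 -- Lecture

Outline
- Memory hierarchy concepts
- Direct-mapped caches
- Cache example
- Cache reads and writes
- Cache performance
- Multilevel caches
- Set-associative caches
- Improving cache performance
- Multilevel cache performance in practice
- Contemporary cache examples

Storage in a Computer
- Processor
  - Holds data in register files (~100 bytes)
  - Registers accessed on a sub-nanosecond timescale
- Memory ("main memory")
  - Access time ~50-100 ns
  - Hundreds of clock cycles per memory access

Great Idea #3: Principle of Locality / Memory Hierarchy

Processor-memory performance gap
- 1989: first CPU with cache on chip
- 1998: Pentium III has two cache levels on chip
- The performance gap grows 50% per year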
Principle of Locality (1/3)
- Principle of Locality: programs access only a small portion of the full address space at any instant of time
- Recall: the address space holds both code and data
  - Code access is generally localized
  - The stack and heap try to keep your data together, so data access is localized too

Principle of Locality (2/3)
- Temporal locality (locality in time)
  - Go back to the same book on the desk multiple times
  - If a memory location is referenced, then it will tend to be referenced again soon
- Spatial locality (locality in space)
  - When you go to the bookshelf, grab many related books
  - If a memory location is referenced, the locations with nearby addresses will tend to be referenced soon

Principle of Locality (3/3)
- We exploit the principle of locality in hardware via a memory hierarchy where:
  - Levels closer to the processor are faster (and more expensive per bit, so smaller)
  - Levels farther from the processor are larger (and less expensive per bit, so slower)
- Goal: create the illusion of memory being almost as fast as the fastest memory and almost as large as the biggest memory of the hierarchy

[Figure: memory hierarchy, Level 1 through Level n]

Cache Concept
- Introduce an intermediate hierarchy level: the memory cache, which holds a copy of a subset of main memory
  - As a pun, often use $ ("cash") to abbreviate cache (e.g. D$ = Data Cache, L1$ = Level 1 Cache)
- Modern processors have separate caches for instructions and data, as well as several levels of caches implemented in different sizes
- Implemented with the same IC processing technology as the CPU and integrated on-chip: faster but more expensive than main memory
Memory Hierarchy Technologies
- Caches use static RAM (SRAM)
  + Fast (typical access times of 0.5 to 2.5 ns)
  – Low density (6-transistor cells), higher power, expensive ($2000 to $4000 per GB in 2011)
  - Static: content will last as long as power is on
- Main memory uses dynamic RAM (DRAM)
  + High density (1-transistor cells), lower power, cheaper ($20 to $40 per GB in 2011)
  – Slower (typical access times of 50 to 70 ns)
  - Dynamic: needs to be "refreshed" regularly (~every 8 ms)

Memory Transfer in the Hierarchy
- Inclusive: the data held in a level closer to the processor is a subset of the data held in the next level farther away
- Block: unit of transfer between memory and cache

Managing the Hierarchy
- Registers <-> memory
  - By the compiler (or assembly-level programmer)
- Cache <-> main memory (we are here)
  - By the cache controller hardware
- Main memory <-> disks (secondary storage)
  - By the OS (virtual memory, which is a later topic)
  - Virtual-to-physical address mapping is assisted by the hardware (TLB)

[Figure: typical memory hierarchy]

Cache Organization
- What is the overall organization of blocks we impose on our cache?
  - Where do we put a block of data from memory?
  - How do we know if a block is already in the cache?
  - How do we quickly find a block when we need it?
- When do we replace something in the cache?

University of Aeronautics and Astronautics, School of Computer Science

General Notes on Caches
- Recall: memory is byte-addressed
- We haven't specified the size of our blocks, but it will be a multiple of the word size (32 bits)
  - How do we access individual words or bytes within a block?
- The cache is smaller than memory
  - Can't fit all blocks at once, so multiple blocks in memory map to the same cache slot (row)
  - Need some way of identifying which memory block is currently in the row
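Direct-Mapped Caches (1/3)
- Each memory block is mapped to exactly one row in the cache (direct-mapped)
  - Use a simple hash function
- Effect of block size:
  - A block spans adjacent bytes, which differ in address by 1
  - Offset field: the lowest bits of the memory address can be used to index to specific bytes within a block
  - Block size needs to be a power of two (in bytes)

Direct-Mapped Caches (2/3)
- Effect of cache size (total stored data):
  - If the cache could hold all of memory, we would use the remaining bits (minus offset bits) to select the appropriate row of the cache
  - Index field: apply the hash function to the remaining bits to determine which row the block goes in
    (block address) modulo (# of blocks in the cache)
  - Tag field: the leftover upper bits of the memory address determine which portion of memory the block came from (identifier)

TIO Address Breakdown
- Memory address fields, from most to least significant: Tag (T bits) | Index (I bits) | Offset (O bits)
- Meaning of the field sizes:
  - O bits <-> 2^O bytes/block = 2^(O-2) words/block
  - I bits <-> 2^I rows in cache = cache size / block size
  - T bits = A - I - O, where A = # of address bits (A = 32 here)

Direct-Mapped Caches (3/3)
- What's actually in the cache?
  - Each row contains the actual data block to store (B bits = 8 × 2^O bits)
  - In addition, must save the Tag field of the address as an identifier (T bits)
  - Valid bit: indicates whether the block in that row is valid or not
- Total bits in cache = # rows × (B + T + 1) = 2^I × (8 × 2^O + T + 1) bits

Cache Example (1/2)
- Cache parameters:
  - Address space of 64 B, block size of 1 word, cache size of 4 words
- TIO breakdown:
  - 1 word = 4 bytes, so O = log2(4) = 2
  - Cache size / block size = 4, so I = log2(4) = 2
  - A = log2(64) = 6 bits, so T = 6 - 2 - 2 = 2
  - Memory addresses: XX XX XX (the upper four bits form the block address)
- Bits in cache = 2^2 × (8 × 2^2 + 2 + 1) = 140 bits

Cache Example (2/2)
[Figure: main memory blocks and the cache rows they map to, labeled by Index. Main memory is shown in blocks, so offset bits are not shown (x's). Cache rows exactly match the Index field.]
- Which blocks map to each row of the cache?
- On a memory request:
  1) Take the Index field
  2) Check if the Valid bit is true in that row of the cache
  3) If valid, then check if the Tag matches

Direct-Mapped Cache
[Figure: direct-mapped cache with 4 words/block and a cache size of 1 Ki words; address bits 31 to 0 are split into tag, index, and block offset fields; each row holds Valid, Tag, and Data]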
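The TIO arithmetic above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the function names (`tio_breakdown`, `total_cache_bits`, `split_address`) are hypothetical, and the sketch assumes a direct-mapped cache whose block size and cache size are powers of two, given in bytes.

```python
def tio_breakdown(address_bits, block_bytes, cache_bytes):
    """Return (T, I, O) field widths for a direct-mapped cache.

    Assumes block_bytes and cache_bytes are powers of two (hypothetical helper).
    """
    offset_bits = block_bytes.bit_length() - 1      # O = log2(bytes per block)
    rows = cache_bytes // block_bytes               # rows = cache size / block size
    index_bits = rows.bit_length() - 1              # I = log2(rows)
    tag_bits = address_bits - index_bits - offset_bits  # T = A - I - O
    return tag_bits, index_bits, offset_bits


def total_cache_bits(address_bits, block_bytes, cache_bytes):
    """Total storage: 2^I rows, each with data (8 * 2^O bits), tag, and valid bit."""
    t, i, o = tio_breakdown(address_bits, block_bytes, cache_bytes)
    return (2 ** i) * (8 * 2 ** o + t + 1)


def split_address(addr, index_bits, offset_bits):
    """Split an address into its (tag, index, offset) field values."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# The example cache: 64 B address space (6 bits), 4 B blocks, 16 B cache.
print(tio_breakdown(6, 4, 16))        # (2, 2, 2)
print(total_cache_bits(6, 4, 16))     # 140
print(split_address(0b101101, 2, 2))  # (2, 3, 1)
```

The same helper applied to the cache in the last figure (32-bit addresses, 16 B blocks, 4 KiB of data) gives a 20-bit tag, an 8-bit index, and a 4-bit offset.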
Cache Reads
- When reading memory, 3 things can happen:
  1) Cache hit: the cache block is valid and contains the proper address, so read the desired word
  2) Cache miss: nothing is in that row of the cache (not valid), so fetch from memory
  3) Cache miss with block replacement: the wrong block is in the row, so discard it and fetch the desired data from memory

Cache Metrics
- How effective is your cache?
  - Want to maximize cache hits and minimize cache misses
  - Hit rate (HR): percentage of memory accesses in a program or set of instructions that result in a cache hit
  - Miss rate (MR): like hit rate, but for misses; MR = 1 - HR
- How fast is your cache?
  - Hit time (HT): time to access the cache (including the Tag comparison)
  - Miss penalty (MP): time to replace a block in the cache from a lower level in the memory hierarchy

Sources of Cache Misses: The 3Cs
- Compulsory (cold start or process migration, 1st reference):
  - The first access to a block is impossible to avoid; the effect is small for long-running programs
- Capacity:
  - The cache cannot contain all blocks accessed by the program
- Conflict (collision):
  - Multiple memory locations are mapped to the same cache location

Direct-Mapped Example
- Consider the sequence of memory address accesses: 0 1 2 3 4 3 4 15
- Start with an empty cache: all blocks initially marked as not valid
- Address space: 16 B, block size: 1 B, cache size: 4 B, TIO = 2-2-0
- 8 requests, 6 misses (HR = 0.25, MR = 0.75)

Taking Advantage of Spatial Locality
- Let a cache block hold more than one byte
- Start with an empty cache: all blocks initially marked as not valid
- Same access sequence (0000, 0001, 0010, 0011, 0100, 0011, 0100, 1111 in binary)
- Address space: 16 B, block size: 2 B, cache size: 4 B, TIO = 2-1-1
- 8 requests, 4 misses (HR = 0.5, MR = 0.5)

Effect of Block and Cache Sizes on Miss Rate
[Figure: miss rate (%) versus block size (bytes) for several cache sizes]
- Miss rate goes up if the block size becomes a significant fraction of the cache size, because the number of blocks that can be held in the same size cache is smaller (increasing capacity misses)

Cache Consistency Problem
[Figure: cache consistency example in which A writes the value 0]

Cache Reads and Writes
- Want to handle reads and writes quickly while maintaining consistency between cache and memory (i.e. both know about all updates)
  - Policies for cache hits and misses are independent
- Here we assume the use of separate instruction and data caches (I$ and D$)
  - Read from both
  - Write only to D$ (assume no self-modifying code)

Handling Cache Hits
- Read hits (I$ and D$)
  - Fastest possible scenario, so want more of these
- Write hits (D$)
  1) Write-Through Policy: always write data to the cache and to memory (through the cache)
     - Forces cache and memory to always be consistent
     - Slow! (every memory access is long)
     - Include a Write Buffer that updates memory in parallel with the processor; assume it is present in all schemes when writing to memory
  2) Write-Back Policy: write data only to the cache, and update memory when the block is removed
     - Allows cache and memory to be inconsistent
     - Multiple writes are collected in the cache; single write to memory per block
     - Dirty bit: extra bit per cache row that is set if the block was written to (is "dirty") and needs to be written back

Handling Cache Misses
- Miss penalty grows as block size does
- Read misses (I$ and D$)
  - Stall execution, fetch the block from memory, put it in the cache
- Write misses (D$)
  1) Write allocate: fetch the block from memory, put it in the cache, execute a write hit
     - Works with either write-through or write-back
     - Ensures the cache is up-to-date after a write miss
  2) No-write allocate: skip the cache altogether and write directly to memory
     - The cache is never up-to-date after a write miss
     - Ensures memory is always up-to-date

Summary
- The memory hierarchy exploits the principle of locality to deliver lots of memory at fast speeds
- Direct-Mapped Cache: each block in memory maps to exactly one row in the cache
  - Index to determine which row
  - Tag to identify if it's the block you want
- Cache read and write policies:
  - Write allocate and no-write allocate for misses

Cache Performance
- Two things hurt the performance of a cache:
  - Miss rate and miss penalty
- Average Memory Access Time (AMAT): average time to access memory considering both hits and misses
  AMAT = Hit time + Miss rate × Miss penalty
  (abbreviated AMAT = HT + MR × MP)

AMAT Example
- Processor specs: 200 ps clock, MP of 50 clock cycles, MR of 0.02 misses/instruction, and HT of 1 clock cycle
  AMAT = 1 + 0.02 × 50 = 2 clock cycles = 400 ps
- Which improvement would be best?
  - 190 ps clock: 380 ps
  - MP of 40 clock cycles: 360 ps
  - MR of 0.015 misses/instruction: 350 ps

Cache Parameter Example
- What is the potential impact of a much larger cache on AMAT? (same block size)
  1) Increased HR
  2) Longer HT: smaller is faster
  - At some point, the increase in hit time for a larger cache may overcome the improvement in hit rate, yielding a decrease in performance
- Effect on TIO? Bits in cache?

Effect of Cache Performance on CPI
- Include memory accesses in CPI:
  CPI_stall = CPI_base + Average Memory-stall Cycles
- Recall: CPU Time = Instructions × CPI × Clock Cycle Time, so
  CPU Time = IC × CPI_stall × CC
- Simplified model for memory-stall cycles:
  Memory-stall cycles = (accesses per instruction) × MR × MP

CPI Example
- Processor specs: CPI_base of 1, a 100-cycle MP, 36% load/store instructions, and 2% I$ and 4% D$ miss rates
  - How many times per instruction do we access the I$? The D$?
  - MP is assumed the same for both I$ and D$
  - Memory-stall cycles will be the sum of the stall cycles for both I$ and D$
- Memory-stall cycles = (100% × 2% + 36% × 4%) × 100 = 3.44
- CPI_stall = 1 + 3.44 = 4.44 (more than 3x CPI_base!)
- What if CPI_base is reduced?
- What if the D$ miss rate went up?

The 3Cs Revisited: Design Solutions
- Capacity:
  - Increase cache size (may increase HT)
- Conflict:
  - Increase cache size
  - Increase associativity (may increase HT)

Multiple Cache Levels
- With advancing technology, there is more room on the die for bigger L1 caches and for an L2 (and in some cases even L3) cache
  - Lower-level caches are normally unified (i.e. they hold both instructions and data)
- Multilevel caching is a way to reduce miss penalty
- So what does this look like?
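The AMAT and CPI_stall examples above can be checked numerically. The following is a minimal sketch, not taken from the original slides: the function names `amat` and `cpi_stall` are hypothetical, and the stall model assumes one I$ access per instruction plus one D$ access per load/store, with the same miss penalty for both caches.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: AMAT = HT + MR * MP (same units as hit_time)."""
    return hit_time + miss_rate * miss_penalty


def cpi_stall(cpi_base, load_store_frac, i_miss_rate, d_miss_rate, miss_penalty):
    """CPI including memory stalls.

    Assumes every instruction accesses the I$ once and only loads/stores
    access the D$ (hypothetical simplified model).
    """
    stall_cycles = (1.0 * i_miss_rate + load_store_frac * d_miss_rate) * miss_penalty
    return cpi_base + stall_cycles


clock_ps = 200
print(amat(1, 0.02, 50) * clock_ps)    # baseline AMAT in ps
print(amat(1, 0.02, 50) * 190)         # faster clock
print(amat(1, 0.02, 40) * clock_ps)    # smaller miss penalty
print(amat(1, 0.015, 50) * clock_ps)   # lower miss rate
print(cpi_stall(1, 0.36, 0.02, 0.04, 100))
```

Under these assumptions the three candidate improvements come out at roughly 380 ps, 360 ps, and 350 ps, so lowering the miss rate helps most in this particular example.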
Multilevel Cache Diagram
[Figure: CPU, cache levels, and main memory; a request for data travels down the hierarchy and the returned data follows the path back to the CPU]

Multilevel Cache AMAT
- AMAT = L1 HT + L1 MR × L1 MP
  - Now L1 MP depends on the other cache levels
- L1 MP = L2 HT + L2 MR × L2 MP
  - If there are more levels, continue this chain (i.e. MP_i = HT_(i+1) + MR_(i+1) × MP_(i+1))
  - The final MP is the main memory access time
- For two levels:
  AMAT = L1 HT + L1 MR × (L2 HT + L2 MR × L2 MP)

Multilevel Cache AMAT Example
- Processor specs: 1 cycle L1 HT, 2% L1 MR, 5 cycle L2 HT, 5% L2 MR, 100 cycle main memory HT
  - Here assuming a unified L2$
- Without L2$:
  AMAT1 = 1 + 0.02 × 100 = 3
- With L2$:
  AMAT2 = 1 + 0.02 × (5 + 0.05 × 100) = 1.2

Local vs. Global Miss Rates
- Local miss rate: fraction of references to one level of a cache that miss
  - e.g. L2$ local MR = L2$ misses / L1$ misses
  - Specific to the level of caching (as used in AMAT)
- Global miss rate: fraction of all references that miss in all levels of a multilevel cache
  - Global MR is the product of all local MRs
  - Start at Global MR = Ln misses / L1 accesses and expand
  - So by definition, global MR ≤ any local MR

Miss Rate Example
[Figure: 1000 memory references reach L1$, 40 reach L2$, 20 reach main memory; access times of 1, 10, and 100 cycles]
- For every 1000 CPU-to-memory references:
  - 40 will miss in L1$; what is the local MR? 40/1000 = 0.04
  - 20 will miss in L2$; what is the local MR? 20/40 = 0.5
  - Global miss rate? 20/1000 = 0.02

Design Considerations
- L1$ focuses on low hit time (fast access)
  - L1 MP is significantly reduced by the presence of L2$, so L1$ can be smaller/faster even with a higher MR
  - e.g. smaller $ (fewer rows)
- L2$, L3$ focus on low miss rate
  - As much as possible, avoid reaching to main memory (heavy penalty)
  - e.g. larger $ with larger block sizes (same # rows)

Reducing Cache Misses
- Allow more flexible block placement in the cache
- Direct-mapped: a memory block maps to exactly one cache block
- Fully associative: a memory block can go in any slot
- N-way set-associative: divide the $ into sets, each of which consists of n slots in which to place a memory block
  - A memory block maps to a set determined by the Index field and is placed in any of the n slots of that set
  - Hash function: (block address) modulo (# sets in the cache)

Block Placement Schemes
- Place memory block 12 in a cache that holds 8 blocks
  - Direct-mapped: can only go in row (12 mod 8) = 4
  - Fully associative: can go in any of the slots (1 set)
  - 2-way set associative: can go in either slot of set (12 mod 4) = 0
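The multilevel AMAT chain and the local/global miss-rate relationship described earlier in this part can be sketched as follows. This is an illustrative sketch only: `multilevel_amat` and `global_miss_rate` are hypothetical names, and each cache level is assumed to be described by a (hit time, local miss rate) pair, listed from L1 outward.

```python
def multilevel_amat(levels, memory_time):
    """AMAT for a chain of caches.

    levels: list of (hit_time, local_miss_rate) tuples, L1 first.
    memory_time: main memory access time, i.e. the final miss penalty.
    Applies MP_i = HT_(i+1) + MR_(i+1) * MP_(i+1) from the last level inward.
    """
    penalty = memory_time
    for hit_time, miss_rate in reversed(levels):
        penalty = hit_time + miss_rate * penalty
    return penalty


def global_miss_rate(local_miss_rates):
    """Global MR is the product of the local miss rates of every level."""
    product = 1.0
    for rate in local_miss_rates:
        product *= rate
    return product


# Example from the text: L1 = (1 cycle, 2%), L2 = (5 cycles, 5%), memory = 100 cycles.
print(multilevel_amat([(1, 0.02)], 100))             # without L2$
print(multilevel_amat([(1, 0.02), (5, 0.05)], 100))  # with L2$

# 1000 references, 40 miss in L1$, 20 miss in L2$.
print(global_miss_rate([40 / 1000, 20 / 40]))
```

Working from the last level inward mirrors the chain in the text: the miss penalty of each level is simply the AMAT of everything below it.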
Effect of Associativity on TIO (1/2)
- Here we assume a cache of fixed size (C)
- Offset: # of bytes in a block (same as before)
- Index: instead of pointing to a row, it now points to a set, so I = log2(C / B / associativity)
  - Fully associative (1 set): 0 Index bits
  - Direct-mapped (associativity of 1): max Index bits
  - Set associative: somewhere in between
- Tag: remaining identifier bits (T = A - I - O)

Effect of Associativity on TIO (2/2)
- For a fixed-size cache, each increase by a factor of two in associativity doubles the number of blocks per set (i.e. the number of slots) and halves the number of sets, decreasing the size of the Index by 1 bit and increasing the size of the Tag by 1 bit
[Figure: address fields; the Tag is used for tag comparison, the Index selects the set, the offset selects the word in the block; the extremes are direct mapped and fully associative (only one set)]

Set Associative Example (1/2)
- Cache parameters:
  - 6-bit addresses, block size of 1 word, cache size of 4 words, 2-way set associative
  - How many sets? C / B / associativity = 4 / 1 / 2 = 2
- TIO breakdown:
  - O = log2(4) = 2, I = log2(2) = 1, T = 6 - 1 - 2 = 3
  - Memory addresses: XXX X XX (the upper four bits form the block address)

Set Associative Example (2/2)
[Figure: main memory blocks and a cache with columns Set, Slot, V, Tag, Data. Main memory is shown in blocks, so offset bits are not shown (x's). Set numbers exactly match the Index field.]
- Each block maps into one set (either slot)
- On a memory request:
  1) Take the Index field
  2) For EACH slot in the set, check the valid bit, then compare the Tag

Question: What is the TIO breakdown for the following cache?
- 32-bit address space
- 32 KiB 4-way set associative cache
- 8 word blocks

Worst Case for Direct-Mapped
- Example: direct-mapped $ that holds 4 blocks
  - Starts empty (all initially not valid)
- Consider the memory accesses: 0, 4, 0, 4, 0, 4, ...
- HR of 0: every access is a miss
  - Ping-pong effect: alternating requests that map into the same cache row

Set Associative Example
- Example: 2-way associative $ that holds 4 blocks
  - Starts empty (all initially not valid)
- Consider the memory accesses: 0, 4, 0, 4, 0, 4, ...
- HR of (n-2)/n for n accesses: only the first two miss
  - Reduced conflict misses, because memory locations that map into the same set can co-exist

Example: Eight-Block Cache Configurations
- Total size of $ (in blocks) = # sets × associativity
- For a fixed $ size, associativity ↑ means # sets ↓ and slots per set ↑
- With 8 blocks, an 8-way set associative $ is the same as a fully associative $

4-Way Set Associative Cache
[Figure: 2^8 = 256 sets, each with four slots; address bits 31 to 0 are split into tag, 8-bit index, and byte offset; each slot holds V, Tag, and Data]

Block Replacement Policies
- Random Replacement
  - Hardware randomly selects a cache block in the set
- Least Recently Used (LRU)
  - Hardware keeps track of access history and replaces the entry that has not been used for the longest time
  - For a 2-way set associative cache, can get away with just one bit per set
- Example of a simple "pseudo" LRU implementation
  - For each set, store a hardware replacement pointer that points to one slot
  - Whenever that slot is accessed, move the pointer to the next slot
  - That slot is the most recently used and cannot be the LRU

Costs of Set-Associative Caches
- For an n-way set associative cache:
  - Must choose data from the correct slot
- On a cache miss, which block do you replace in the set?
  - Use a cache block replacement policy
  - There are many (most are intuitively named), but we will just cover a few in this class

Benefits of Set-Associative Caches
- Consider the cost of a miss vs. the cost of implementation
- The largest gains are in going from direct mapped to 2-way (20%+ reduction in miss rate)

Improving Cache Performance
1) Reduce the hit time of the cache
   - Smaller cache
   - 1-word blocks (no MUX/selector to pick the word)
2) Reduce the miss rate
   - Bigger cache
   - Larger blocks (compulsory misses)
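The direct-mapped and set-associative traces worked through in these notes can be replayed with a small simulator. This is a sketch under stated assumptions, not the lecture's own code: `count_misses` is a hypothetical helper, it uses true LRU replacement within each set, and it stores whole block addresses per slot where real hardware would store only the Tag (the behavior is equivalent for counting hits and misses).

```python
def count_misses(addresses, block_size, num_blocks, ways=1):
    """Count cache misses for a trace of addresses.

    block_size: addressable units per block.
    num_blocks: total blocks the cache holds.
    ways: associativity (1 = direct-mapped, num_blocks = fully associative).
    Replacement within a set is LRU (an assumption of this sketch).
    """
    num_sets = num_blocks // ways
    # Each set is a list of block addresses, least recently used first.
    sets = [[] for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // block_size
        slots = sets[block % num_sets]   # hash: block address mod number of sets
        if block in slots:
            slots.remove(block)          # hit: will be re-appended as most recent
        else:
            misses += 1
            if len(slots) == ways:
                slots.pop(0)             # evict the least recently used block
        slots.append(block)
    return misses


trace = [0, 1, 2, 3, 4, 3, 4, 15]
print(count_misses(trace, block_size=1, num_blocks=4))   # 1 B blocks, 4 B cache
print(count_misses(trace, block_size=2, num_blocks=2))   # 2 B blocks, 4 B cache

ping_pong = [0, 4] * 4                                    # block addresses 0, 4, 0, 4, ...
print(count_misses(ping_pong, 1, 4, ways=1))              # direct-mapped
print(count_misses(ping_pong, 1, 4, ways=2))              # 2-way set associative
```

With `ways=1` the simulator reproduces the 6-miss and 4-miss direct-mapped examples; on the ping-pong trace every access misses when direct-mapped, while the 2-way configuration misses only on the first two accesses.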