Chapter 1: Fundamentals of Computer Design

Original slides created by: David Patterson, Electrical Engineering and Computer Sciences, University of California, Berkeley
/~pattrsn  /~cs252

Outline
- Introduction
- Quantitative Principles of Computer Design
- Classes of Computers
- Computer Architecture
- Trends in Technology
- Power in Integrated Circuits
- Trends in Cost
- Dependability
- Performance
- Fallacies and Pitfalls

What is Computer Architecture?
- Functional operation of the individual HW units within a computer system, and the flow of information and control among them.
- [Figure labels: Technology; Programming Language Interface; Interface Design (ISA); Measurement & Evaluation; Parallelism; Computer Architecture; Applications; OS; Hardware Organization]

Abstraction Layers in Modern Systems
- Layers, top to bottom: Application, Algorithm, Programming Language, Operating System/Virtual Machine, Instruction Set Architecture (ISA), Microarchitecture, Gates/Register-Transfer Level (RTL), Circuits, Devices, Physics
- Original domain of the computer architect ('50s-'80s)
- Domain of recent computer architecture ('90s)
- Reliability, power, ...
- Parallel computing, security, ...
- Reinvigoration of computer architecture, mid-2000s onward.

Computer Systems: Technology Trends
- 1988: Supercomputers, Massively Parallel Processors, Mini-supercomputers, Minicomputers, Workstations, PCs
- 2002: Powerful PCs and SMP Workstations, Network of SMP Workstations, Mainframes, Supercomputers, Embedded Computers

Crossroads: Conventional Wisdom in Comp. Arch
- Old Conventional Wisdom (CW): Power is free, transistors expensive
- New CW: "Power wall": power expensive, transistors free (can put more on chip than can afford to turn on)
- Old CW: Sufficiently increasing Instruction Level Parallelism via compilers, innovation (out-of-order, speculation, ...)
- New CW: "ILP wall": law of diminishing returns on more HW for ILP
- Old CW: Multiplies are slow, memory access is fast
- New CW: "Memory wall": memory slow, multiplies fast (200 clock cycles to DRAM memory, 4 clocks for multiply)
- Old CW: Uniprocessor performance 2X / 1.5 yrs
- New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall
  - Uniprocessor performance now 2X / 5(?) yrs
  - Sea change in chip design: multiple "cores"
    (2X processors per chip / ~2 years)
  - More, simpler processors are more power efficient

Crossroads: Uniprocessor Performance
- VAX: 25%/year, 1978 to 1986
- RISC + x86: 52%/year, 1986 to 2002
- RISC + x86: ??%/year, 2002 to present (less than 20%)
- From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October, 2006

Change in Chip Design
- Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm^2 chip
- RISC II (1983): 32-bit, 5 stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm^2 chip
- 125 mm^2 chip, 0.065 micron CMOS = 2312 RISC II + FPU + Icache + Dcache
  - RISC II shrinks to ~0.02 mm^2 at 65 nm
  - Caches via DRAM or 1 transistor SRAM?
  - Proximity Communication via capacitive coupling at > 1 TB/s? (Ivan Sutherland @ Sun / Berkeley)
- Processor is the new transistor?

Taking Advantage of Parallelism
- Increasing throughput of server computer via multiple processors or multiple disks
- Detailed HW design
  - Carry lookahead adders use parallelism to speed up computing sums from linear to logarithmic in number of bits per operand
  - Multiple memory banks searched in parallel in set-associative caches
- Pipelining: overlap instruction execution to reduce the total time to complete an instruction sequence
  - Not every instruction depends on its immediate predecessor, so executing instructions completely or partially in parallel is possible
- Classic 5-stage pipeline:
  1) Instruction Fetch (Ifetch)
  2) Register Read (Reg)
  3) Execute (ALU)
  4) Data Memory Access (Dmem)
  5) Register Write (Reg)

Pipelined Instruction Execution
- [Figure: instructions in program order (vertical axis) against time in clock cycles, Cycle 1 to Cycle 7 (horizontal axis); each instruction passes through Ifetch, Reg, ALU, DMem, Reg]

Limits to pipelining
- Hazards prevent the next instruction from executing during its designated clock cycle
  - Structural hazards: attempt to use the same hardware to do two different things at once
  - Data hazards: instruction depends on the result of a prior instruction still in the pipeline
  - Control hazards: caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
- [Figure: instructions in program order against time in clock cycles; each instruction passes through Ifetch, Reg, ALU, DMem, Reg]

The Principle of Locality
- The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.
- Two different types of locality:
  - Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  - Spatial Locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
- For the last 30 years, HW has relied on locality for memory performance
- [Figure labels: P, $, MEM]

Levels of the Memory Hierarchy
- CPU Registers: capacity 100s Bytes; access time 300-500 ps (0.3-0.5 ns)
- L1 and L2 Cache: capacity 10s-100s KBytes; access time ~1 ns to ~10 ns; cost $1000s/GByte
- Main Memory: capacity GBytes; access time 80 ns to 200 ns; cost ~$100/GByte
- Disk: capacity 10s TBytes; access time 10 ms (10,000,000 ns); cost ~$1/GByte
- Tape: capacity infinite; access time sec-min; cost ~$1/GByte
- Upper levels are faster, lower levels are larger. Staging/transfer unit between adjacent levels:
  - Registers and L1 Cache: instr. operands, managed by prog./compiler, 1-8 bytes
  - L1 Cache and L2 Cache: blocks, managed by cache cntl, 32-64 bytes
  - L2 Cache and Memory: blocks, managed by cache cntl, 64-128 bytes
  - Memory and Disk: pages, managed by OS, 4K-8K bytes
  - Disk and Tape: files, managed by user/operator, Mbytes

What Computer Architecture brings to Table
- Other fields often borrow ideas from architecture
- Quantitative Principles of Design
  - Take Advantage of Parallelism
  - Principle of Locality
  - Focus on the Common Case
  - Amdahl's Law
  - The Processor Performance Equation
- Careful, quantitative comparisons
  - Define, quantify, and summarize relative performance
  - Define and quantify relative cost
  - Define and quantify dependability
  - Define and quantify power
- Culture of anticipating and exploiting advances in technology
- Culture of well-defined interfaces that are carefully implemented and thoroughly checked

Comp. Arch. is an Integrated Approach
- What really matters is the functioning of the complete system: hardware, runtime system, compiler, operating system, and application
  - In networking, this is called the "End to End argument"
- Computer architecture is not just about transistors, individual instructions, or particular implementations
  - E.g., the original RISC projects replaced complex instructions with a compiler + simple instructions

Computer Architecture is Design and Analysis
- Architecture is an iterative process:
  - Searching the space of possible designs
  - At all levels of computer systems
- [Figure labels: Creativity; Good Ideas; Mediocre Ideas; Bad Ideas; Cost/Performance Analysis]

Focus on the Common Case
- Common sense guides computer design
  - Since it's engineering, common sense is valuable
- In making a design trade-off, favor the frequent case over the infrequent case
  - E.g., the instruction fetch and decode unit is used more frequently than the multiplier, so optimize it first
  - E.g., if a database server has 50 disks/processor, storage dependability dominates system dependability, so optimize it first
- The frequent case is often simpler and can be done faster than the infrequent case
  - E.g., overflow is rare when adding 2 numbers, so improve performance by optimizing the more common case of no overflow
  - May slow down overflow, but overall performance is improved by optimizing for the normal case
- What is the frequent case, and how much is performance improved by making that case faster? => Amdahl's Law
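
The question of how much the common case helps can be sketched numerically. The following is a minimal illustration of the standard form of Amdahl's Law, not code from the slides; the function and parameter names are assumptions chosen for readability. The sample values (a CPU that is 10X faster on a server that spends 60% of its time waiting for I/O) come from the Amdahl's Law example in this deck.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time is improved.

    fraction_enhanced: share of the original time that benefits (0..1).
    speedup_enhanced: how many times faster that share becomes.
    Both names are illustrative, not taken from the slides.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def amdahl_limit(fraction_enhanced: float) -> float:
    """Upper bound on overall speedup as the enhancement becomes infinitely fast."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


# 60% of the time is I/O wait, so only 40% of the time benefits from the CPU.
overall = amdahl_speedup(fraction_enhanced=0.4, speedup_enhanced=10.0)
bound = amdahl_limit(fraction_enhanced=0.4)
print(f"overall speedup: {overall:.2f}, best possible: {bound:.2f}")
```

The bound shows why the unenhanced fraction dominates: even an infinitely fast CPU could not push this server past roughly 1.67X.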
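Amdahl's Law
$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[(1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}\right]$$
$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$
- Best you could ever hope to do:
$$\text{Speedup}_{maximum} = \frac{1}{1 - \text{Fraction}_{enhanced}}$$

Amdahl's Law example
- New CPU 10X faster
- I/O bound server, so 60% of time is spent waiting for I/O
$$\text{Speedup}_{overall} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} = 1.56$$
- Apparently, it's human nature to be attracted by 10X faster, vs. keeping in perspective it's just 1.6X faster

Processor performance equation
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
- The three factors are inst count, CPI, and cycle time
- What affects each factor:
  - Program: Inst Count
  - Compiler: Inst Count, (CPI)
  - Inst. Set: Inst Count, CPI
  - Organization: CPI, Clock Rate
  - Technology: Clock Rate

Relating Metrics
- CPU execution time
  - Measured time for a running program
  - Easy to measure
- Clock cycles
  - The number of clock pulses for a running program
  - Hard to measure
- Instruction count
  - The number of instructions executed by the program
  - Can be measured by using software tools that profile the execution or by using a simulator of the architecture
- CPI
  - Clock cycles per instruction
  - Computing the CPI requires the clock cycles and the instruction count for each instruction type

Clocks
- A digital circuit has a clock that runs at a constant rate (like a person's pulse); the clock is used for signal synchronization
- Cycle time = time for one full cycle (seconds per cycle)
- Clock rate = cycles per second (Hertz or Hz)
  - Also known as clock frequency
- Scientific prefixes used with cycle time and clock rate (prefix, symbol, multiple):
  - tera, T, 10^12
  - giga, G, 10^9
  - mega, M, 10^6
  - kilo, k, 10^3
  - milli, m, 10^-3
  - micro, u, 10^-6
  - nano, n, 10^-9
  - pico, p, 10^-12

What's a Clock Cycle?
- Old days: 10 levels of gates
- Today: determined by numerous time-of-flight issues + gate delays
  - Clock propagation, wire lengths, drivers
- [Figure labels: Latch or register; combinational logic]

CPI (Clock cycles per instruction)
- The average number of clock cycles each instruction takes to execute
  - A floating point intensive application might have a higher CPI
- CPU clock cycles = Instruction count x CPI
- CPU time = CPU clock cycles x Clock cycle time
- CPU time = Instruction count x CPI x Clock cycle time
- CPU time = (Instruction count x CPI) / Clock rate

CPI Example
Suppose we have two implementations of the same instruction set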
architecture (ISA).
For some program:
- Machine A has a clock cycle time of 1 ns and a CPI of 2.0
- Machine B has a clock cycle time of 2 ns and a CPI of 1.2
Which machine is faster for this program, and by how much?
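
The comparison can be checked with a few lines of Python. This is a minimal sketch that applies CPU time = Instruction count x CPI x Clock cycle time, using the values stated in the worked answer (A: 1 ns and CPI 2.0; B: 2 ns and CPI 1.2). The function name and the instruction count are hypothetical; the instruction count cancels in the ratio because both machines run the same program on the same ISA.

```python
def cpu_time_ns(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time (in nanoseconds)."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical instruction count; any positive value gives the same ratio.
INSTRUCTION_COUNT = 1_000_000

time_a = cpu_time_ns(INSTRUCTION_COUNT, cpi=2.0, cycle_time_ns=1.0)
time_b = cpu_time_ns(INSTRUCTION_COUNT, cpi=1.2, cycle_time_ns=2.0)

# Performance is the reciprocal of execution time, so
# Performance A / Performance B = Execution time B / Execution time A.
ratio = time_b / time_a
faster = "A" if ratio > 1.0 else "B"
print(f"Machine {faster} is faster by a factor of {max(ratio, 1.0 / ratio):.2f}")
```

Machine B has the lower CPI, but its clock cycle is twice as long, so the product CPI x cycle time is what decides the comparison.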
CPI Example: Answer
- Machine A: clock cycle = 1 ns, CPI = 2.0
- Machine B: clock cycle = 2 ns, CPI = 1.2
- CPU clock cycles A = Instruction Count x 2.0
- CPU clock cycles B = Instruction Count x 1.2
- CPU time A = CPU clock cycles A x clock cycle time = Instruction Count x 2.0 x 1 ns = 2 x Instruction Count ns
- CPU time B = Instruction Count x 1.2 x 2 ns = 2.4 x Instruction Count ns
- Performance A / Performance B = Execution time B / Execution time A = (2.4 x I) / (2 x I) = 1.2
- Thus, A is 1.2 times faster than B

Three main classes of computers
- Desktop: personal computer
- Server: web servers, file servers, database servers
- Embedded: handheld devices (phones, cameras), dedicated parallel computers

Feature comparison (Desktop / Server / Embedded)
- Price of system: $500-$5000 / $5000-$5,000,000 / $10-$100,000
- Price of microprocessor module: $50-$500 / $200-$10,000 / $.01-$100
- Critical system design issues:
  - Desktop: price-performance, graphics performance
  - Server: throughput, availability, scalability
  - Embedded: price, power consumption, application-specific performance

Instruction Set Architecture: Critical Interface
- [Figure labels: software; instruction set; hardware]
- Properties of a good abstraction
  - Lasts through many generations (portability)
  - Used in many different ways (generality)
  - Provides convenient functionality to higher levels
  - Permits an efficient implementation at lower levels

Example: MIPS architecture
- Programmable storage
  - 2^32 x bytes
  - 31 x 32-bit GPRs (R0 = 0)
  - 32 x 32-bit FP regs (paired DP)
  - HI, LO, PC
  - [Figure labels: r0 (holds 0), r1, ..., r31; PC; lo; hi]
- Data types? Format? Addressing modes?
- Arithmetic logical
  - Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU
  - AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
  - SLL, SRL, SRA, SLLV, SRLV, SRAV
- Memory access
  - LB, LBU, LH, LHU, LW, LWL, LWR
  - SB, SH, SW, SWL, SWR
- Control
  - J, JAL, JR, JALR
  - BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL
- 32-bit instructions on word boundary

MIPS architecture instruction set format
- Register to register
- Transfer, branches
- Jumps

ISA vs. Computer Architecture
- Old definition of computer architecture
  = instruction set design
  - Other aspects of computer design called implementation
  - Insinuates implementation is uninteresting or less challenging
- Our view is computer architecture >> ISA
- Architect's job is much more than instruction set design; technical hurdles today are more challenging than those in instruction set design
- Since instruction set design is not where the action is, some conclude computer architecture (using the old definition) is not where the action is
  - We disagree on the conclusion
  - Agree that ISA is not where the action is (ISA in CA:AQA 4/e appendix)

Moore's Law: 2X transistors / "year"
- "Cramming More Components onto Integrated Circuits", Gordon Moore, Electronics, 1965
- Number of transistors per cost-effective integrated circuit doubles every N months (12 ≤ N ≤ 24)

Tracking Technology Performance Trends
- Drill down into 4 technologies: Disks, Memory, Network, Processors
- Compare ~1980 Archaic (Nostalgic) vs. ~2000 Modern (Newfangled)
  - Performance milestones in each technology
- Compare for bandwidth vs. latency improvements in performance over time
- Bandwidth: number of events per unit time
  - E.g., Mbits/second over network, Mbytes/second from disk
- Latency: elapsed time for a single event
  - E.g., one-way network delay in microseconds, average disk access time in milliseconds

Disks: Archaic (Nostalgic) v. Modern (Newfangled)
- Compared: CDC Wren I, 1983 vs. Seagate 373453, 2003
- Rotation speed: 3600 RPM vs. 15000 RPM (4X)
- Capacity: 0.03 GBytes vs. 73.4 GBytes (2500X)
- Tracks/inch: 800 vs. 64000 (80X)
- Bits/inch: 9550 vs. 533,000 (60X)
- Platters: three 5.25" vs. four 2.5" (in 3.5" form factor)
- Bandwidth: 0.6 MBytes/sec vs. 86 MBytes/sec (140X)
- Latency: 48.3 ms vs. 5.7 ms (8X)
- Cache: none vs. 8 MBytes

Latency Lags Bandwidth (for last ~20 years)
- Performance milestones; each pair in parentheses gives (latency improvement, bandwidth improvement)
  - Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
- Latency = simple operation w/o contention; BW = best-case

Memory: Archaic (Nostalgic) v. Modern (Newfangled)
- Compared: 1980 DRAM (asynchronous) vs. 2000 Double Data Rate Synchr. (clocked) DRAM
- Capacity: 0.06 Mbits/chip vs. 256.00 Mbits/chip (4000X)
- Transistors and die: 64,000 xtors, 35 mm^2 vs. 256,000,000 xtors, 204 mm^2
- Data bus: 16-bit data bus per module, 16 pins/chip vs. 64-bit data bus per DIMM, 66 pins/chip (4X)
- Bandwidth: 13 Mbytes/sec vs. 1600 Mbytes/sec (120X)
- Latency: 225 ns vs. 52 ns (4X)
- Block transfer: none vs. block transfers (page mode)

Latency Lags Bandwidth (last ~20 years)
- Performance milestones added:
  - Memory Module: 16-bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)

LANs: Archaic (Nostalgic) v. Modern (Newfangled)
- Compared: Ethernet 802.3 (year of standard: 1978) vs. Ethernet 802.3ae (year of standard: 2003)
- Link speed: 10 Mbits/s vs. 10,000 Mbits/s (1000X)
- Latency: 3000 µsec vs. 190 µsec (15X)
- Media: shared media vs. switched media
- Cable: coaxial cable vs. Category 5 copper wire
  - Coaxial cable: copper core, insulator, braided outer conductor, plastic covering
  - Twisted pair: copper, 1 mm thick, twisted to avoid antenna effect; "Cat 5" is 4 twisted pairs in a bundle

Latency Lags Bandwidth (last ~20 years)
- Performance milestones added:
  - Ethernet: 10 Mb, 100 Mb, 1000 Mb, 10000 Mb/s (16x, 1000x)

CPUs: Archaic (Nostalgic) v. Modern (Newfangled)
- Compared: 1982 Intel 80286 vs. 2001 Intel Pentium 4
- Clock rate: 12.5 MHz vs. 1500 MHz (120X)
- Peak rate: 2 MIPS vs. 4500 MIPS (2250X)
- Latency: 320 ns vs. 15 ns (20X)
- Transistors and die: 134,000 xtors, 47 mm^2 vs. 42,000,000 xtors, 217 mm^2
- Bus and pins: 16-bit data bus, 68 pins vs. 64-bit data bus, 423 pins
- Organization: microcode interpreter, separate FPU chip vs. 3-way superscalar, dynamic translation to RISC, superpipelined (22 stage), out-of-order execution
- Caches: none vs. on-chip 8 KB data cache, 96 KB instr. trace cache, 256 KB L2 cache

Latency Lags Bandwidth (last ~20 years)
- Performance milestones (latency improvement, bandwidth improvement):
  - Processor: '286, '386, '486, Pentium, Pentium Pro, Pentium 4 (21x, 2250x)
  - Ethernet: 10 Mb, 100 Mb, 1000 Mb, 10000 Mb/s (16x, 1000x)
  - Memory Module: 16-bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
  - Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
- CPU high, memory low ("Memory Wall")

Rule of Thumb for Latency Lagging BW
- In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4 (and capacity improves faster than bandwidth)
- Stated alternatively: bandwidth improves by more than the square of the improvement in latency
6 Reasons Latency Lags Bandwidth
1. Moore's Law helps BW more than latency
   - Faster transistors, more transistors, more pins help bandwidth
     - MPU transistors: 0.130 vs. 42 M xtors (300X)
     - DRAM transistors: 0.064 vs. 256 M xtors (4000X)
     - MPU pins: 68 vs. 423 pins (6X)
     - DRAM pins: 16 vs. 66 pins (4X)
   - Smaller, faster transistors, but they communicate over (relatively) longer lines: limits latency
     - Feature size: 1.5 to 3 vs. 0.18 micron (8X, 17X)
     - MPU die size: 47 vs. 217 mm^2 (sqrt of ratio: 2X)
     - DRAM die size: 35 vs. 204 mm^2 (sqrt of ratio: 2X)
2. Distance limits latency
   - Size of DRAM block: long bit and word lines account for most of DRAM access time
   - Speed of light and computers on network
   - 1. & 2. explain linear latency vs. square BW?
3. Bandwidth easier to sell ("bigger = better")
   - E.g., 10 Gbits/s Ethernet ("10 Gig") vs. 10 µsec latency Ethernet
   - 4400 MB/s DIMM ("PC4400") vs. 50 ns latency
   - Even if just marketing, customers are now trained
   - Since bandwidth sells, more resources are thrown at bandwidth, which further tips the balance
4. Latency helps BW, but not vice versa
   - Spinning a disk faster improves both bandwidth and rotational latency
     - 3600 RPM to 15000 RPM = 4.2X
     - Average rotational latency: 8.3 ms to 2.0 ms
     - Things being equal, also helps BW by 4.2X
   - Lower DRAM latency means more accesses/second (higher bandwidth)
   - Higher linear density helps disk BW (and capacity), but not disk latency
     - 9,550 BPI to 533,000 BPI gives 60X in BW
5. Bandwidth hurts latency
   - Queues help bandwidth, hurt latency (queuing theory)
   - Adding chips to widen a memory module increases bandwidth, but higher fan-out on address lines may increase latency
6. Operating system overhead hurts latency more than bandwidth
   - Long messages amortize overhead;
     overhead is a bigger part of short messages

Define and quantify power (1/2)
- For CMOS chips, the traditional dominant energy consumption has been in switching transistors, called dynamic power:
$$\text{Power}_{dynamic} = \frac{1}{2} \times \text{CapacitiveLoad} \times \text{Voltage}^2 \times \text{FrequencySwitched}$$
- For mobile devices, energy is the better metric:
$$\text{Energy}_{dynamic} = \text{CapacitiveLoad} \times \text{Voltage}^2$$
- For a fixed task, slowing the clock rate (frequency switched) reduces power, but not energy
- Capacitive load is a function of the number of transistors connected to an output and of the technology, which determines the capacitance of wires and transistors
- Dropping voltage helps both, so went from 5V to 1V
- To save energy & dynamic power, most CPUs now turn off the clock of inactive modules (e.g., Fl. Pt. Unit)

Example of quantifying power
- Suppose a 15% reduction in voltage results in a 15% reduction in frequency. What is the impact on dynamic power?
$$\frac{\text{Power}_{new}}{\text{Power}_{old}} = \frac{(0.85 \times \text{Voltage})^2 \times (0.85 \times \text{FrequencySwitched})}{\text{Voltage}^2 \times \text{FrequencySwitched}} = 0.85^3 \approx 0.61$$

Define and quantify power (2/2)
- Because leakage current flows even when a transistor is off, static power is now important too:
$$\text{Power}_{static} = \text{Current}_{static} \times \text{Voltage}$$
- Leakage current increases in processors with smaller transistor sizes
- Increasing the number of transistors increases power even if they are turned off
- In 2006, the goal for leakage is 25% of total power consumption; high performance designs are at 40%
- Very low power systems even gate voltage to inactive modules to control loss due to leakage

Cost of Integrated Circuits
- Cost depends on several factors:
  - Time: the price drops with time as the learning curve increases
  - Volume: the price drops as volume increases
  - Commodities: many manufacturers produce the same product; competition brings prices down
- [Figure: The price of Intel Pentium 4 and Pentium M]
- [Figure: AMD Opteron microprocessor die]
- [Figure: A 300 mm silicon wafer contains 117 AMD Opteron microprocessor chips in a 90 nm process]

$$\text{Cost of integrated circuit} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}$$
$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$$
$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$$
- Example: wafer diameter = 300 mm, die area = 1.5 cm x 1.5 cm = 2.25 cm^2, dies per wafer = 270

$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha}\right)^{-\alpha}$$
- Wafer yield: measures how many wafers are completely bad
- Empirical formula; α corresponds to masking levels in the manufacturing process, α = 4
- Example with defect density = 0.4 per cm^2:
  - Die area = 1.5 cm x 1.5 cm = 2.25 cm^2: die yield = 0.44
  - Die area = 1.0 cm x 1.0 cm = 1 cm^2: die yield = 0.68
- A smaller die area gives a higher die yield
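
The die cost formulas from the cost slides can be sketched in Python to reproduce the quoted examples (270 dies per wafer; die yields of 0.44 and 0.68). This is a rough illustration rather than the slides' own code: the function names are assumptions, wafer yield is assumed to be 100% because the examples do not state one, and the wafer cost used at the end is a hypothetical figure.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Wafer area divided by die area, minus a correction for partial dies at the edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2.0) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(defects_per_cm2: float, die_area_cm2: float,
              wafer_yield: float = 1.0, alpha: float = 4.0) -> float:
    """Empirical yield model; alpha = 4 as stated on the slide.

    wafer_yield = 1.0 is an assumption, not a value given in the examples.
    """
    return wafer_yield * (1.0 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def cost_of_die(wafer_cost: float, wafer_diameter_cm: float,
                die_area_cm2: float, defects_per_cm2: float) -> float:
    """Cost of die = cost of wafer / (dies per wafer x die yield)."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2))
    return wafer_cost / good_dies


# Values from the slide examples: 300 mm (30 cm) wafer, 0.4 defects per cm^2.
print(round(dies_per_wafer(30.0, 2.25)))   # dies for a 2.25 cm^2 die
print(round(die_yield(0.4, 2.25), 2))      # yield for a 2.25 cm^2 die
print(round(die_yield(0.4, 1.0), 2))       # yield for a 1 cm^2 die

# Hypothetical wafer cost, chosen only to show how die size drives die cost.
HYPOTHETICAL_WAFER_COST = 5000.0
large = cost_of_die(HYPOTHETICAL_WAFER_COST, 30.0, 2.25, 0.4)
small = cost_of_die(HYPOTHETICAL_WAFER_COST, 30.0, 1.0, 0.4)
print(f"large die costs {large / small:.1f}x the small die")
```

Die cost grows faster than die area because a larger die both reduces the number of dies per wafer and lowers the yield.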
Define and quantify dependability (1/3)
- How to decide when a system is operating properly?
- Infrastructure providers now offer Service Level Agreements (SLA) to guarantee that their networking or power service would be dependable
- Systems alternate between 2 states of service with respect to an SLA:
  1. Service accomplishment, where the service is delivered as specified in the SLA
  2. Service interruption, where the delivered service is different from the SLA
- Failure = transition from state 1 to state 2
- Restoration = transition from state 2 to state 1

Define and quantify dependability (2/3)
- Module reliability = measure of continuous service accomplishment (or time to failure). 2 metrics:
  - Mean Time To Failure (MTTF) measures reliability
  - Failures In Time (FIT) = 1/MTTF, the rate of failures
    - Traditionally reported as failures per billion hours of operation
- Mean Time To Repair (MTTR) measures service interruption
  - Mean Time Between Failures (MTBF) = MTTF + MTTR
- Module availability measures service as it alternates between the 2 states of accomplishment and interruption (number between 0 and 1, e.g. 0.9)
- Module availability = MTTF / (MTTF + MTTR)

Example calculating reliability
- If modules have exponentially distributed lifetimes (age of module does not affect probability of failure), the overall failure rate is the sum of the failure rates of the modules
- Calculate FIT and MTTF for 10 disks (1M hour MTTF per disk), 1 disk controller (0.5M hour MTTF), and 1 power supply (0.2M hour MTTF):
$$\text{FailureRate} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} = \frac{10 + 2 + 5}{1{,}000{,}000} = \frac{17}{1{,}000{,}000} \text{ per hour} = 17{,}000 \text{ FIT}$$
$$\text{MTTF} = \frac{1{,}000{,}000{,}000}{17{,}000} \approx 59{,}000 \text{ hours}$$

How to Quantify Performance?
- Time to run the task (ExTime): execution time, response time, latency
- Tasks per day, hour, week, sec, ns ... (Performance): throughput, bandwidth
- Plane comparison (speed; DC to Paris; passengers; throughput in pmph):
  - Boeing 747: 610 mph; 6.5 hours; 470; 286,700
  - BAD/Sud Concorde: 1350 mph; 3 hours; 132; 178,200

Definition: Performance
- Performance is in units of things per sec; bigger is better
- If we are primarily concerned with response time:
$$\text{performance}(X) = \frac{1}{\text{execution\_time}(X)}$$
- "X is n times faster than Y" means:
$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_time}(Y)}{\text{Execution\_time}(X)}$$

Performance: What to measure
- Usually rely on benchmarks vs. real workloads
- To increase predictability, collections of benchmark applications, called benchmark suites, are popular
- SPEC CPU: popular desktop benchmark suite
  - CPU only, split between integer and floating point programs
  - SPECint2000 has 12 integer, SPECfp2000 has 14 floating-point pgms
  - SPEC CPU2006 to be announced Spring 2006
  - SPECSFS (NFS file server) and SPECWeb (Web server) added as server benchmarks
- Transaction Processing Council measures server performance and cost-performance for databases
  - TPC-C: complex query for Online Transaction Processing
  - TPC-H: models ad hoc decision support
  - TPC-W: a transactional web benchmark
  - TPC-App: application server and web services benchmark

SPEC: System Performance Evaluation Cooperative
- First Round 1989: 10 programs yielding a single number ("SPECmarks")
- Second Round 1992: SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)
  - Compiler flags unlimited. March 93
- New set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
  - "benchmarks useful for 3 years"
  - Single flag setting for all programs: SPECint_base95, SPECfp_base95
- SPEC CPU2000 (12 integer benchmarks, CINT2000, and 14 floating-point benchmarks, CFP2000)

Normalized Execution Time
- Normalize execution time to a reference machine
- Two common methods: arithmetic mean and geometric mean
- Comparison
  - Arithmetic mean: use to predict performance; may not be consistent
  - Geometric mean: independent of the running times of the individual programs; cannot be used to predict relative execution time for a workload

Normalized Execution Time: Example
- Columns: Time on A; Time on B; Normalized to A (A, B); Normalized to B (A, B)
- Program 1: time on A = 1; time on B = 10; ...
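
The contrast between the two means can be illustrated with a small sketch. Only the Program 1 times (1 on A, 10 on B) appear in the source table; the Program 2 times below are hypothetical values chosen to make the effect visible, and the helper names are assumptions.

```python
import math


def normalize(times, reference_times):
    """Divide each program's time by the reference machine's time for that program."""
    return [t / r for t, r in zip(times, reference_times)]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # Computed through logarithms: the n-th root of the product of n values.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Program 1 times are from the slide; Program 2 times are hypothetical.
time_on_a = [1.0, 1000.0]
time_on_b = [10.0, 100.0]

b_normalized_to_a = normalize(time_on_b, time_on_a)  # B relative to reference A
a_normalized_to_b = normalize(time_on_a, time_on_b)  # A relative to reference B

print("arithmetic mean, B vs. reference A:", arithmetic_mean(b_normalized_to_a))
print("arithmetic mean, A vs. reference B:", arithmetic_mean(a_normalized_to_b))
print("geometric mean,  B vs. reference A:", geometric_mean(b_normalized_to_a))
print("geometric mean,  A vs. reference B:", geometric_mean(a_normalized_to_b))
```

With these numbers the arithmetic mean of normalized times makes whichever machine is not the reference look slower, which is the inconsistency the slide warns about. The geometric mean gives the same verdict for either reference, but it says nothing about total execution time, where the hypothetical Program 2 dominates.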