Chapter 5: Scalar Processors (courseware)

2023/7/23, 余臘生. All rights reserved; violators will be prosecuted.

Multiple Instruction Issue

- We have attempted to limit stalls from hazards in order to lower the average CPI to the ideal CPI of 1.
- Can we decrease CPI to under 1? How? Issue and execute more than one instruction at a time.
- Multiple-issue processors come in two kinds:
  - Superscalars use static and/or dynamic scheduling mechanisms and multiple functional units to issue more than one instruction at a time.
  - VLIW (very long instruction word) processors use instructions that are themselves multiple instructions, scheduled by a compiler. All instructions in the long word are executed in parallel, which requires software (compiler) support.

Superscalar

- The hardware issues from 1 to 8 instructions per clock cycle. These instructions must be independent and satisfy other constraints.
- Avoid structural hazards: use different functional units, and allow at most one memory reference among the instructions issued together.
- Scheduling of instructions can be done statically by a compiler or dynamically by hardware.
- While a superscalar can issue any combination of instructions, for simplicity we will concentrate on a 2-instruction superscalar for MIPS in which one instruction is an integer operation and the other, if available, is a floating-point operation.
- This simplification reduces the complexity of the hardware, but also reduces the usefulness of the superscalar.

Basic structure of a superscalar processor

If the number of instructions that a processor can have executing at the same time is defined as its degree of instruction-level parallelism (ILP), then a k-stage pipeline has an ILP of k. If a superscalar processor contains n such pipelines, its ILP is nk.

[Figure 2-26: Organization of a typical superscalar processor (integer register file, floating-point register file, memory)]

Single issue and multiple issue

The process by which a processor obtains instructions from the instruction storage unit (or instruction dispatch unit) is called "issue". A processor that can fetch only one instruction for execution in a single clock cycle is a single-issue processor. A processor that can obtain several instructions at once within one clock cycle is a multiple-issue processor.

[Figure 2-28: Comparison of single-issue and multiple-issue operation, (a) single issue, (b) multiple issue; stages IF, ID, EX, WR plotted against clock cycles]

Superscalar pipelined processors

Issue policies of a superscalar pipeline

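As already noted, three factors limit instruction-level parallelism: 1. structural dependences, i.e. resource conflicts; 2. control dependences; 3. data dependences, i.e. WR (write-then-read, RAW), RW (read-then-write, WAR) and WW (write-then-write, WAW) dependences. In a superscalar pipeline these dependences make the problem more complicated. The scheduling of a superscalar pipeline, that is, its instruction issue and completion policies, is therefore very important for fully exploiting instruction-level parallelism and improving the performance of a superscalar processor.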

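The instruction issue policy covers two things: the order in which instructions are fetched, and the order in which the fetched instructions are executed.

Instruction issue is the process of starting an instruction into the execute stage. The instruction issue policy is the protocol or set of rules used to issue instructions. When instructions are issued in program order, this is called in-order issue. To improve pipeline performance, an instruction with a dependence can be held back and a later, independent instruction issued ahead of it; issuing instructions out of their original program order in this way is called out-of-order issue. Similarly, instruction completion can be in-order or out-of-order. In general, out-of-order issue always leads to out-of-order completion.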

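A superscalar pipeline has three scheduling policies: 1. in-order issue with in-order completion; 2. in-order issue with out-of-order completion; 3. out-of-order issue with out-of-order completion. Whichever policy is used, the final result of the program must be correct.

Assume a superscalar pipeline with a degree of parallelism of 2, structured as shown in Figure 7(a). It has four stages: fetch (F), decode (D), execute (E) and write-back (W). The F, D and W stages each complete in 1 clock cycle. The E stage has several functional units: the LOAD/STORE unit completes a D-cache access in 1 clock cycle, the adder completes an addition in 2 clock cycles, and the multiplier completes a multiplication in 3 clock cycles. The adder and the multiplier are both pipelined. The F and D stages require instructions to enter in pairs. The E stage has internal data forwarding, so a result can be used as soon as it is produced.

The program used consists of the following sequence of 6 instructions: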

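I1   LOAD  R1, M(A)   ; R1 ← M(A)
I2   ADD   R2, R1     ; R2 ← (R2) + (R1)
I3   ADD   R3, R4     ; R3 ← (R3) + (R4)
I4   MUL   R4, R5     ; R4 ← (R4) × (R5)
I5   LOAD  R6, M(B)   ; R6 ← M(B)
I6   MUL   R6, R7     ; R6 ← (R6) × (R7)

Among these instructions, I1 and I2 have a WR (RAW) dependence, I3 and I4 have an RW (WAR) dependence, and I5 and I6 have both a WW (WAW) and a WR (RAW) dependence.

1. In-order issue

Figure 7(b) shows how the decode, execute and write-back stages advance under in-order issue with in-order completion, and Figure 7(c) shows the pipeline space-time diagram.

We can see that instruction I5 is independent of I3 and I4. If its write-back is not delayed but performed in clock cycle 7, the semantics of the program are still correct. Implemented this way, I5 completes before I4, which is in-order issue with out-of-order completion; its pipeline space-time diagram is shown in Figure 8. Although the total completion time is still 10 clock cycles, the held-back I5 of Figure 7(b) is gone and the utilization of the LOAD/STORE unit improves.

2. Out-of-order issue

With in-order issue, the decode stage only checks arriving instructions for resource conflicts and data dependences. If there is no conflict or dependence, the instruction is issued in order; otherwise it stays in the decode stage until the conflict or dependence disappears and is issued then, as with I2 in Figure 7(b). If the processor has a lookahead capability, the following instructions may include independent ones that do not depend on instructions already in the pipeline. These should be decoded and executed early, so as to make full use of the multiple instruction pipelines of the superscalar. This is the purpose of out-of-order issue.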
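The dependence check that the decode stage performs can be illustrated with a small sketch over the six-instruction program above. This is only an illustration under stated assumptions, not the hardware's actual logic: the names `Instr`, `dependences` and `all_dependences` are hypothetical, the check is pairwise on register names only (memory operands M(A) and M(B) are ignored), and ADD/MUL are assumed to read their destination register, as the register-transfer comments in the listing indicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    op: str
    dest: str      # register written by the instruction
    srcs: tuple    # registers read by the instruction


# The six-instruction program from the text (register operands only).
PROGRAM = [
    Instr("I1", "LOAD", "R1", ()),
    Instr("I2", "ADD", "R2", ("R2", "R1")),
    Instr("I3", "ADD", "R3", ("R3", "R4")),
    Instr("I4", "MUL", "R4", ("R4", "R5")),
    Instr("I5", "LOAD", "R6", ()),
    Instr("I6", "MUL", "R6", ("R6", "R7")),
]


def dependences(earlier, later):
    """Return the kinds of register dependence of `later` on `earlier`."""
    kinds = set()
    if earlier.dest in later.srcs:
        kinds.add("WR")  # write-then-read (RAW)
    if later.dest in earlier.srcs:
        kinds.add("RW")  # read-then-write (WAR)
    if earlier.dest == later.dest:
        kinds.add("WW")  # write-then-write (WAW)
    return kinds


def all_dependences(program):
    """Map each dependent (earlier, later) pair of names to its dependence kinds."""
    found = {}
    for i, earlier in enumerate(program):
        for later in program[i + 1:]:
            kinds = dependences(earlier, later)
            if kinds:
                found[(earlier.name, later.name)] = kinds
    return found


if __name__ == "__main__":
    for pair, kinds in all_dependences(PROGRAM).items():
        print(pair, sorted(kinds))
```

Run on this program, the sketch reports exactly the three dependent pairs named in the text: (I1, I2) with WR, (I3, I4) with RW, and (I5, I6) with WR and WW.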

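To implement out-of-order issue, a close link must be established between the decode stage and the execute stage of the pipeline. A common method is to use an instruction window, which is essentially a buffer. Once the processor has decoded an instruction, it places it in the instruction window; as long as the buffer is not full, it goes on fetching and decoding subsequent instructions. Instructions are issued from the instruction window to the execute stage. An instruction can be issued, regardless of the order in which it was fetched or decoded, as long as two conditions are met: 1. the functional unit it needs is available; 2. no dependence prevents it from executing.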
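The two issue conditions for an instruction window can be sketched as a selection function. This is a deliberately conservative toy model and not a reconstruction of Figure 9: the names `Instr`, `conflicts` and `select_issue`, the unit labels, and the rule that each functional unit accepts one new instruction per cycle are all assumptions made for illustration. Any WR, RW or WW register dependence on an older, uncompleted instruction blocks issue.

```python
from collections import namedtuple

# name, functional unit required, destination register, source registers
Instr = namedtuple("Instr", "name unit dest srcs")

I1 = Instr("I1", "load_store", "R1", ())
I2 = Instr("I2", "adder", "R2", ("R2", "R1"))
I3 = Instr("I3", "adder", "R3", ("R3", "R4"))
I4 = Instr("I4", "multiplier", "R4", ("R4", "R5"))
I5 = Instr("I5", "load_store", "R6", ())
I6 = Instr("I6", "multiplier", "R6", ("R6", "R7"))


def conflicts(earlier, later):
    """True if `later` has a WR, RW or WW register dependence on `earlier`."""
    return (earlier.dest in later.srcs
            or later.dest in earlier.srcs
            or earlier.dest == later.dest)


def select_issue(window, in_flight, free_units, width=2):
    """Choose up to `width` instructions to issue this cycle.

    window     -- decoded instructions waiting to issue, in program order
    in_flight  -- older instructions that were issued but have not completed
    free_units -- functional units able to accept an instruction this cycle
    """
    issued = []
    free = set(free_units)
    for pos, ins in enumerate(window):
        if len(issued) == width:
            break
        # Everything older in program order that has not yet completed.
        older = list(in_flight) + list(window[:pos])
        unit_ok = ins.unit in free                              # condition 1
        independent = not any(conflicts(o, ins) for o in older)  # condition 2
        if unit_ok and independent:
            issued.append(ins)
            free.remove(ins.unit)
    return issued


if __name__ == "__main__":
    units = {"load_store", "adder", "multiplier"}
    picked = select_issue([I2, I3, I4], in_flight=[I1], free_units=units)
    print([ins.name for ins in picked])  # prints ['I3']
```

In the example, I2 waits for the load result of I1, I3 is independent and issues ahead of I2, and I4 is held back by its RW dependence on I3. Real hardware can often relax the RW and WW cases through register renaming, which this sketch does not model.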

The superscalar pipeline model that uses an instruction window is shown in Figure 9(a). Note that the instruction window is only a buffering mechanism between the decode stage and the execute stage; it is not a separate pipeline stage. Under out-of-order issue, the progress of the 6 instructions of the earlier program through the pipeline and the pipeline space-time diagram are shown in Figure 9(b) and (c) respectively.

Superscalar Problems

We must now expand the potential problems that arise with a superscalar pipeline over an ordinary pipeline:

- RAW hazards could exist between the two instructions issued at the same time.
- There are new potential WAW and WAR hazards.
- We need twice as many register reads and writes as before, so our register file must be expanded to accommodate this.
- Loads and stores are integer operations even if they are dealing with floating-point registers: we might be reading floating-point registers for an FP operation and also reading/writing floating-point registers for an FP load or store.
- Maintaining precise exceptions is difficult because an integer operation may have already completed.
- Hardware must detect these problems (and quickly).

Cost of a Superscalar

We already had the multiple functional units, so there is no added cost in terms of having an integer and an FP instruction issue and execute in parallel. There are added costs, though, for:

- Hazard detection: the complexity is increased because instructions must now be compared not only to instructions further down the pipeline but also to the instruction at the same stage, and there is the potential for twice as many instructions being active at one time.
- Maintaining precise exceptions.
- Two sets of buses: for integer operations, from the integer registers to the integer ALU and data cache; for FP operations, from the FP registers to the FP functional unit and data cache.
- The ability to access the floating-point register file by up to 3 instructions during the same cycle (an FP load or store in the ID or WB stage, an FP instruction in ID and an FP instruction in WB).

Hardware-Based Speculation

- When issuing multiple instructions per cycle, branch prediction may not be accurate enough to maintain a reasonable issue rate. A high-issue processor may need to execute a branch every clock cycle.
- To exploit further performance, we now look at hardware to promote speculative instruction issue.
- Hardware will predict the next instruction and issue it before determining the branch result.
- If the prediction is wrong, the instruction must be killed off before it can change the machine's state: it cannot update registers or memory.
- We add a new buffer called the reorder buffer. This buffer stores the results of completed instructions that were speculated, until the speculation is proven true or false.
  - If true, we can allow the instruction's results to be written to registers/memory.
  - If false, we must remove it and all instructions that followed it, since they were speculated incorrectly.
- We add a new state to instruction execution, called commit, to our Tomasulo-based superscalar architecture: should the result be stored in the destination register? This becomes the final step for all instructions.

The New Architecture

Will combine:

- a Tomasulo-based approach of reservation stations for dynamic scheduling
- a multi-issue superscalar
- a separately controlled integrated fetch unit which will speculate on control dependences
- a reorder buffer to temporarily store results before they are moved to registers

Steps for Hardware

We must enhance our control hardware from Tomasulo's approach to include the following:

- An instruction cannot issue if the reorder buffer is full.
- Upon issue, update the register status to include the reorder buffer entry number, and enter the reorder buffer entry number into the destination field of the reservation station; use this value to rename registers if needed.
- Execution remains the same, although loads and stores are now handled by a separate memory control unit.
- Write result remains the same, except that values are not written to registers here; they are forwarded via the CDB.
- In each cycle, commit the instruction at the front of the reorder buffer if it has reached the write-result stage and the speculation for the instruction was correct.
- Otherwise, if the speculation for the instruction was wrong, flush the instruction and all others in the reorder buffer until you reach the first instruction fetched after the branch condition was determined.

Example

Here we take a brief look at another example of speculation. The code is given below. Assume there are separate integer units for effective address calculation, ALU operations, and branch condition evaluation. Notice that there are no FP operations here, so all instructions should execute in 1 cycle. We will look at the cycles at which each instruction issues, executes, and writes to the CDB without speculation, and issues, executes, writes and commits with speculation.

Loop:  LD      R2,0(R1)
       DADDIU  R2,R2,#1
       SD      R2,0(R1)
       DADDIU  R1,R1,#4
       BNE     R2,R3,Loop

Without Speculation

Iter  Instruction        Issue  Execute  MemAcc  CDB  Comments
1     LD R2,0(R1)        1      2        3       4    First issue
1     DADDIU R2,R2,#1    1      5        -       6    Wait for LD
1     SD R2,0(R1)        2      3        7       -    Wait for add
1     DADDIU R1,R1,#4    2      3        -       4    Execute directly
1     BNE R2,R3,Loop     3      7        -       -    Wait for add
2     LD R2,0(R1)        4      8        9       10   Wait for BNE
2     DADDIU R2,R2,#1    4      11       -       12   Wait for LD
2     SD R2,0(R1)        5      9        13      -    Wait for add
2     DADDIU R1,R1,#4    5      8        -       9    Wait for 1st BNE
2     BNE R2,R3,Loop     6      13       -       -    Wait for add
3     LD R2,0(R1)        7      14       15      16   Wait for 2nd BNE
3     DADDIU R2,R2,#1    7      17       -       18   Wait for LD
3     SD R2,0(R1)        8      15       19      -    Wait for add
3     DADDIU R1,R1,#4    8      14       -       15   Wait for 2nd BNE
3     BNE R2,R3,Loop     9      19       -       -    Wait for add

With Speculation

Iter  Instruction        Issue  Exec  MemAcc  CDB  Commit  Comments
1     LD R2,0(R1)        1      2     3       4    5       First issue
1     DADDIU R2,R2,#1    1      5     -       6    7       Wait for LD
1     SD R2,0(R1)        2      3     -       -    7       Wait for add
1     DADDIU R1,R1,#4    2      3     -       4    8       Commit in order
1     BNE R2,R3,Loop     3      7     -       -    8       Wait for add
2     LD R2,0(R1)        4      5     6       7    9       No delay
2     DADDIU R2,R2,#1    4      8     -       9    10      Wait for LD
2     SD R2,0(R1)        5      6     -       -    10      Wait for add
2     DADDIU R1,R1,#4    5      6     -       7    11      Commit in order
2     BNE R2,R3,Loop     6      10    -       -    11      Wait for add
3     LD R2,0(R1)        7      8     9       10   12      No delay
3     DADDIU R2,R2,#1    7      11    -       12   13      Wait for LD
3     SD R2,0(R1)        8      9     -       -    13      Wait for add
3     DADDIU R1,R1,#4    8      9     -       10   14      Commit in order
3     BNE R2,R3,Loop     9      13    -       -    14      Wait for add

Design Issues

- Reorder buffer vs. more registers: we could forgo the reorder buffer by providing additional temporary storage. In essence the two are the same solution with slightly different implementations. Both require a good deal more memory than we needed with an ordinary pipeline, but both improve performance greatly.
- How much should we speculate? Other factors, such as cache issues or exceptions, cause our multiple-issue superscalar to slow down, so a large amount of speculation is defeated by other hardware failings. We might try to speculate over a couple of branches, but not more.
- Speculating over multiple branches: imagine our loop contains a selection statement, so that we now speculate over two branches. Speculation over more than one branch greatly complicates matters and may not be worthwhile.

Limitations/Difficulties

- The inherent limitation of multiple issue is the limited amount of ILP in a program:
  - How many instructions are independent of each other?
  - How much distance is available between loading an operand and using it? Between using it and saving it?
  - This is coupled with the multi-cycle latency of certain types of operations, which causes inconsistencies in how much can be issued simultaneously.
- Difficulties in building the underlying hardware:
  - Multiple functional units are needed (cost grows linearly with the number of units).
  - An increase (possibly very large) in memory and register-file bandwidth is needed, which might take up significant space on the chip and may require larger system bus sizes, which turns into more pins.
  - The complexity of multiple fetches means a more complex memory system, possibly with independent banks for parallel accesses.

Limitations on Issue Size

Ideally, we would like to issue as many independent instructions simultaneously as possible, but this is not practical because we would have to:

- look arbitrarily far ahead to find an instruction to issue
- rename all registers when needed to avoid WAR/WAW hazards
- determine all register and memory dependences
- predict all branches
- provide enough functional units to ensure all ready instructions can be issued

What is a possible maximum window size? Determining register dependences over n instructions requires n^2 - n comparisons:

- 2000 instructions: about 4,000,000 comparisons
- 50 instructions: 2450 comparisons

Window sizes have ranged between 4 and 32, with some recent machines having sizes of 2-8. A machine with a window size of 32 achieves about 1/5 of the ideal speedup for most benchmarks.

Other Effects

- With infinite registers, register renaming can eliminate all WAW and WAR hazards.
- With Tomasulo's approach, the reservation stations offer virtual registers.
- Most machines today have only a few virtual registers and perhaps 32 integer and 32 FP registers available.
- Figure 3.41 shows the resulting issues per cycle for different numbers of registers. Surprisingly, the number of registers does have a dramatic impact, and more than 32 registers are desirable.
- Aside from register renaming, we have name dependences on memory references. Three models of analysis are:
  - Global (perfect analysis of all global variables) and stack perfect (perfect analysis of all stack references): these offer some improvement, particularly in 2 benchmarks.
  - Inspection (examine accesses for interference at compile time).
  - None (assume all references conflict).
  The last two have similar results, between 3 and 6 instructions per cycle.

Example Processors

Let's compare three hypothetical processors and determine their MIPS rating for the gcc benchmark.

- Processor 1: a simple MIPS 2-issue superscalar pipeline with a clock rate of 1 GHz, a CPI of 1.0, and a cache system with 0.01 misses per instruction.
- Processor 2: a deeply pipelined MIPS with a clock rate of 1.2 GHz, a CPI of 1.2, and a smaller cache yielding 0.015 misses per instruction.
- Processor 3: a speculative superscalar with a 64-entry window that achieves 50% of its ideal issue rate, with a clock rate of 800 MHz and a small cache yielding 0.02 misses per instruction (although 10% of the miss penalty is not visible due to dynamic scheduling).

Assume the memory access time (miss penalty) is 100 ns.

Solution

First, determine the CPI (including the impact of cache misses).

- Processor 1: 1 GHz clock = 1 ns per clock cycle. Memory access takes 100 ns, so the miss penalty = 100 / 1 = 100 cycles. Cache penalty = 0.01 * 100 = 1.0 cycles per instruction. Overall CPI = 1.0 + 1.0 = 2.0.
- Processor 2: 1.2 GHz clock = 0.83 ns per clock cycle. Miss penalty = 100 / 0.83 = 120 cycles. Cache penalty = 0.015 * 120 = 1.8 cycles per instruction. Overall CPI = 1.2 + 1.8 = 3.0.
- Processor 3: 800 MHz clock = 1.25 ns per clock cycle. The miss penalty takes effect only 90% of the time, so the miss penalty = 0.90 * 100 / 1.25 = 72 cycles. Cache penalty = 0.02 * 72 = 1.44 cycles per instruction. The overall CPI is computed next.

Solution Continued

The CPI of processor 3 requires a bit more effort. Since we were not given the CPI, we have to compute it from the number of instruction issues per cycle. With a 64-entry window, the maximum number of instruction issues per cycle is 9. We are told that this processor averages 50% of its ideal rate, so this machine issues 4.5 instructions per cycle, giving it a processor CPI = 1 / 4.5 = 0.22 and an overall CPI = 0.22 + 1.44 = 1.66.

Now we can determine the MIPS rating for each:

- Processor 1: 1 GHz / 2.0 = 500 MIPS
- Processor 2: 1.2 GHz / 3.0 = 400 MIPS
- Processor 3: 800 MHz / 1.66 = 482 MIPS

The 2-issue processor (processor 1) is a good compromise between clock speed and issue rate, and yields the best performance.
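The MIPS-rating arithmetic for the three hypothetical processors in the Solution slides above can be re-checked with a few lines of code. This is only a sketch of that calculation: the function name `mips_rating` and its parameter names are made up for illustration, and the model is the one stated in the example (overall CPI = pipeline CPI + misses per instruction × visible miss penalty in cycles).

```python
def mips_rating(clock_mhz, pipeline_cpi, misses_per_instr,
                miss_penalty_ns=100.0, visible_fraction=1.0):
    """Return (overall CPI, MIPS rating) for one processor.

    visible_fraction is the share of the miss penalty that is not hidden
    by dynamic scheduling (0.9 for processor 3 in the example).
    """
    cycle_ns = 1000.0 / clock_mhz                       # clock period in ns
    penalty_cycles = visible_fraction * miss_penalty_ns / cycle_ns
    overall_cpi = pipeline_cpi + misses_per_instr * penalty_cycles
    return overall_cpi, clock_mhz / overall_cpi


if __name__ == "__main__":
    processors = {
        "Processor 1": mips_rating(1000, 1.0, 0.01),
        "Processor 2": mips_rating(1200, 1.2, 0.015),
        # 9 issues per cycle at 50% of the ideal rate -> CPI = 1 / 4.5
        "Processor 3": mips_rating(800, 1 / 4.5, 0.02, visible_fraction=0.9),
    }
    for name, (cpi, mips) in processors.items():
        print(f"{name}: CPI = {cpi:.2f}, MIPS = {mips:.0f}")
```

Without intermediate rounding, processor 3 comes out at roughly 481 MIPS rather than 482; the small difference comes from the slides rounding the pipeline CPI to 0.22 before adding the cache penalty. The ranking of the three processors is unchanged.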

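Typical processor structure

Motorola's MC88110 microprocessor and Intel's Pentium microprocessor are both typical superscalar pipeline designs. The former is a RISC machine; the latter has characteristics of both CISC and RISC. Only the superscalar pipeline of the Pentium is described below.

The Pentium can execute two instructions per clock cycle. Some of its instructions are implemented entirely in hardwired logic and complete in one clock cycle (a RISC characteristic); others are implemented with microinstructions and may need 2-3 clock cycles to execute (a CISC characteristic). Compared with the superscalar pipeline of a RISC processor, the Pentium's superscalar pipeline is therefore both simpler and more complex. It is simpler in that the superscalar technique it uses is simple and straightforward; it is more complex in that getting instructions of variable length, with different addressing modes and different implementations, to flow through an instruction pipeline with a degree of parallelism of 2 takes considerable effort.

1. Structure of the Pentium instruction pipeline

The Pentium processor contains a floating-point unit (FPU). Floating-point operations are pipelined, and a floating-point instruction completes in 8 stages. The following mainly describes the integer instruction pipeline, whose structure is shown in Figure 11.

As Figure 11 shows, the Pentium has two 32-bit ALUs that carry out all integer arithmetic and logic operations, and so can support parallel execution in the two instruction pipelines U and V. The separate on-chip instruction cache (I-cache) and data cache (D-cache), each 8 KB, strongly support the pipelines. Two prefetch buffers, each 32 bytes, fetch instructions from the I-cache or main memory and buffer them. Besides decoding instructions, the instruction decoder also performs the instruction pairing check. When a branch instruction is encountered, after decoding its address is sent to the branch target buffer (BTB) for lookup. The control ROM holds the microinstructions that control the sequence of operations during instruction execution. These three units are shared by the U and V pipelines.

Two address generators produce (compute) the addresses of memory operands. In every operating mode the logical address is ultimately translated into a physical address to access the D-cache, and the translation lookaside buffer (TLB) speeds up this translation. The D-cache is dual-ported and can access two 32-bit data items (or one 64-bit floating-point number) in one clock cycle. The general-purpose register file has eight 32-bit integer registers, used for address calculation and for holding the source and destination operands of the ALUs. Both 32-bit ALUs have a latency of one clock cycle. Only simple arithmetic and logic instructions that involve no register→memory or memory→register operation can complete in one clock cycle. Most simple instructions are hardwired, and their execute stage takes only 1 clock cycle. A few arithmetic and logic instructions that involve register→memory or memory→register operations need 2-3 clock cycles to complete. However, because the Pentium has sequencing hardware, these few exceptions can also be treated as simple instructions.

2. Pipeline scheduling policy

Through its U and V pipelines the Pentium can execute two integer instructions per clock cycle. Both pipelines consist of 5 stages, of which the first two (PF, D1) are shared by U and V, as shown in Figure 12(a). The stages are as follows.

Prefetch (PF) stage: fetches instructions from the I-cache; the instruction length is variable. The instructions are stored in a prefetch buffer.

Decode 1 (D1) stage: decodes the instruction to determine its opcode, addressing mode and related information. This stage performs the instruction pairing check and branch prediction. Two consecutive instructions I1 and I2 are decoded one after the other, and a decision is then made whether to issue the pair in parallel. Issuing a pair of instructions requires the following 4 conditions:

1. Both instructions are simple instructions.
2. There is no WR (RAW) or WW (WAW) dependence between the two instructions, i.e. the destination register of I1 is neither a source register nor the destination register of I2. RW (WAR) dependences are avoided by the issue policy.
3. Neither instruction contains both an immediate and a displacement.
4. Only I1 is allowed to carry an instruction prefix.

If these conditions are not met, only I1 is issued to the next stage of the U pipeline.

Decode 2 (D2) stage: computes and generates the addresses of memory operands. This takes only 1 clock cycle on a TLB hit and more than 1 clock cycle otherwise. Not every instruction has a memory operand, but every instruction must still pass through this stage.

Execute (EX) stage: mainly performs the specified operation in the ALU, the barrel shifter or another functional unit, and accesses the D-cache when needed.

Write-back (WB) stage: writes the result of the operation into the destination register and the flags register.

The U and V pipelines are not equivalent and cannot be used interchangeably. The U pipeline can execute all integer and floating-point instructions, whereas the V pipeline can execute only simple integer instructions and a few floating-point instructions such as floating-point exchange.

The U and V pipelines are scheduled with an in-order issue, in-order completion policy. A pair of instructions that passes the check is issued simultaneously to the D2 stages of the U and V pipelines, and the pair must also leave D2 and enter EX together. If one instruction is held up in D2, the other must also stall in D2, as with I1 and I2 in Figure 12(b) (clock 4). Once a pair has entered the EX stage, ideally both finish at the same time; otherwise the instruction in the U pipeline finishes first. For I3 and I4 in Figure 12(b), I3 takes longer to execute, so I4 in the V pipeline must stall and wait for I3 to finish (clock 7). For I5 and I6 in Figure 12(b), I5 in the U pipeline takes less time to execute, so it can finish first and enter the write-back stage (clock 9).

The Pentium's superscalar pipeline can execute two simple integer instructions per clock cycle, but in general only one floating-point instruction. This is because the floating-point pipeline has 8 stages, of which the first 5 are shared with the 5 stages of the U and V pipelines, and because some floating-point operands are 64 bits wide. Therefore, with a few exceptions (such as the floating-point exchange instruction), a floating-point instruction cannot execute at the same time as an integer instruction.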
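The four pairing conditions checked in the D1 stage can be written down as a simple predicate. The sketch below only restates those conditions; the `Decoded` record and its field names are hypothetical and do not correspond to any real Pentium data structure or to the full pairing rules of the actual processor.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Decoded:
    """Hypothetical summary of what D1 knows about one instruction."""
    simple: bool                 # hardwired "simple" instruction
    dest: Optional[str]          # destination register, if any
    srcs: Tuple[str, ...] = ()   # source registers
    has_immediate: bool = False
    has_displacement: bool = False
    has_prefix: bool = False


def can_pair(i1: Decoded, i2: Decoded) -> bool:
    """True if I1 (to the U pipe) and I2 (to the V pipe) may issue together."""
    # 1. Both instructions are simple instructions.
    both_simple = i1.simple and i2.simple
    # 2. No WR or WW dependence: I1's destination is neither a source
    #    nor the destination of I2.
    independent = i1.dest is None or (
        i1.dest not in i2.srcs and i1.dest != i2.dest)
    # 3. Neither instruction contains both an immediate and a displacement.
    no_imm_and_disp = not any(
        i.has_immediate and i.has_displacement for i in (i1, i2))
    # 4. Only I1 may carry an instruction prefix.
    prefix_ok = not i2.has_prefix
    return both_simple and independent and no_imm_and_disp and prefix_ok


if __name__ == "__main__":
    mov_eax = Decoded(simple=True, dest="eax", has_immediate=True)
    add_ebx_ecx = Decoded(simple=True, dest="ebx", srcs=("ebx", "ecx"))
    add_ebx_eax = Decoded(simple=True, dest="ebx", srcs=("ebx", "eax"))
    print(can_pair(mov_eax, add_ebx_ecx))  # independent pair
    print(can_pair(mov_eax, add_ebx_eax))  # I2 reads I1's destination
```

When `can_pair` returns False, the text's rule applies: only I1 is issued, to the U pipeline, and I2 is reconsidered in the next cycle.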

Pentium II: RISC features

- All RISC features are implemented on the execution of microinstructions instead of machine instructions.
- Microinstruction-level pipeline with dynamically scheduled micro-operations:
  - Fetch machine instruction (3 stages)
  - Decode machine instruction into microinstructions (2 stages)
  - Issue microinstructions (2 stages; register renaming and reorder buffer allocation are performed here)
  - Execute microinstructions (1 stage; floating-point units are pipelined; execution takes between 1 and 32 cycles)
  - Write back (3 stages)
  - Commit (3 stages)
- The superscalar can issue up to 3 micro-operations per clock cycle.
- Reservation stations (20 of them) and multiple functional units (5 of them).
- Reorder buffer (40 entries) and speculation are used.

More on the Pipeline

Functional units have the following numbers of stages:

Integer ALU        1
Integer load       3
Integer multiply   4
FP add             3
FP multiply        5  (partially pipelined: multiplies can start every other cycle)
FP divide          32 (not pipelined)

The fetch unit can fetch up to 16 bytes per cycle, which is enough to determine how much more needs to be fetched from memory (recall that instructions vary in length from 1 to 17 bytes), so the fetch might take 2-3 cycles in all.

RISC implementation of CISC instructions

[Figure: front end of the pipeline, stages IFU1, IFU2, IFU3, ID1, ID2, RAT: 16 KB instruction cache, instruction stream buffer, instruction length decoder, decoder alignment stage, decoder 0 (complex), decoder 1 (simple), decoder 2 (simple), microcode instruction sequencer, decoded instruction queue, static branch prediction, dynamic branch prediction, next IP, register allocator, output to the reorder buffer (ROB)]

Functional Unit Architecture

- Instructions are fetched from the instruction cache.
- The instruction unit decodes them into microcode.
- Microcode is issued to one of the functional units (up to 3 issues per cycle).
- There are 5 functional units: 1 set of integer units, 1 set of FP units, 1 branch unit and 2 load/store units.
- The functional units are directly connected to the data cache for quick access.
- A second-level cache is used as backup to both the instruction and data caches.

Reservation Stations

- The use of reservation stations allows dynamic and multiple issue, with a reorder buffer uniting all of this together.
- Notice that 2 stores, 1 load, 1 simple integer or MMX operation and 1 complex integer/FP/MMX operation can be issued at a time.

Handling Speculation

- Instruction fetch and decode places microinstructions in the instruction pool.
- The dispatch and execution unit issues microinstructions. The functional units are inside the execution unit.
- The dispatch unit uses speculation when issuing microinstructions.
- As microinstructions finish, they do not write results to registers (or cache) but instead wait for the retire unit.
- The retire unit writes all results back to data registers and/or cache.

Source of Stalls

This architecture is very complex and relies on being able to fetch and decode instructions quickly. The process breaks down when:

- fewer than 3 instructions can be fetched in 1 cycle
- fewer than 3 instructions can be issued because instructions have different numbers of micro-operations
- reservation stations or reorder buffer slots run out
- there are data dependences
- a data cache access results in a miss
- branches are mispredicted

In the last 3 cases, this could cause the reorder buffer to stall, resulting in multiple microinstructions not being able to commit for several cycles. Overall, the Pentium Pro has between 0.2 and 2.8 stalls per instruction on the SPEC95 benchmarks, with an average of more than 1 stall per instruction, and an average CPI of around 2.5.

Fallacies and Pitfalls

- Fallacy: processors with lower CPIs will always be faster.
- Fallacy: processors with faster clock rates will always be faster.
- Pitfall: emphasizing an improvement in CPI by increasing issue rate while sacrificing clock rate can lead to lower performance.
- Pitfall: improving only one aspect of a multiple-issue processor and expecting overall performance improvement.
- Pitfall: sometimes bigger and dumber is better. This specifically refers to using simpler branch prediction schemes rather than more complex ones.

Performance of superscalar pipelined processors

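For ease of comparison, the instruction-level parallelism of an ordinary single-pipeline scalar processor is written as (1,1), and that of a superscalar processor as (m,1).

In the ideal case, the time to execute N instructions on an ordinary single-pipeline scalar processor is

$$T(1,1) = (k + N - 1)\,\Delta t$$

where k is the number of pipeline stages and $\Delta t$ is the length of one clock cycle.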

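If the same N instructions are executed on a superscalar processor that issues m instructions per clock cycle, the execution time required is

$$T(m,1) = \left(k + \frac{N - m}{m}\right)\Delta t$$

where the first term is the time needed for the first batch of m instructions to pass through the m instruction pipelines simultaneously, and the second term is the time needed to execute the remaining N - m instructions, with m instructions passing through the m instruction pipelines in each clock cycle.

The speedup of the superscalar processor relative to the ordinary single-pipeline scalar processor is

$$S(m,1) = \frac{T(1,1)}{T(m,1)} = \frac{(k + N - 1)\,\Delta t}{\left(k + \frac{N - m}{m}\right)\Delta t} = \frac{m\,(k + N - 1)}{N + m\,(k - 1)}$$

As $N \to \infty$, in the ideal case with no resource conflicts, no data dependences and no control dependences, the maximum speedup of the superscalar processor is

$$S(m,1)_{\max} = m$$

Compared with a sequential (non-pipelined) execution structure, the speedup is km.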
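The ideal-case timing model above is easy to evaluate numerically. The sketch below assumes the usual forms T(1,1) = (k + N - 1)Δt and T(m,1) = (k + (N - m)/m)Δt, with N taken as a multiple of m; the function names are chosen only for illustration.

```python
def t_scalar(n_instr, k, dt=1.0):
    """Ideal time for N instructions on one k-stage pipeline: (k + N - 1) * dt."""
    return (k + n_instr - 1) * dt


def t_superscalar(n_instr, k, m, dt=1.0):
    """Ideal time on m parallel k-stage pipelines: (k + (N - m) / m) * dt."""
    return (k + (n_instr - m) / m) * dt


def speedup_superscalar(n_instr, k, m):
    """S(m,1) = T(1,1) / T(m,1); approaches m as N grows."""
    return t_scalar(n_instr, k) / t_superscalar(n_instr, k, m)


if __name__ == "__main__":
    k, m = 4, 2
    for n in (6, 60, 600, 6000):
        print(n, round(speedup_superscalar(n, k, m), 3))
```

For the 4-stage, 2-issue case, 6 instructions give a speedup of 9/6 = 1.5, and the value climbs toward the limit m = 2 as N increases, which matches the closed form m(k + N - 1) / (N + m(k - 1)).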

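Super-pipelined processors

Outline: instruction execution timing; typical processor structure; performance of super-pipelined processors.

Two definitions:

- A processor that can issue several instructions within one cycle in a time-shared manner is called a super-pipelined processor.
- A pipelined processor whose instruction pipeline has 8 or more functional stages is called a super-pipelined processor.

Different ways of improving processor performance:

- A superscalar processor gains performance at the cost of additional hardware resources.
- A super-pipelined processor improves performance by fully overlapping the operation of its hardware components.

Two different kinds of parallelism:

- A superscalar processor exploits spatial parallelism.
- A super-pipelined processor exploits temporal parallelism.

Instruction execution timing

One instruction is issued every 1/n of a clock cycle, and the pipeline cycle is 1/n of a clock cycle. In such a processor, some functional stages of the pipeline can be subdivided further. For example, the ID stage can be subdivided into three pipeline stages: decode, read first operand, and read second operand. Other functional stages cannot be subdivided; the WR stage, for instance, is generally not subdivided. Hence the other definition of super-pipelining: a processor with 8 or more pipeline stages is called a super-pipelined processor.

[Figure: a super-pipeline that issues 3 instructions per clock cycle in a time-shared manner; instructions I1 to I9, stages IF, ID, EX, WR, clock cycles 1 to 6]

Typical processor structure

The MIPS R4000 processor has two pipeline stages per clock cycle and is a very standard super-pipelined processor structure.

- The instruction pipeline has 8 stages.
- There are two caches, an instruction cache and a data cache, each 8 KB. Each cache can be accessed twice per clock cycle, so within one clock cycle two instructions can be read from the instruction cache and two data items can be read from or written to the data cache.
- The main arithmetic units are an integer unit and a floating-point unit.

Pipeline operation of the MIPS R4000:

IF: fetch the first instruction
IS: fetch the second instruction
RF: read the register file, decode the instruction
EX: execute the instruction
DF: fetch the first data item
DS: fetch the second data item
TC: data tag check
WB: write back the result

[Figure: MIPS R4000 pipeline operation: instruction cache (IF, IS), instruction decode and register file read (RF), ALU (EX), data cache (DF, DS), tag check (TC), register file (WB)]

[Figure: normal instruction pipeline timing of the MIPS R4000: eight overlapped instructions, each passing through IF, IS, RF, EX, DF, DS, TC, WB; marked are the pipeline cycle, the current CPU cycle and the master clock cycle]

If either of the two instructions following a LOAD instruction needs the loaded data in its EX stage, the instruction pipeline must stall for one clock cycle. In-order issue is used.

[Figure: MIPS R4000 instruction pipeline timing with a stall: instructions I1 to I6, where I1 is the Load instruction and a following instruction uses the Load data; the pipeline runs, stalls, then runs again]

Performance of super-pipelined processors

For a super-pipelined processor with an instruction-level parallelism of (1,n), the time needed to execute N instructions is

$$T(1,n) = \left(k + \frac{N - 1}{n}\right)\Delta t$$

The speedup of the super-pipelined processor relative to the ordinary single-pipeline scalar processor is

$$S(1,n) = \frac{T(1,1)}{T(1,n)} = \frac{(k + N - 1)\,\Delta t}{\left(k + \frac{N - 1}{n}\right)\Delta t}$$

that is,

$$S(1,n) = \frac{n\,(k + N - 1)}{nk + N - 1}$$

The maximum speedup of the super-pipelined processor is

$$S(1,n)_{\max} = n$$

Superscalar super-pipelined processors

Outline: instruction execution timing; typical processor structure; performance of superscalar super-pipelined processors.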

Combining the superscalar and super-pipelining techniques gives a superscalar super-pipelined processor.

Instruction execution timing

A superscalar super-pipelined processor issues instructions n times within one clock cycle in a time-shared manner, issuing m instructions simultaneously each time, so that a total of m × n instructions are issued per clock cycle.

[Figure: timing of a superscalar super-pipelined processor that issues 3 times per clock cycle, 3 instructions each time; instructions I1 to I12, stages IF, ID, EX, WR, clock cycles 1 to 5]

Typical processor structure

DEC's Alpha processor uses a superscalar super-pipelined structure. It consists mainly of four functional units and two caches: the integer unit EBOX, the floating-point unit FBOX, the address unit ABOX and the central control unit IBOX.

The central control unit IBOX can read two instructions from the instruction cache at the same time, decode both of them simultaneously, check them for resource conflicts, and analyze their data and control dependences. If resources and dependences permit, IBOX issues the two instructions simultaneously to two of the three execution units EBOX, ABOX and FBOX.

The instruction pipeline is controlled with in-order issue and out-of-order completion. The instruction cache contains a branch history table, which implements dynamic prediction of conditional branches. EBOX also contains several dedicated data paths that can deliver a result directly to the execution unit.

The Alpha 21064 processor has three instruction pipelines. The integer pipeline and the memory-access pipeline each have 7 stages: 4 stages for fetching and analyzing the instruction, 2 stages for computation, and 1 stage for writing the result. The floating-point pipeline has 10 stages, of which the latency of the floating-point execution unit FBOX accounts for 6. All of the units EBOX, IBOX, ABOX and FBOX contain dedicated data paths.

The three instruction pipelines of the Alpha 21064 average 8 stages, and two instructions are issued per clock cycle. The Alpha 21064 is therefore a superscalar super-pipelined processor.

The 7-stage integer pipeline:

IF     instruction fetch
SWAP   swap dual-issue instructions, branch prediction
I0     instruction decode
I1     access the general register file, issue check
A1     computation cycle 1; IBOX computes the new PC value
A2     computation cycle 2; look up the instruction TLB
WR     write the integer register file, instruction cache hit check

[Figure: the 7-stage integer pipeline, stages 0 to 6 labelled IF, SWAP, I0, I1, A0, A1, WR]
