Algorithm Details of Multiple-Issue Instructions
Ch 4 Instruction-Level Parallelism
Embedded System Lab, Fall 2012

4.1 Instruction-Level Parallelism (ILP)

- Dependences are an inherent property of programs.
- Dependences give rise to hazards.
- Hazards cause the CPU to stall.
- Classes of dependence: data dependence, structural dependence, control dependence.
- ILP: overlapping the execution of independent instructions.

    Loop: LD   F0,0(R1)
          SUBI R2,R2,8
          SUBI R3,R3,8
          ADDD F4,F0,F2

Name dependences

Another kind of dependence is the name dependence: two instructions use the same name (a register or memory location) but no data flows between them.

- Antidependence (WAR): instruction j writes a register or memory location that instruction i reads, where instruction i executes first.
- Output dependence (WAW): instruction i and instruction j write the same register or memory location, so the order of the two writes must be preserved.

Are there name dependences in the following code?

    Loop: LD   F0,0(R1)
          ADDD F4,F0,F2
          SD   0(R1),F4
          LD   F0,-8(R1)
          ADDD F4,F0,F2
          SD   -8(R1),F4
          LD   F0,-16(R1)
          ADDD F4,F0,F2
          SD   -16(R1),F4
          LD   F0,-24(R1)
          ADDD F4,F0,F2
          SD   -24(R1),F4
          SUBI R1,R1,#32
          BNEZ R1,Loop
          NOP

How can the name dependences be eliminated?

Eliminating name dependences

    Loop: LD   F0,0(R1)
          ADDD F4,F0,F2
          SD   0(R1),F4       ; drop SUBI & BNEZ
          LD   F6,-8(R1)
          ADDD F8,F6,F2
          SD   -8(R1),F8      ; drop SUBI & BNEZ
          LD   F10,-16(R1)
          ADDD F12,F10,F2
          SD   -16(R1),F12    ; drop SUBI & BNEZ
          LD   F14,-24(R1)
          ADDD F16,F14,F2
          SD   -24(R1),F16
          SUBI R1,R1,#32      ; alter to 4*8
          BNEZ R1,Loop
          NOP

This technique is called register renaming.

Definitions for instruction-level parallelism

- Basic block: straight-line code with no branches.
- A whole program consists of basic blocks connected by branches.
- Branches make up about 15% of MIPS instructions, so a basic block is about 4 to 7 instructions long.
- OS code has relatively few branches: it handles resource management, writes status registers, writes control registers, and sets control variables.
- Parallelism across basic blocks (loop-level parallelism).
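The name-dependence question and the renaming answer above can be checked mechanically. The following is a minimal sketch, not the course's tool: `Instr`, `classify`, and `rename` are hypothetical names, memory operands and offsets are ignored, and the pairwise check does not account for intervening redefinitions of a register. The list of fresh names is chosen by hand so that it matches the slide and avoids the live register F2.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]          # register written, or None (e.g. SD)
    srcs: Tuple[str, ...]        # registers read


def classify(i: Instr, j: Instr) -> List[str]:
    """Dependences of the later instruction j on the earlier instruction i."""
    kinds = []
    if i.dest is not None and i.dest in j.srcs:
        kinds.append("RAW")      # true data dependence
    if j.dest is not None and j.dest in i.srcs:
        kinds.append("WAR")      # antidependence (name dependence)
    if i.dest is not None and i.dest == j.dest:
        kinds.append("WAW")      # output dependence (name dependence)
    return kinds


def rename(block, fresh):
    """Give every written register a fresh name; later reads follow the new name."""
    current = {}                 # original name -> latest renamed name
    out = []
    for ins in block:
        srcs = tuple(current.get(s, s) for s in ins.srcs)
        dest = ins.dest
        if dest is not None:
            dest = next(fresh)
            current[ins.dest] = dest
        out.append(Instr(ins.op, dest, srcs))
    return out


# Two copies of the loop body, as in the unrolled code before renaming.
body = [
    Instr("LD", "F0", ("R1",)),
    Instr("ADDD", "F4", ("F0", "F2")),
    Instr("SD", None, ("R1", "F4")),
]
unrolled = body * 2
renamed = rename(unrolled, iter(["F0", "F4", "F6", "F8"]))

print(classify(unrolled[1], unrolled[3]))   # ADDD reads F0, the second LD overwrites it
print(classify(renamed[1], renamed[3]))     # no dependence left after renaming
```

After renaming, only the true (RAW) dependences inside each iteration remain, which is what allows the iterations to be interleaved.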

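Characteristics of loops

- The branch that controls a loop is biased: it is taken in the vast majority of cases.
- Prediction is therefore relatively easy, but a prediction scheme is still required.

Average pipeline CPI

    Pipeline CPI = Ideal Pipeline CPI + Structural Stalls + RAW Stalls + WAR Stalls + WAW Stalls + Control Stalls

This chapter studies methods and techniques for reducing the number of stalls.

Basic approaches to instruction scheduling

- Software (compiler optimization)
  - gcc: about 17% of instructions are control instructions, i.e. roughly 5 instructions + 1 branch
  - extract more parallelism within a basic block
  - exploit loop-level parallelism
- Hardware
  - dynamic scheduling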
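As a small worked instance of the CPI decomposition, the sketch below adds per-instruction stall contributions to the ideal CPI. The function name and the numbers are illustrative assumptions: they are one reading of the unscheduled FP loop discussed later in this chapter (5 instructions in 10 clocks, taken here as 4 RAW stall cycles plus 1 control stall for the branch delay slot).

```python
def pipeline_cpi(ideal, structural=0.0, raw=0.0, war=0.0, waw=0.0, control=0.0):
    """Pipeline CPI as the ideal CPI plus average stall cycles per instruction."""
    return ideal + structural + raw + war + waw + control


# Assumed breakdown for a 5-instruction loop body that takes 10 clocks:
# 4 RAW stall cycles and 1 control stall cycle, averaged over 5 instructions.
instructions = 5
cpi = pipeline_cpi(ideal=1.0, raw=4 / instructions, control=1 / instructions)
print(round(cpi, 3))
```

Under these assumptions the loop runs at about 2 cycles per instruction, which is the gap the scheduling techniques in this chapter try to close.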

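Static and dynamic scheduling

- 8086: I/O cycles and CPU cycles
- 386: overlapped instruction execution
- 486: instruction-level parallelism
- Dynamic instruction scheduling: Pentium Pro, Pentium II, III, IV, AMD Athlon, MIPS R10K, R12K, Sun UltraSPARC, PowerPC 603, G3, G4, G5 (IBM-Motorola-Apple), Alpha 21264
- Static scheduling: Itanium, Transmeta Crusoe

A loop example

    for (i = 1; i <= 1000; i++)
        x[i] = x[i] + y[i];

- Characteristic: the computations of the individual x[i] do not depend on one another.
- Ways to parallelize:
  - The simplest method is loop unrolling.
  - Vector form: X = X + Y. Used since the 1960s by Cray, Hitachi, NEC, and Fujitsu. Today it generally takes the form of vector acceleration units (GPU, DSP).

A simple loop and its assembly code

    for (i = 1; i <= 1000; i++)
        x[i] = x[i] + s;

    Loop: LD   F0,0(R1)     ; F0 = vector element
          ADDD F4,F0,F2     ; add scalar from F2
          SD   0(R1),F4     ; store result
          SUBI R1,R1,8      ; decrement pointer by 8 bytes (DW)
          BNEZ R1,Loop      ; branch if R1 != zero
          NOP               ; delayed branch slot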

Dependences in the FP loop

    Loop: LD   F0,0(R1)     ; F0 = vector element
          ADDD F4,F0,F2     ; add scalar from F2
          SD   0(R1),F4     ; store result
          SUBI R1,R1,8      ; decrement pointer by 8 bytes (DW)
          BNEZ R1,Loop      ; branch if R1 != zero
          NOP               ; delayed branch slot

    Instruction producing result   Instruction using result   Latency (clock cycles)
    FP ALU op                      Another FP ALU op          3
    FP ALU op                      Store double               2
    Load double                    FP ALU op                  1
    Load double                    Store double               0
    Integer op                     Integer op                 0

Where do stalls have to be inserted? (Assume the branch obtains its target address and condition in the ID stage.)

Stalls in the FP loop

10 clocks. Can the stalls be minimized by reordering the code?

     1 Loop: LD   F0,0(R1)     ; F0 = vector element
     2       stall
     3       ADDD F4,F0,F2     ; add scalar in F2
     4       stall
     5       stall
     6       SD   0(R1),F4     ; store result
     7       SUBI R1,R1,8      ; decrement pointer by 8 bytes (DW)
     8       stall
     9       BNEZ R1,Loop      ; branch if R1 != zero
    10       stall             ; delayed branch slot

Minimum number of stalls in the FP loop

6 clocks. Can unrolling the loop 4 times improve performance further?

     1 Loop: LD   F0,0(R1)
     2       SUBI R1,R1,8
     3       ADDD F4,F0,F2
     4       stall
     5       BNEZ R1,Loop      ; delayed branch
     6       SD   8(R1),F4     ; altered when moved past SUBI

BNEZ and SD are swapped by changing the address of SD.

Unrolling the loop 4 times (straightforward way)

Rewrite loop to minimize stalls?

     1 Loop: LD   F0,0(R1)
             stall
     2       ADDD F4,F0,F2
             stall
             stall
     3       SD   0(R1),F4      ; drop SUBI & BNEZ
     4       LD   F6,-8(R1)
             stall
     5       ADDD F8,F6,F2
             stall
             stall
     6       SD   -8(R1),F8     ; drop SUBI & BNEZ
     7       LD   F10,-16(R1)
             stall
     8       ADDD F12,F10,F2
             stall
             stall
     9       SD   -16(R1),F12   ; drop SUBI & BNEZ
    10       LD   F14,-24(R1)
             stall
    11       ADDD F16,F14,F2
             stall
             stall
    12       SD   -24(R1),F16
    13       SUBI R1,R1,#32     ; alter to 4*8
             stall
    14       BNEZ R1,Loop
    15       NOP

15 + 4 x (1+2) + 1 = 28 clock cycles, or 7 per iteration. This assumes the number of loop iterations is a multiple of 4 (R1 is a multiple of 32).

How are the name dependences resolved?

Loop unrolling with the minimum number of stalls

After code motion:
- SD is moved after SUBI: note the change in its offset.
- The loads are moved ahead of the SDs: note the change in offsets.

     1 Loop: LD   F0,0(R1)
     2       LD   F6,-8(R1)
     3       LD   F10,-16(R1)
     4       LD   F14,-24(R1)
     5       ADDD F4,F0,F2
     6       ADDD F8,F6,F2
     7       ADDD F12,F10,F2
     8       ADDD F16,F14,F2
     9       SD   0(R1),F4
    10       SD   -8(R1),F8
    11       SUBI R1,R1,#32
    12       SD   16(R1),F12
    13       BNEZ R1,Loop
    14       SD   8(R1),F16     ; 8-32 = -24

14 clock cycles, or 3.5 per iteration.
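The clock counts for the four versions of the loop (10, 6, 28, and 14) can be reproduced with a small in-order issue model. This is a sketch under stated assumptions, not the course's simulator: one instruction issues per clock, an instruction waits until each producer's latency from the table has elapsed, and the extra `("int", "branch")` latency of 1 is inferred from the stall shown before BNEZ. Register names such as `L0` and `S0` and all function names are hypothetical.

```python
# Stall cycles required between a producer class and a dependent consumer class.
LATENCY = {
    ("fp", "fp"): 3,
    ("fp", "store"): 2,
    ("load", "fp"): 1,
    ("load", "store"): 0,
    ("int", "int"): 0,
    ("int", "branch"): 1,   # assumption: inferred from the stall before BNEZ
}


def total_clocks(code):
    """code: list of (kind, dest, sources). Returns the clock of the last issue."""
    produced = {}           # register -> (producer kind, issue clock)
    clock = 0
    for kind, dest, srcs in code:
        earliest = clock + 1                      # in-order, one issue per clock
        for reg in srcs:
            if reg in produced:
                pkind, pclock = produced[reg]
                wait = LATENCY.get((pkind, kind), 0)
                earliest = max(earliest, pclock + wait + 1)
        clock = earliest
        if dest:
            produced[dest] = (kind, clock)
    return clock


def iteration(k):
    """LD / ADDD / SD for the k-th unrolled copy, with its own renamed registers."""
    loaded, summed = f"L{k}", f"S{k}"
    return [
        ("load", loaded, ("R1",)),
        ("fp", summed, (loaded, "F2")),
        ("store", None, ("R1", summed)),
    ]


SUBI = ("int", "R1", ("R1",))
BNEZ = ("branch", None, ("R1",))
NOP = ("nop", None, ())

ld, add, sd = iteration(0)
unscheduled = [ld, add, sd, SUBI, BNEZ, NOP]
scheduled = [ld, SUBI, add, BNEZ, sd]

copies = [iteration(k) for k in range(4)]
unrolled = [ins for copy in copies for ins in copy] + [SUBI, BNEZ, NOP]
loads = [c[0] for c in copies]
adds = [c[1] for c in copies]
stores = [c[2] for c in copies]
unrolled_scheduled = loads + adds + stores[:2] + [SUBI, stores[2], BNEZ, stores[3]]

for name, code in [("unscheduled", unscheduled), ("scheduled", scheduled),
                   ("unrolled", unrolled), ("unrolled+scheduled", unrolled_scheduled)]:
    print(name, total_clocks(code))
```

In the 6-clock version this model issues BNEZ in clock 4 and lets SD wait until clock 6, whereas the slide places the stall before BNEZ so that SD sits in the delay slot; the total is the same.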

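Summary of the loop unrolling example

- Moving SD after SUBI and BNEZ requires adjusting the offset in SD.
- Loop unrolling is an effective way to reduce stalls for programs whose iterations are independent (loop-level parallelism).
- Different iterations use different registers.
- Instruction scheduling must leave the result of the program unchanged.

Instruction reordering + loop unrolling (clock cycles for 1000 iterations)

    No optimization                              10000
    Instruction reordering                        6000
    Loop unrolled 4 times                         7000
    Loop unrolled 4 times + reordering            3500

Loop unrolling (1/3)

Example: which data dependences exist in the following code? (A, B, and C point to distinct, non-overlapping storage.)

    for (i = 1; i <= 100; i = i + 1) {
        A[i+1] = A[i] + C[i];      /* S1 */
        B[i+1] = B[i] + A[i+1];    /* S2 */
    }

1. S2 uses A[i+1], which S1 computes in the same iteration.
2. S1 uses a value computed by S1 in the previous iteration, and likewise S2 uses a value computed by S2 in the previous iteration.

A dependence between iterations like this is called a "loop-carried dependence". It means the iterations depend on one another and cannot execute in parallel, unlike the earlier examples, whose iterations were independent.

Loop unrolling (2/3)

Example: A, B, C, D are distinct and non-overlapping.

    for (i = 1; i <= 100; i = i + 1) {
        A[i] = A[i] + B[i];        /* S1 */
        B[i+1] = C[i] + D[i];      /* S2 */
    }

1. There is no dependence from S1 to S2, so interchanging S1 and S2 does not affect the correctness of the program.
2. In the first iteration, S1 depends on the value of B[1] computed before the loop starts; in later iterations it uses the B[i] computed by S2 in the previous iteration.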
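Because the loop-carried dependence in the second example runs only from S2 to S1 and is not circular, the loop can be rewritten so that the dependence falls inside one iteration, which is the form given in part (3/3). The sketch below checks on arbitrary integer data that the rewritten loop computes the same arrays; the function names and test data are made up for illustration, and the arrays are indexed 1..101 with element 0 unused.

```python
def original(A, B, C, D):
    """The loop from part (2/3), run on copies of A and B."""
    A, B = A[:], B[:]
    for i in range(1, 101):
        A[i] = A[i] + B[i]          # S1
        B[i + 1] = C[i] + D[i]      # S2
    return A, B


def transformed(A, B, C, D):
    """The rewritten loop: the S2 -> S1 dependence is now within an iteration."""
    A, B = A[:], B[:]
    A[1] = A[1] + B[1]
    for i in range(1, 100):
        B[i + 1] = C[i] + D[i]
        A[i + 1] = A[i + 1] + B[i + 1]
    B[101] = C[100] + D[100]
    return A, B


# Arbitrary integer test data; index 0 is unused padding.
n = 102
A = [k for k in range(n)]
B = [2 * k for k in range(n)]
C = [3 * k + 1 for k in range(n)]
D = [k * k for k in range(n)]

same = original(A, B, C, D) == transformed(A, B, C, D)
print("results identical:", same)
```

In the rewritten form no iteration reads a value produced by a different iteration, so the iterations of the new loop body can be overlapped.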

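Loop unrolling (3/3)

    A[1] = A[1] + B[1];
    for (i = 1; i <= 99; i = i + 1) {
        B[i+1] = C[i] + D[i];
        A[i+1] = A[i+1] + B[i+1];
    }
    B[101] = C[100] + D[100];

    for (i=1; i ... out-of-order completion

- Scoreboard algorithm
- Tomasulo algorithm

Hardware approach 1: the scoreboard

Basic concept of the scoreboard (schematic diagram)

Four stages of scoreboard control (1/2)

1. Issue: issue the instruction and check for structural hazards.
   If the functional unit needed by the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data. If there is a structural hazard or a WAW hazard, issue of this instruction stalls, and no later instruction is issued, until the hazard clears.
2. Read operands: read the operands once there are no data hazards.
   A source operand is available if no earlier issued, still-active instruction is going to write it, or if the functional unit that writes it has completed the write. When the operands are available, the scoreboard tells the functional unit to read them and prepare to execute. The scoreboard resolves RAW hazards dynamically in this step, and instructions may execute out of order.

Four stages of scoreboard control (2/2)

3. Execution (EX): execute once the operands have been read.
   The functional unit begins execution when it receives its operands. When the result is ready, it notifies the scoreboard that execution of the instruction is complete.
4. Write result (WR): finish execution.
   Once the scoreboard knows that the functional unit has finished, it checks for WAR hazards. If there is none, the result is written; if there is one, the instruction stalls.

   Example:

       DIVD F0,F2,F4
       ADDD F10,F0,F8
       SUBD F8,F8,F14

   The CDC 6600 scoreboard stalls SUBD until ADDD has read its operands; only then does SUBD enter the WR stage.

Question: how does the scoreboard relate to the DLX pipeline? (Figure: the IS, RO, EX, and WR stages under scoreboard control.)

Structure of the scoreboard

1. Instruction status: records which of the four steps each in-flight instruction is in.
2. Functional unit status: records the state of each functional unit (FU) in nine fields:

       Busy     whether the unit is busy
       Op       the operation the unit is performing
       Fi       destination register number
       Fj, Fk   source register numbers
       Qj, Qk   functional units producing the source operands Fj, Fk
       Rj, Rk   flags indicating whether Fj, Fk are ready; set to No once the operands have been read

3. Register result status: for each register, indicates which functional unit, if any, is going to write it. The field is blank if no pending instruction writes the register.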
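The nine functional-unit fields and the three stall conditions described above map naturally onto a small record plus three predicates. The following is a simplified sketch, with hypothetical names (`FUStatus`, `can_issue`, `can_read_operands`, `can_write_result`); it models only the checks, not the bookkeeping that updates the tables. The sample state is the one the scoreboard example reaches in cycle 17, where ADDD has finished executing but DIVD has not yet read F6.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None    # destination register
    fj: Optional[str] = None    # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None    # units producing Fj, Fk
    qk: Optional[str] = None
    rj: bool = False            # Fj / Fk ready and not yet read
    rk: bool = False


def can_issue(unit: FUStatus, dest: str, result: Dict[str, str]) -> bool:
    """Issue: the unit must be free (structural) and dest has no pending writer (WAW)."""
    return not unit.busy and dest not in result


def can_read_operands(unit: FUStatus) -> bool:
    """Read operands: every source that is used must be ready (RAW)."""
    return (unit.fj is None or unit.rj) and (unit.fk is None or unit.rk)


def can_write_result(name: str, units: Dict[str, FUStatus]) -> bool:
    """Write result: no other busy unit still has to read our destination (WAR)."""
    fi = units[name].fi
    for other, u in units.items():
        if other == name or not u.busy:
            continue
        if (u.fj == fi and u.rj) or (u.fk == fi and u.rk):
            return False
    return True


# State at cycle 17 of the scoreboard example.
units = {
    "Integer": FUStatus(),
    "Mult1": FUStatus(busy=True, op="Mult", fi="F0", fj="F2", fk="F4"),
    "Add": FUStatus(busy=True, op="Add", fi="F6", fj="F8", fk="F2"),
    "Divide": FUStatus(busy=True, op="Div", fi="F10", fj="F0", fk="F6",
                       qj="Mult1", rj=False, rk=True),
}
result = {"F0": "Mult1", "F6": "Add", "F10": "Divide"}

print(can_write_result("Add", units))        # blocked: DIVD has not read F6 yet
print(can_read_operands(units["Divide"]))    # blocked: F0 is not ready
```

Once DIVD has read its operands its Rk flag drops to No, and the same `can_write_result` check then lets ADDD write F6.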

Scoreboard Example

    Instruction        j     k
    LD     F6          34+   R2
    LD     F2          45+   R3
    MULTD  F0          F2    F4
    SUBD   F8          F6    F2
    DIVD   F10         F0    F6
    ADDD   F6          F8    F2

Functional units: Integer, Mult1, Mult2, Add, Divide (none busy initially). The register result status covers F0, F2, F4, ..., F30 (all blank initially).

- Add takes 2 cycles to execute, multiply takes 10 cycles, and divide takes 40 cycles.
- LD instructions use the Integer unit.

In each snapshot below, functional units that are not listed are not busy, registers that are not listed have no pending write, and the instruction status line gives only the entries added in that cycle. The complete instruction status table is given at Cycle 62.

Scoreboard Example: Cycle 1
    Instruction status: LD F6 issues (1).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
          Integer  Yes   Load  F6        R2                          Yes
    Register result status: F6 = Integer

Scoreboard Example: Cycle 2
    Instruction status: LD F6 reads operands (2).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
          Integer  Yes   Load  F6        R2                          Yes
    Register result status: F6 = Integer
    Issue 2nd LD? No: the Integer unit is still busy (structural hazard).

Scoreboard Example: Cycle 3
    Instruction status: LD F6 completes execution (3).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
          Integer  Yes   Load  F6        R2                          No
    Register result status: F6 = Integer
    Issue MULT? No: issue is in order and the second LD has not issued yet.

Scoreboard Example: Cycle 4
    Instruction status: LD F6 writes its result (4).
    Functional unit status: no unit busy.
    Register result status: F6 = Integer

Scoreboard Example: Cycle 5
    Instruction status: LD F2 issues (5).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
          Integer  Yes   Load  F2        R3                          Yes
    Register result status: F2 = Integer

Scoreboard Example: Cycle 6
    Instruction status: LD F2 reads operands (6); MULTD issues (6).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
          Integer  Yes   Load  F2        R3                          Yes
          Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
    Register result status: F0 = Mult1, F2 = Integer

Scoreboard Example: Cycle 7
    Instruction status: LD F2 completes execution (7); SUBD issues (7).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
          Integer  Yes   Load  F2        R3                          No
          Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
          Add      Yes   Sub   F8   F6   F2            Integer  Yes  No
    Register result status: F0 = Mult1, F2 = Integer, F8 = Add
    Read multiply operands? No: F2 is still being produced by the Integer unit.

Scoreboard Example: Cycle 8a (first half of clock cycle)
    Instruction status: DIVD issues (8).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
          Integer  Yes   Load  F2        R3                          No
          Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
          Add      Yes   Sub   F8   F6   F2            Integer  Yes  No
          Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
    Register result status: F0 = Mult1, F2 = Integer, F8 = Add, F10 = Divide

Scoreboard Example: Cycle 8b (second half of clock cycle)
    Instruction status: LD F2 writes its result (8).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
          Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
          Add      Yes   Sub   F8   F6   F2                     Yes  Yes
          Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
    Register result status: F0 = Mult1, F8 = Add, F10 = Divide

Scoreboard Example: Cycle 9
    Instruction status: MULTD and SUBD read operands (9).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
    10    Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
    2     Add      Yes   Sub   F8   F6   F2                     Yes  Yes
          Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
    Register result status: F0 = Mult1, F8 = Add, F10 = Divide
    Read operands for MULT & SUB? Yes. Issue ADDD? No: the Add unit is busy.
    Note: the Time column shows the remaining execution cycles.

Scoreboard Example: Cycle 10
    Instruction status: no new entries.
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
    9     Mult1    Yes   Mult  F0   F2   F4                     No   No
    1     Add      Yes   Sub   F8   F6   F2                     No   No
          Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
    Register result status: F0 = Mult1, F8 = Add, F10 = Divide

Scoreboard Example: Cycle 11
    Instruction status: SUBD completes execution (11).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
    8     Mult1    Yes   Mult  F0   F2   F4                     No   No
    0     Add      Yes   Sub   F8   F6   F2                     No   No
          Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
    Register result status: F0 = Mult1, F8 = Add, F10 = Divide

Scoreboard Example: Cycle 12
    Instruction status: SUBD writes its result (12).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
    7     Mult1    Yes   Mult  F0   F2   F4                     No   No
          Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
    Register result status: F0 = Mult1, F10 = Divide
    Read operands for DIVD? No: F0 is still being produced by Mult1.

Scoreboard Example: Cycle 13
    Instruction status: ADDD issues (13).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
    6     Mult1    Yes   Mult  F0   F2   F4                     No   No
          Add      Yes   Add   F6   F8   F2                     Yes  Yes
          Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
    Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 14
    Instruction status: ADDD reads operands (14).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
    5     Mult1    Yes   Mult  F0   F2   F4                     No   No
    2     Add      Yes   Add   F6   F8   F2                     Yes  Yes
          Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
    Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 15
    Instruction status: no new entries.
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
    4     Mult1    Yes   Mult  F0   F2   F4                     No   No
    1     Add      Yes   Add   F6   F8   F2                     No   No
          Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
    Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 16
    Instruction status: ADDD completes execution (16).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
    3     Mult1    Yes   Mult  F0   F2   F4                     No   No
    0     Add      Yes   Add   F6   F8   F2                     No   No
          Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
    Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 17
    Instruction status: no new entries.
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
    2     Mult1    Yes   Mult  F0   F2   F4                     No   No
          Add      Yes   Add   F6   F8   F2                     No   No
          Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
    Register result status: F0 = Mult1, F6 = Add, F10 = Divide
    Why not write the result of ADD? WAR hazard: DIVD has not yet read F6.

Scoreboard Example: Cycle 18
    Instruction status: no new entries.
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
    1     Mult1    Yes   Mult  F0   F2   F4                     No   No
          Add      Yes   Add   F6   F8   F2                     No   No
          Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
    Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 19
    Instruction status: MULTD completes execution (19).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
    0     Mult1    Yes   Mult  F0   F2   F4                     No   No
          Add      Yes   Add   F6   F8   F2                     No   No
          Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
    Register result status: F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 20
    Instruction status: MULTD writes its result (20).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
          Add      Yes   Add   F6   F8   F2                     No   No
          Divide   Yes   Div   F10  F0   F6                     Yes  Yes
    Register result status: F6 = Add, F10 = Divide

Scoreboard Example: Cycle 21
    Instruction status: DIVD reads operands (21).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
          Add      Yes   Add   F6   F8   F2                     No   No
          Divide   Yes   Div   F10  F0   F6                     Yes  Yes
    Register result status: F6 = Add, F10 = Divide
    The WAR hazard is now gone.

Scoreboard Example: Cycle 22
    Instruction status: ADDD writes its result (22).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
    39    Divide   Yes   Div   F10  F0   F6                     No   No
    Register result status: F10 = Divide

Continue...

Scoreboard Example: Cycle 61
    Instruction status: DIVD completes execution (61).
    Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
    0     Divide   Yes   Div   F10  F0   F6                     No   No
    Register result status: F10 = Divide

Scoreboard Example: Cycle 62
    Instruction status: DIVD writes its result (62).
    Functional unit status: no unit busy.
    Register result status: all blank.

    Instruction        j     k     Issue  Read oper  Exec comp  Write result
    LD     F6          34+   R2    1      2          3          4
    LD     F2          45+   R3    5      6          7          8
    MULTD  F0          F2    F4    6      9          19         20
    SUBD   F8          F6    F2    7      9          11         12
    DIVD   F10         F0    F6    8      21         61         62
    ADDD   F6          F8    F2    13     14         16         22
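The final instruction status table can be reproduced without stepping through every cycle, because in this example each instruction's four times depend only on instructions that precede it in program order. The sketch below is a simplified program-order recurrence under stated assumptions, not a cycle-accurate scoreboard: one issue per cycle, a unit or register becomes available the cycle after the write that frees it, and the load latency of 1 cycle is read off the table (read operands in cycle 2, execution complete in cycle 3). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Execution latencies per unit kind; the Integer (load) latency is an assumption.
LATENCY = {"Integer": 1, "Mult": 10, "Add": 2, "Divide": 40}
UNIT_COUNT = {"Integer": 1, "Mult": 2, "Add": 1, "Divide": 1}


@dataclass
class Instr:
    op: str
    unit: str
    dest: str
    srcs: Tuple[str, ...]
    issue: int = 0
    read: int = 0
    done: int = 0
    write: int = 0


def schedule(program):
    free_at = {kind: [1] * n for kind, n in UNIT_COUNT.items()}  # first free cycle
    prev_issue = 0
    for idx, ins in enumerate(program):
        earlier = program[:idx]
        copies = free_at[ins.unit]
        u = min(range(len(copies)), key=copies.__getitem__)
        # Issue: in order, unit free (structural), no pending write to dest (WAW).
        waw = [e.write + 1 for e in earlier if e.dest == ins.dest]
        ins.issue = max([prev_issue + 1, copies[u]] + waw)
        # Read operands: every earlier producer of a source has written (RAW).
        raw = [e.write + 1 for e in earlier if e.dest in ins.srcs]
        ins.read = max([ins.issue + 1] + raw)
        ins.done = ins.read + LATENCY[ins.unit]
        # Write result: every earlier reader of dest has read its operands (WAR).
        war = [e.read + 1 for e in earlier if ins.dest in e.srcs]
        ins.write = max([ins.done + 1] + war)
        copies[u] = ins.write + 1
        prev_issue = ins.issue
    return program


program = schedule([
    Instr("LD", "Integer", "F6", ("R2",)),
    Instr("LD", "Integer", "F2", ("R3",)),
    Instr("MULTD", "Mult", "F0", ("F2", "F4")),
    Instr("SUBD", "Add", "F8", ("F6", "F2")),
    Instr("DIVD", "Divide", "F10", ("F0", "F6")),
    Instr("ADDD", "Add", "F6", ("F8", "F2")),
])
times = [(i.issue, i.read, i.done, i.write) for i in program]
for ins, t in zip(program, times):
    print(f"{ins.op:6s}{ins.dest:4s}", *t)
```

The recurrence makes the three stall sources visible: the second LD waits for the Integer unit, DIVD waits for F0 from MULTD, and ADDD completes in cycle 16 but cannot write F6 until DIVD has read it in cycle 21.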
