Computer Architecture Review Notes (Chapters 1-2)

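Key formulas

1. CPU time = instruction count × clock cycles per instruction (CPI) × clock cycle time

2. Amdahl's law: $\text{Speedup}_{\text{overall}} = \dfrac{1}{(1 - F) + F / S}$, where $F$ is the fraction of execution time that is enhanced and $S$ is the speedup of that fraction.

3. Die yield = wafer yield × $(1 + \text{defects per unit area} \times \text{die area} / \alpha)^{-\alpha}$, where wafer yield accounts for wafers that are completely bad and need not be tested, and $\alpha$ corresponds to the number of masking levels; typically $\alpha = 4.0$.

4. Average memory access time = hit time + miss rate × miss penalty

5. Misses per instruction = memory accesses per instruction × miss rate

6. Cache index size: $2^{\text{index}} = \text{cache size} / (\text{block size} \times \text{set associativity})$

Chapter 1 Fundamentals of Computer Design

Integrated circuit power (mainly dynamic power)

Power: $\text{Power}_{\text{dynamic}} = \tfrac{1}{2} \times \text{capacitive load} \times \text{voltage}^2 \times \text{switching frequency}$

Energy: $\text{Energy}_{\text{dynamic}} = \text{capacitive load} \times \text{voltage}^2$

Integrated circuit cost

Example: the defect density is 0.4 per cm². Find the die yield for dies that are 1.0 cm and 1.5 cm on a side.

Answer: the die areas are 1 cm² and 2.25 cm². Assuming a wafer yield of 100% and $\alpha = 4.0$, the larger die has a yield of $(1 + 0.4 \times 2.25 / 4.0)^{-4} \approx 0.44$ and the smaller die has a yield of $(1 + 0.4 \times 1.00 / 4.0)^{-4} \approx 0.68$.

Reliability

MTTF is the mean time to failure; MTTR is the mean time to repair.

Amdahl's law

See formula 2 above.

Processor performance equation

CPI = CPU clock cycles for the program / instruction count

CPU time = CPI × instruction count × clock cycle time

CPU clock cycles = $\sum_i IC_i \times CPI_i$, where $IC_i$ is the number of times instruction $i$ is executed and $CPI_i$ is the average number of clock cycles for instruction $i$.

Example: suppose we have the following measurements: floating-point operations make up 25% of instructions, the average CPI of floating-point operations is 4.0, the average CPI of other instructions is 1.33, FPSQR makes up 2% of instructions, and the CPI of FPSQR is 20. There are two design alternatives: reduce the CPI of FPSQR to 2, or reduce the CPI of all floating-point operations to 2.5. Compare the performance of the two alternatives.

Worked answer: the original CPI is 0.25 × 4.0 + 0.75 × 1.33 ≈ 2.0. With the faster FPSQR, CPI ≈ 2.0 - 0.02 × (20 - 2) = 1.64. With the faster floating-point operations, CPI = 0.75 × 1.33 + 0.25 × 2.5 ≈ 1.62. The second alternative is slightly better, with a speedup of about 2.0 / 1.62 ≈ 1.23 over the original design.

Chapter 2 Instruction-Level Parallelism and Its Exploitation

Main topics: pipelining, instruction-level parallelism, and the MIPS five-stage pipeline.

4.1 The concept of instruction-level parallelism

1. Pipeline performance is limited by the dependences between instructions in the pipeline:
   Structural hazards
   Data dependences (read after write, RAW; write after read, WAR; write after write, WAW)
   Control dependences

   CPI_pipeline = CPI_ideal + structural stalls + RAW stalls + WAR stalls + WAW stalls + control stalls

   This chapter studies how to eliminate these stalls so that the instruction sequence entering the pipeline runs with more parallelism.

2. Techniques covered in this chapter for increasing instruction-level parallelism, and the stalls each one reduces:
(1) Loop unrolling: control stalls
(2) Basic pipeline scheduling: RAW data stalls
(3) Dynamic instruction scheduling: all kinds of data stalls
(4) Branch prediction: control stalls
(5) Speculation: all data and control stalls
(6) Multiple issue: improves the ideal CPI
Other techniques, such as vector computers, are not discussed in this chapter.
Scope of study: a basic block, such as a loop body.

4.1.1 Basic methods: loop unrolling and scheduling

The most basic methods for increasing instruction-level parallelism are:
(1) Instruction scheduling
(2) Loop unrolling
Both are normally performed by the compiler.

Instruction scheduling: change the position of instructions in the program so that the distance between dependent instructions is at least the execution latency, in clock cycles, of the producing instruction. The dependence then no longer causes a stall, and the instructions behave as if they were independent.

Analysis of the operations: each loop iteration uses five operations. Three do the actual work (LD, ADDD, SD) and two are loop control (SUBI, BNEZ). The number of loop-control instructions per iteration is generally constant and does not depend on how many operations the loop body contains, but the total time spent on them grows with the number of iterations. Doing more work in each iteration therefore lowers the iteration count and with it the time spent on loop control.

Loop unrolling: replicate the loop body several times (and adjust the loop termination condition) to reduce the impact of loop control on performance, that is, the loop-control instructions themselves and the stalls caused by control dependences.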
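The effect of unrolling on loop overhead can be checked with a small count of executed instructions. The sketch below is only an illustration: the function name and the assumption that the trip count is a multiple of the unroll factor are not from the notes, and it counts instructions only, ignoring stalls.

```python
def dynamic_instruction_count(iterations, body_ops=3, control_ops=2, unroll=1):
    """Instructions executed by a loop whose body is replicated `unroll` times.

    body_ops    -- useful operations per original iteration (LD, ADDD, SD)
    control_ops -- loop-control operations per trip through the loop (SUBI, BNEZ)
    """
    if iterations % unroll != 0:
        # Assumption of this sketch: no clean-up loop for leftover iterations.
        raise ValueError("trip count must be a multiple of the unroll factor")
    trips = iterations // unroll
    # Each trip executes `unroll` copies of the body but only one set of control ops.
    return trips * (body_ops * unroll + control_ops)


if __name__ == "__main__":
    for factor in (1, 2, 4):
        total = dynamic_instruction_count(1000, unroll=factor)
        print(f"unroll={factor}: {total} instructions executed")
```

With the five-operation loop described above, the useful work stays at 3 operations per original iteration while the share of loop-control instructions shrinks as the unroll factor grows.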

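When combining loop unrolling with instruction scheduling, pay attention to the following:
(1) Correctness (mainly loop control and the adjustment of operand offsets)
(2) Effectiveness (mainly the independence of different loop iterations)
(3) Using different registers (to avoid conflicts)
(4) Removing as many loop-control tests and branches as possible
(5) Analyzing dependences through memory
(6) Watching for newly introduced dependences
The key is to work out exactly which dependences exist between the instructions and, given those dependences, how the instructions should be modified and scheduled.

4.1.2 Dependences

A dependence describes how the execution of one instruction depends on the execution of another. Studying dependences tells us whether instructions can be rescheduled, and it also reveals the parallelism inherent in the program and how much of it can be exploited. A dependence means that the order in which instructions execute and produce results is constrained, so running them in parallel or reordering them may cause problems. It does not mean that the pipeline will necessarily stall.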

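Types of dependence:
   Data dependence
   Name dependence
   Control dependence

1. Data dependence
Instruction j is data dependent on instruction i if either
(1) instruction j uses a result produced by instruction i, or
(2) instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i (transitivity).
Analyzing data dependences serves to:
(1) determine the dependences between instructions
(2) determine the order in which data must be computed
(3) determine the maximum available parallelism
Data dependence is one of the most fundamental kinds of dependence in a program.

2. Name dependence
Two instructions use the same register or memory location (called a name), but no data flows between them. There are two kinds of name dependence between an instruction i and a later instruction j:
(1) Antidependence: instruction i executes first, and instruction j writes a name that instruction i reads (write after read, WAR).
(2) Output dependence: instruction i and instruction j write the same register or memory location (write after write, WAW).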
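The three register-level cases (RAW, WAR, WAW) can be told apart by comparing the names each instruction reads and writes. The following is a minimal sketch under stated assumptions: the `Instr` tuple and `classify` function are hypothetical helpers, only register names are compared (memory locations are ignored), and intervening instructions and transitivity are not considered.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    op: str
    dest: Tuple[str, ...]  # register names written
    srcs: Tuple[str, ...]  # register names read


def classify(i: Instr, j: Instr) -> Set[str]:
    """Direct dependences of a later instruction j on an earlier instruction i."""
    kinds = set()
    if set(i.dest) & set(j.srcs):
        kinds.add("RAW")  # data dependence: j reads what i writes
    if set(i.srcs) & set(j.dest):
        kinds.add("WAR")  # antidependence: j writes what i reads
    if set(i.dest) & set(j.dest):
        kinds.add("WAW")  # output dependence: both write the same name
    return kinds


# Two copies of a loop body that reuse F0 and F4.
ld1 = Instr("LD", ("F0",), ("R1",))
addd1 = Instr("ADDD", ("F4",), ("F0", "F2"))
ld2 = Instr("LD", ("F0",), ("R1",))
# The same second load after renaming its destination to F8.
ld2_renamed = Instr("LD", ("F8",), ("R1",))

if __name__ == "__main__":
    print(classify(ld1, addd1))          # true data dependence
    print(classify(addd1, ld2))          # name dependence on F0
    print(classify(ld1, ld2))            # name dependence on F0
    print(classify(addd1, ld2_renamed))  # removed by renaming
```

Only the first pair carries real data flow; the conflicts on F0 disappear once the second load is given a different destination register.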

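    LOOP: LD   F0,0(R1)
          ADDD F4,F0,F2
          SD   0(R1),F4
          LD   F0,-8(R1)
          ADDD F4,F0,F2
          SD   -8(R1),F4
          ......

Instructions with a name dependence cannot simply be reordered. However, because no data flows between them, the dependence can be removed by changing the operand names. This technique is called renaming: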

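    LOOP: LD   F0,0(R1)
          ADDD F4,F0,F2
          SD   0(R1),F4
          LD   F8,-8(R1)
          ADDD F12,F8,F2
          SD   -8(R1),F12
          ......

3. Control dependence
A control dependence is a dependence caused by a branch instruction. If whether an instruction executes depends on a branch instruction, the instruction is said to be control dependent on that branch. Example: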

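    if p1 { s1 };
    if p2 { s2 };

Here s1 is control dependent on p1 and s2 is control dependent on p2; s1 is not control dependent on p2, and s2 is not control dependent on p1.

Basic rules:
(1) An instruction that is control dependent on a branch cannot be moved before the branch.
(2) An instruction that is not control dependent on a branch cannot be moved after the branch.
Control dependences are reduced or eliminated by reducing or eliminating branch instructions.

4.3 Dynamic techniques for handling control dependences

The previous chapter handled control dependences with:
(1) freezing or flushing the pipeline
(2) predict not taken
(3) predict taken
(4) delayed branch
    a) scheduling from before the branch
    b) scheduling from the branch target
    c) scheduling from the fall-through path
Except for scheduling from before the branch in the delayed-branch method, every performance gain depends on the prediction being correct.
If the prediction is made at compile time (or is fixed), it is a static technique for handling control dependences. If the prediction is made dynamically at run time, it is a dynamic technique. The methods of the previous chapter are all static techniques.

4.3.1 Reducing branch delay: branch-prediction buffers

Basic idea: use the history of the branch instruction. The outcome of the current execution (taken or not taken) is predicted from the outcome of the most recent execution, or the most recent few executions, of the same branch.
Implementation: set up a buffer that records the outcome (taken or not taken) of each branch instruction that has been executed.
How the buffer is indexed: by the low-order bits of the branch instruction's address; how many bits depends on the size of the buffer.
Contents of the buffer: prediction bits; their width determines how many previous executions of the instruction can be recorded.

Execution of a branch instruction:
(1) Save the processor state.
(2) Fetch the following instructions along the predicted direction.
(3) Once the branch outcome is known: if the prediction was correct, continue; if the prediction was wrong, restore the saved state and restart execution from the branch.
(4) Update the prediction bits.

(1) One prediction bit
Contents: records whether the most recent execution of the branch was taken, for example "1" for taken and "0" for not taken.
Prediction: if the branch was taken last time, predict taken; otherwise predict not taken.
Update: if the branch is actually taken, set the prediction bit to "1"; otherwise set it to "0".

(2) n prediction bits
Contents: a counter ranging from 0 to $2^n - 1$. After each branch outcome is known, the counter is incremented if the branch was taken and decremented if it was not taken. The counter saturates: it does not increase beyond $2^n - 1$ or decrease below 0.
Prediction: if the counter is greater than or equal to half of its maximum, $2^{n-1}$, predict taken; otherwise predict not taken.
For n = 2, counter values 0 and 1 predict not taken, and values 2 and 3 predict taken.

Experimental results:
(1) The prediction accuracy with 2 bits differs little from that with n bits.
(2) Once the buffer grows to 4096 entries, accuracy no longer improves noticeably (only the low-order 12 bits of the instruction address are used).
(3) With 2 prediction bits and a 4096-entry buffer, prediction accuracy ranges from 82% to 99%, that is, the misprediction rate ranges from 1% to 18%.
Precondition for the scheme to help: the branch target address must be computed faster than the branch outcome.

4.2 Dynamic instruction scheduling

1. Data dependences in the basic pipeline are handled by:
   Forwarding (hides the dependence)
   Stalling
2. Ways to reduce stalls:
   Static scheduling (by the compiler): originated in the 1960s and is widely used today.
   Dynamic scheduling (by the processor): originated even earlier and is still used in some RISC machines.
3. Advantages of dynamic scheduling:
   It handles some dependences that cannot be known at compile time.
   It simplifies the compiler.
   It makes code more portable.
4. Main disadvantages of dynamic scheduling:
   High hardware complexity
   A relatively small scheduling scope

4.2.1 Principle of dynamic scheduling

1. The biggest problem of the basic pipeline is that instructions must issue in order. For example:

    DIVD F0,F2,F4
    ADDD F10,F0,F8
    SUBD F12,F8,F14

The SUBD instruction has no data dependence on the preceding instructions, yet it still has to wait.

2. How dynamic scheduling resolves hazards: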

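(1) Structural hazards: provide multiple functional units, or pipeline the functional units.
(2) Data hazards: suspend instead of stall.
    Stall: all subsequent instructions are stalled.
    Suspend: subsequent instructions can still proceed; each one is issued if it has no data dependence, or is itself suspended if it has one.
In the processor:
   Several instructions can execute at the same time (multiple functional units).
   Several instructions can be suspended (each runs as soon as its data dependence is resolved).
   Instructions execute out of order.
Problem caused by out-of-order execution: exception handling becomes more complex, because the processor state is hard to determine and restore.

3. The instruction decode stage is split into two stages:
   Issue (IS): decode the instruction and check for structural hazards (stall).
   Read operands (RO): check for data hazards (suspend).

4.2.2 Dynamic scheduling algorithm 2: Tomasulo's algorithm

1. Used in the floating-point unit of the IBM 360/91 (1967).
2. Combines scoreboarding with register renaming to handle WAW and WAR hazards more effectively.
3. Register renaming increases the effective number of registers without changing the instruction set.

Why the technique was developed: the IBM 360/91 needed very high floating-point performance, but the whole 360 family was to share a single instruction set and compiler. That instruction set has only four floating-point registers, so WAR and WAW hazards arise easily between instructions. Register renaming increases the effective number of registers.

Tomasulo organization (from the Advanced Computer Architecture slides)
All results from the FP functional units and from loads are broadcast on the common data bus (CDB).
The reservation stations hold instructions that have been issued and are awaiting execution at a functional unit.

Reservation stations
Once a floating-point instruction is issued, it enters a reservation station.
The reservation station records the instruction's operation. As soon as an operand is ready, its value is copied into the reservation station, so the instruction does not need to read the corresponding register when it executes (this resolves WAR hazards).
Data passes directly between dependent instructions in the reservation stations, without going through the floating-point registers, so some instructions in the reservation stations need no destination register (this reduces register use and effectively increases the number of registers).
If two instructions in the reservation stations have the same destination register, the destination of the earlier instruction is dropped (this resolves WAW hazards).

Basic structure (DLX)
   Three FP add reservation stations hold up to three floating-point add or subtract instructions.
   Two FP multiply reservation stations hold up to two floating-point multiply or divide instructions.
   Six load buffers hold up to six memory read instructions.
   Three store buffers hold up to three memory write instructions.
   (Figure 4.5, page 149)
A floating-point instruction stays in a reservation station either because it is waiting for an operand to be produced (RAW) or because it is waiting for its operation to complete.
A memory instruction stays in a load or store buffer either because it is waiting for the memory access to complete or because it is waiting for the store operand to be produced.

Instruction execution steps
(1) Issue (IS): fetch a floating-point instruction. If a matching reservation station is free, issue the instruction; any operand that is ready (in a register) is copied into the reservation station. A memory instruction is issued if a buffer is free. Otherwise the instruction waits. This resolves structural hazards.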

(2) Execute (EX): if an operand is not ready, monitor the common data bus for the result (when an operation completes, its result is broadcast to every reservation station waiting for it). Execution starts once both operands are ready. This resolves RAW hazards.
(3) Write result (WB): when the result has been computed, it is written to the common data bus and broadcast to all reservation stations waiting for it and to the destination register (if there is one).

Data structures
(1) Instruction status table: shows which of the three steps each in-flight instruction is in.
(2) Register status table: shows, for each register, which reservation station will write it.
(3) Reservation station: six fields
    Busy: whether the reservation station is in use
    Op: the operation to perform on operands S1 and S2
    Vj, Vk: the operand values
    Qj, Qk: the numbers of the reservation stations that will produce the operand values; zero means the value is already in Vj or Vk, or is not needed
(4) Load buffer: two fields
    Busy: whether the buffer is in use
    Address: the address
(5) Store buffer: four fields
    Busy: whether the buffer is in use
    Address: the address
    Vj: the operand value
    Qj: the number of the reservation station that will produce the operand value; zero means the value is already in Vj

Tomasulo example

Instruction sequence:

    LD    F6, 34+R2
    LD    F2, 45+R3
    MULTD F0, F2, F4
    SUBD  F8, F6, F2
    DIVD  F10, F0, F6
    ADDD  F6, F8, F2

Resources: load buffers Load1 to Load3; reservation stations Add1 to Add3 and Mult1, Mult2.
Latency stated on the slides: load 1, add 2, multiply 10, and divide 40 clock cycles.

Main events in the cycle-by-cycle snapshots:
   Cycle 1: LD F6 issues to Load1 (address 34+R2); F6 is marked as produced by Load1.
   Cycle 2: LD F2 issues to Load2 (address 45+R3). Note: unlike the CDC 6600 scoreboard, multiple loads can be outstanding.
   Cycle 3: MULTD issues to Mult1 with Vk = R(F4) and Qj = Load2; Load1 completes execution. Note: register names are removed ("renamed") in the reservation stations.
   Cycle 4: Load1 writes its result; SUBD issues to Add1 with Vj = M(34+R2) and Qk = Load2; Load2 completes execution.
   Cycle 5: Load2 writes its result, which is delivered to Mult1 and Add1; DIVD issues to Mult2 with Vk = M(34+R2) and Qj = Mult1. Add1 has 2 cycles left and Mult1 has 10.
   Cycle 6: ADDD issues to Add2 with Vk = M(45+R3) and Qj = Add1; F6 is marked as produced by Add2. Under the scoreboard, ADDD does not issue until cycle 13.
   Cycle 7: Add1 (SUBD) completes execution.
   Cycle 8: SUBD writes its result, which is delivered to Add2; Add2 has 2 cycles left.
   Cycle 10: Add2 (ADDD) completes execution.
   Cycle 11: ADDD writes its result to F6. Under the scoreboard this write is delayed until cycle 22.
   Cycle 12: all the quick instructions have completed.
   Cycle 15: Mult1 (MULTD) completes execution.
   Cycle 16: MULTD writes its result, which is delivered to Mult2; Mult2 has 40 cycles left. From here on the machine is only waiting for the divide.
   Cycle 56: Mult2 (DIVD) completes execution.
   Cycle 57: DIVD writes its result. Issue is in order; execution and completion are out of order.

Final instruction status with Tomasulo's algorithm:

    Instruction          Issue   Execution complete   Write result
    LD    F6, 34+R2        1            3                  4
    LD    F2, 45+R3        2            4                  5
    MULTD F0, F2, F4       3           15                 16
    SUBD  F8, F6, F2       4            7                  8
    DIVD  F10, F0, F6      5           56                 57
    ADDD  F6, F8, F2       6           10                 11

Comparison with the scoreboard (finishes at cycle 62):

    Instruction          Issue   Read operands   Execution complete   Write result
    LD    F6, 34+R2        1          2                 3                  4
    LD    F2, 45+R3        5          6                 7                  8
    MULTD F0, F2, F4       6          9                19                 20
    SUBD  F8, F6, F2       7          9                11                 12
    DIVD  F10, F0, F6      8         21                61                 62
    ADDD  F6, F8, F2      13         14                16                 22

Question: why does the same code take longer on the scoreboard (CDC 6600)?

Tomasulo loop example

    Loop: LD    F0, 0(R1)
          MULTD F4, F0, F2
          SD    F4, 0(R1)
          SUBI  R1, R1, #8
          BNEZ  R1, Loop

Assumptions:
   Multiply takes 4 clocks.
   The first load takes 8 clocks (cache miss?); the second load takes 4 clocks (hit).
   To be clear, the clocks for SUBI and BNEZ are shown; in reality the integer instructions run ahead.

Main events in the cycle-by-cycle snapshots (R1 starts at 80):
   Cycle 1: LD of iteration 1 issues to Load1 (address 80); F0 is marked as produced by Load1.
   Cycle 2: MULTD of iteration 1 issues to Mult1 with Vk = R(F2) and Qj = Load1; F4 is marked as produced by Mult1.
   Cycle 3: SD of iteration 1 issues to Store1 (address 80, Qi = Mult1). Note: Mult1 holds no register names in its reservation station.
   Cycle 4: R1 becomes 72.
   Cycle 6: LD of iteration 2 issues to Load2 (address 72); F0 is now marked as produced by Load2. Note: F0 never sees the result of Load1.
   Cycle 7: MULTD of iteration 2 issues to Mult2 with Vk = R(F2) and Qj = Load2; F4 is now marked as produced by Mult2. Note: Mult2 holds no register names in its reservation station.
   Cycle 8: SD of iteration 2 issues to Store2 (address 72, Qi = Mult2).
   Cycle 9: Load1 completes execution; R1 becomes 64.
   Cycle 10: Load1 writes its result, and Mult1 receives M(80) with 4 cycles left; Load2 completes execution.
   Cycle 11: Load2 writes its result, which is delivered to Mult2.
   Cycle 12: Load3 is busy with address 64 and F0 is marked as produced by Load3; Mult1 has 2 cycles left and Mult2 has 3.

Instruction status at cycle 12:

    Instruction          Iteration   Issue   Execution complete   Write result
    LD    F0, 0(R1)          1          1            9                 10
    MULTD F4, F0, F2         1          2
    SD    F4, 0(R1)          1          3
    LD    F0, 0(R1)          2          6           10                 11
    MULTD F4, F0, F2         2          7
    SD    F4, 0(R1)          2          8
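The issue, execution-complete, and write-result cycles of the first Tomasulo example can be reproduced with a very small timing model. This is a sketch under explicit assumptions, not the full algorithm: one instruction issues per cycle in program order, there are always enough reservation stations, the common data bus is never contended, execution starts in the cycle after the last operand is broadcast, and the load latency is taken as 2 cycles because that is what the example's table shows (issue at 1, complete at 3). All names in the code are illustrative.

```python
# Execution latencies in cycles. The LD value is an assumption chosen to
# match the example's timing table rather than the "load 1" slide note.
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}

# (opcode, destination register, floating-point source registers)
PROGRAM = [
    ("LD", "F6", ()),
    ("LD", "F2", ()),
    ("MULTD", "F0", ("F2", "F4")),
    ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
]


def tomasulo_timing(program, latency):
    """Return (op, dest, issue, exec_complete, write_result) per instruction."""
    last_write = {}  # register -> cycle in which its latest producer writes the CDB
    rows = []
    for number, (op, dest, srcs) in enumerate(program, start=1):
        issue = number  # in-order issue, one instruction per cycle
        # An operand is ready when its most recent earlier producer has
        # broadcast; registers with no producer are ready from cycle 0.
        ready = max([last_write.get(reg, 0) for reg in srcs], default=0)
        start = max(issue, ready) + 1
        complete = start + latency[op] - 1
        write = complete + 1
        rows.append((op, dest, issue, complete, write))
        # Renaming: later readers of `dest` wait for this instruction only,
        # so WAR and WAW hazards never delay issue in this model.
        last_write[dest] = write
    return rows


if __name__ == "__main__":
    for op, dest, issue, complete, write in tomasulo_timing(PROGRAM, LATENCY):
        print(f"{op:6s}{dest:4s} issue={issue:2d} complete={complete:2d} write={write:2d}")
```

Note that DIVD reads F6 from the first load even though ADDD later overwrites F6, which is the renaming behavior the example is meant to show.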

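The n-bit prediction scheme of section 4.3.1 can be sketched as a table of saturating counters indexed by the low-order bits of the branch address. The class and function names below are hypothetical, the counters are assumed to start at 0, and the branch address is treated as a plain integer index; none of these details come from the notes.

```python
class SaturatingCounterPredictor:
    """Branch-prediction buffer with one n-bit saturating counter per entry."""

    def __init__(self, index_bits=12, counter_bits=2):
        self.mask = (1 << index_bits) - 1          # selects the low-order address bits
        self.max_count = (1 << counter_bits) - 1   # counter saturates at 2^n - 1
        self.threshold = 1 << (counter_bits - 1)   # predict taken when counter >= 2^(n-1)
        self.table = [0] * (1 << index_bits)       # assumption: counters start at 0

    def predict(self, address):
        return self.table[address & self.mask] >= self.threshold

    def update(self, address, taken):
        i = address & self.mask
        if taken:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        else:
            self.table[i] = max(self.table[i] - 1, 0)


def accuracy(predictor, address, outcomes):
    """Fraction of correct predictions for one branch over a list of outcomes."""
    hits = 0
    for taken in outcomes:
        if predictor.predict(address) == taken:
            hits += 1
        predictor.update(address, taken)
    return hits / len(outcomes)


# A loop branch that is taken nine times and then falls through, run ten times.
LOOP_PATTERN = ([True] * 9 + [False]) * 10

if __name__ == "__main__":
    one_bit = SaturatingCounterPredictor(counter_bits=1)
    two_bit = SaturatingCounterPredictor(counter_bits=2)
    print("1-bit accuracy:", accuracy(one_bit, 0x40, LOOP_PATTERN))
    print("2-bit accuracy:", accuracy(two_bit, 0x40, LOOP_PATTERN))
```

On this loop pattern the 1-bit scheme mispredicts twice per loop exit (once on the exit and once on re-entry), while the 2-bit counter mispredicts only the exit after warming up.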