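Computer Architecture Review Notes (Chapters 1-2)

Key formulas

1. CPU time = instruction count × clock cycles per instruction (CPI) × clock cycle time

2. Amdahl's law:

$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

3. Die yield:

$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha}\right)^{-\alpha}$$

Here the wafer yield accounts for wafers that are completely bad and therefore need not be tested, and α corresponds to the number of masking levels; typically α = 4.0.

4. Average memory access time = hit time + miss rate × miss penalty

5. Misses per instruction = memory accesses per instruction × miss rate

6. Cache index size:

$$2^{\text{index}} = \frac{\text{Cache size}}{\text{Block size} \times \text{Set associativity}}$$

Chapter 1: Fundamentals of Computer Design

Power in integrated circuits (mainly dynamic power)

Power:

$$\text{Power}_{\text{dynamic}} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Switching frequency}$$

Energy:

$$\text{Energy}_{\text{dynamic}} = \text{Capacitive load} \times \text{Voltage}^2$$

Cost of integrated circuits

Example: With a defect density of 0.4 per cm², find the die yield for dies 1.0 cm and 1.5 cm on a side.

Answer: The die areas are 1 cm² and 2.25 cm². With α = 4.0, and taking the wafer yield as 1 (an assumption, since the example does not state it), the smaller die gives

$$\text{Die yield} = \left(1 + \frac{0.4 \times 1.00}{4.0}\right)^{-4} \approx 0.68$$

and the larger die gives

$$\text{Die yield} = \left(1 + \frac{0.4 \times 2.25}{4.0}\right)^{-4} \approx 0.44$$

Reliability

MTTF is the mean time to failure; MTTR is the mean time to repair.

Processor performance equation

$$\text{CPI} = \frac{\text{CPU clock cycles for the program}}{\text{Instruction count}}$$

CPU time = CPI × instruction count of the program × clock cycle time

$$\text{CPU clock cycles} = \sum_{i=1}^{n} IC_i \times CPI_i$$

where $IC_i$ is the number of times instruction $i$ is executed and $CPI_i$ is the number of clock cycles instruction $i$ takes.

Example: Suppose we have the following measurements: floating-point operations make up 25% of instructions, the average CPI of floating-point operations is 4.0, the average CPI of other instructions is 1.33, FPSQR makes up 2% of instructions, and the CPI of FPSQR is 20. There are two design alternatives: reduce the CPI of FPSQR to 2, or reduce the average CPI of all floating-point operations to 2.5. Compare the performance of the two alternatives.

Chapter 2: Instruction-Level Parallelism and Its Exploitation

Main topics: pipelining, instruction-level parallelism, and the MIPS five-stage pipeline.

4.1 The concept of instruction-level parallelism

1. Pipeline performance is limited by the dependences among the instructions in the pipeline:
structural hazards;
data hazards (read after write, RAW; write after read, WAR; write after write, WAW);
control hazards.

$$\text{CPI}_{\text{pipeline}} = \text{CPI}_{\text{ideal}} + \text{Stalls}_{\text{structural}} + \text{Stalls}_{\text{RAW}} + \text{Stalls}_{\text{WAR}} + \text{Stalls}_{\text{WAW}} + \text{Stalls}_{\text{control}}$$

This chapter studies how to eliminate these stalls so that the instruction sequence entering the pipeline runs with more parallelism.

2. Techniques for increasing instruction-level parallelism covered in this chapter, with the stalls each one reduces:
(1) Loop unrolling: control stalls
(2) Basic pipeline scheduling: RAW stalls
(3) Dynamic instruction scheduling: data hazard stalls of all kinds
(4) Branch prediction: control stalls
(5) Speculation: all data and control stalls
(6) Multiple issue: improves the ideal CPI

Other techniques, such as vector computers, are not discussed in this chapter. The scope of the analysis is a basic block, for example a loop body.

4.1.1 Basic methods: loop unrolling and scheduling

The two most basic ways to increase instruction-level parallelism are (1) instruction scheduling and (2) loop unrolling. Both are normally done by the compiler.

Instruction scheduling: change the positions of instructions in the program so that the distance between dependent instructions is at least the execution latency in clock cycles. The dependent instructions then behave as if they were independent.

Analysis of the operations: each loop iteration uses five operations. Three do the actual work (LD, ADDD, SD) and two are loop control (SUBI, BNEZ). The number of loop-control instructions per iteration is generally constant and does not depend on how many operations the loop body contains, but the total time spent on them clearly grows with the number of iterations. Doing more work in each iteration reduces the number of iterations and therefore the time spent on loop control.

Loop unrolling: replicate the loop body several times (and adjust the loop termination condition) to reduce the effect of loop control on performance, that is, the loop-control instructions themselves and the stalls caused by control hazards.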
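The effect of unrolling on loop-control overhead can be estimated with a simple instruction-count model. The sketch below is illustrative only: it assumes three work instructions per element and two control instructions per iteration (the LD/ADDD/SD and SUBI/BNEZ split described above), that one SUBI/BNEZ pair serves all copies of the body after unrolling, and that the element count is a multiple of the unroll factor. The function names are hypothetical.

```python
def instructions_executed(n_elements, unroll,
                          work_per_element=3, control_per_iteration=2):
    """Dynamic instruction count for a loop whose body is replicated `unroll` times."""
    if unroll <= 0 or n_elements % unroll != 0:
        raise ValueError("unroll must be positive and divide n_elements")
    iterations = n_elements // unroll
    # Each unrolled iteration does `unroll` copies of the work
    # but pays for loop control only once.
    return iterations * (unroll * work_per_element + control_per_iteration)


def control_overhead(unroll, work_per_element=3, control_per_iteration=2):
    """Fraction of executed instructions that are loop control."""
    body = unroll * work_per_element + control_per_iteration
    return control_per_iteration / body


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        total = instructions_executed(1000, k)
        print(f"unroll={k}: {total} instructions, "
              f"{control_overhead(k):.1%} loop control")
```

The model counts instructions only; it says nothing about stalls, which depend on how the unrolled body is scheduled.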
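When combining loop unrolling with instruction scheduling, pay attention to the following:
(1) Correctness (mainly loop control and the adjustment of operand offsets)
(2) Effectiveness (mainly the independence of different loop iterations)
(3) Use different registers (to avoid conflicts)
(4) Remove as many loop-control tests and branches as possible
(5) Analyse dependences through memory
(6) Watch for newly introduced dependences

The key is to work out exactly which dependences exist among the instructions and how the instructions must be modified and scheduled under those dependences.

4.1.2 Dependences

A dependence describes how the execution of one instruction depends on the execution of another. Studying dependences not only tells us whether instructions can be rescheduled, but also reveals the parallelism inherent in the program and how much of it can be obtained. A dependence means that the order in which instructions execute and produce results is constrained, and that running the instructions in parallel or reordering them may cause problems. It does not mean that the pipeline will necessarily stall.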
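Types of dependence: data dependence, name dependence, and control dependence.

1. Data dependence

Instruction j is data dependent on instruction i if
(1) instruction j uses a result produced by instruction i, or
(2) instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i (transitivity).

The main purposes of data dependence analysis are to:
(1) determine which instructions depend on each other;
(2) determine the order in which data must be computed;
(3) determine the maximum available parallelism.

Data dependence is one of the most fundamental dependences in a program.

2. Name dependence

Two instructions use the same register or memory location (called a name), but no data flows between them. With instruction i preceding instruction j in program order, there are two kinds of name dependence:
(1) Antidependence: instruction j writes a name that instruction i reads (write after read, WAR).
(2) Output dependence: instruction i and instruction j write the same register or memory location (write after write, WAW).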
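The definitions above can be turned into a small pairwise classifier. This is a minimal sketch under stated assumptions: it looks only at register names, ignores memory locations, and does not check whether an intervening instruction overwrites a register between the two instructions being compared. The `Instr` type, the `classify` function, and the sample instructions are hypothetical and exist only for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]     # registers read


def classify(i: Instr, j: Instr) -> List[str]:
    """Dependences of j on i, where i precedes j in program order."""
    kinds = []
    if i.dest is not None and i.dest in j.srcs:
        kinds.append("RAW")   # data dependence: j reads what i writes
    if j.dest is not None and j.dest in i.srcs:
        kinds.append("WAR")   # antidependence: j writes what i reads
    if i.dest is not None and i.dest == j.dest:
        kinds.append("WAW")   # output dependence: both write the same name
    return kinds


if __name__ == "__main__":
    # Two copies of a load/add/store loop body that reuse F0 and F4.
    program = [
        Instr("LD   F0,0(R1)", "F0", ("R1",)),
        Instr("ADDD F4,F0,F2", "F4", ("F0", "F2")),
        Instr("SD   0(R1),F4", None, ("F4", "R1")),
        Instr("LD   F0,-8(R1)", "F0", ("R1",)),
        Instr("ADDD F4,F0,F2", "F4", ("F0", "F2")),
    ]
    for a in range(len(program)):
        for b in range(a + 1, len(program)):
            kinds = classify(program[a], program[b])
            if kinds:
                print(f"{program[a].text:16} -> {program[b].text:16} {kinds}")
```

Only the RAW entries represent real data flow; the WAR and WAW entries come from reusing the names F0 and F4 and disappear once different registers are used.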
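    LOOP: LD   F0,0(R1)
          ADDD F4,F0,F2
          SD   0(R1),F4
          LD   F0,-8(R1)
          ADDD F4,F0,F2
          SD   -8(R1),F4
          ......

Instructions linked by a name dependence cannot be freely reordered. Because no data flows between them, however, the dependence can be removed by changing the operand names. This is called renaming:

    LOOP: LD   F0,0(R1)
          ADDD F4,F0,F2
          SD   0(R1),F4
          LD   F8,-8(R1)
          ADDD F12,F8,F2
          SD   -8(R1),F12
          ......

3. Control dependence

A control dependence is a dependence caused by a branch instruction: if whether an instruction executes depends on a branch, the instruction is control dependent on that branch.

Example: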
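    if p1 {s1};
    if p2 {s2};

Here s1 is control dependent on p1 and s2 is control dependent on p2; s1 is not control dependent on p2, and s2 is not control dependent on p1.

Basic rules:
(1) An instruction that is control dependent on a branch cannot be moved before the branch.
(2) An instruction that is not control dependent on a branch cannot be moved after the branch.

Control dependences are reduced or eliminated by reducing or eliminating branch instructions.

4.3 Dynamic techniques for handling control dependences

The previous chapter handled control hazards with:
(1) freezing or flushing the pipeline;
(2) predicting the branch as not taken;
(3) predicting the branch as taken;
(4) delayed branches, with the delay slot filled
a) from before the branch,
b) from the branch target,
c) from the fall-through path.

Except for filling the delay slot from before the branch, all of these gain performance only when the prediction is correct.

If the prediction is made (or fixed) at compile time, the technique is a static solution to control dependences. If the prediction is made dynamically at run time, it is a dynamic solution. The methods of the previous chapter are all static.

4.3.1 Reducing branch delay: branch-prediction buffers

Basic idea: use the history of the branch. The outcome of the branch (taken or not taken) in its most recent execution, or its most recent few executions, is used to predict the outcome of the current execution.

Implementation: set up a buffer that records the outcomes (taken or not taken) of the branches that have executed.

How the buffer is indexed: by the low-order bits of the branch instruction's address. The number of bits depends on the size of the buffer.

Contents of the buffer: prediction bits. Their length determines how many previous outcomes of the instruction can be recorded.

Executing a branch:
(1) Save the machine state.
(2) Fetch the following instructions along the predicted direction.
(3) When the branch outcome is known: if the prediction was correct, continue; if it was wrong, restore the saved state and restart execution from the branch.
(4) Update the prediction bits.

(1) One prediction bit

Contents: whether the branch was taken the last time it executed, for example "1" for taken and "0" for not taken.
Prediction: if the branch was taken last time, predict taken; otherwise predict not taken.
Update: if the branch is actually taken, set the bit to "1"; otherwise set it to "0".

(2) n prediction bits

Contents: a counter ranging from 0 to $2^n - 1$. After each branch outcome, the counter is incremented if the branch was taken and decremented if it was not taken. The counter saturates: it does not increase beyond $2^n - 1$ or decrease below 0.
Prediction: if the counter is greater than or equal to $2^{n-1}$, half of its range, predict taken; otherwise predict not taken.

For n = 2 the counter takes the values 0 to 3: values 2 and 3 predict taken, and values 0 and 1 predict not taken.

Experimental results:
(1) A 2-bit predictor performs about as well as an n-bit predictor.
(2) Prediction accuracy no longer improves noticeably once the buffer reaches 4096 entries (indexed by only the low 12 bits of the instruction address).
(3) With 2 prediction bits and a 4096-entry buffer, prediction accuracy ranges from 82% to 99%, that is, a misprediction rate of 1% to 18%.

The buffer helps only if the branch target address can be computed earlier than the branch outcome.

4.2 Dynamic instruction scheduling

1. How the basic pipeline handles data hazards:
forwarding (which hides the hazard);
stalling.

2. Ways to reduce stalls:
static scheduling (by the compiler), which dates from the 1960s and is widely used today;
dynamic scheduling (by the processor), which appeared even earlier and is still used in some RISC machines.

3. Advantages of dynamic scheduling:
it handles dependences that cannot be known at compile time;
it simplifies compiler design;
it makes code more portable.

4. Main drawbacks of dynamic scheduling:
high hardware complexity;
a relatively small scheduling scope.

4.2.1 The principle of dynamic scheduling

1. The biggest limitation of the basic pipeline is that instructions must issue in order. For example:

    DIVD F0,F2,F4
    ADDD F10,F0,F8
    SUBD F12,F8,F14

SUBD has no data dependence on the preceding instructions, but it still has to wait.

2. How dynamic scheduling resolves hazards: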
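(1) Structural hazards: provide multiple functional units, or pipeline the functional units.
(2) Data hazards: suspend the instruction instead of stalling.

Stall: all following instructions are held up.
Suspend: following instructions can still proceed. They issue if they have no data dependence, or are suspended in turn if they do.

Inside the processor:
several instructions can be executing at the same time (multiple functional units);
several instructions can be suspended (each resumes as soon as its data dependence is resolved);
instructions execute out of order.

The problem with out-of-order execution is that exception handling becomes complicated: it is hard to determine and restore the machine state.

3. The instruction decode stage is split into two stages:
Issue (IS): decode the instruction and check for structural hazards (stall).
Read operands (RO): check for data hazards (suspend).

4.2.2 Dynamic scheduling algorithm 2: Tomasulo's algorithm

1. It was used in the floating-point unit of the IBM 360/91 (1967).
2. It combines the scoreboard technique with register renaming, which handles WAW and WAR hazards more effectively.
3. Register renaming increases the effective number of registers without changing the instruction set.

Why the technique was developed:
the IBM 360/91 needed very high floating-point performance;
at the same time, the whole 360 series was meant to share a single instruction set and compiler, which offered only four floating-point registers, so WAR and WAW hazards between instructions arose easily;
register renaming increases the effective number of registers.

Tomasulo organization (from the Advanced Computer Architecture slides):
All results from the FP functional units and loads are broadcast on the common data bus (CDB).
The reservation stations hold instructions that have been issued and are awaiting execution at a functional unit.

Reservation stations

Once a floating-point instruction issues, it enters a reservation station.

The reservation station records the instruction's operation. If any operand is ready, its value is copied into the reservation station immediately, so the instruction does not need to read the register again when it executes (this resolves WAR hazards).

Data is passed directly between dependent instructions in the reservation stations, not through the floating-point registers. Some instructions in the reservation stations therefore need no destination register, which reduces register usage and effectively increases the number of registers.

If two instructions in the reservation stations have the same destination register, the destination of the earlier instruction is cancelled (this resolves WAW hazards).

Basic structure (DLX)

Three FP add reservation stations hold up to three floating-point add or subtract instructions.
Two FP multiply reservation stations hold up to two floating-point multiply or divide instructions.
Six load buffers hold up to six loads.
Three store buffers hold up to three stores.
(See Figure 4.5 on page 149.)

A floating-point instruction stays in a reservation station because it is waiting for its operands to be produced (RAW) or for its operation to complete.
A memory instruction stays in a load or store buffer because it is waiting for the memory access to complete or for the operand to be stored to be produced.

Instruction execution steps

(1) Issue (IS): Fetch a floating-point instruction. If a matching reservation station is free, issue the instruction, and if its operands are ready (in the registers) copy their values into the reservation station. A memory instruction issues if a buffer is free. Otherwise, wait. This step resolves structural hazards.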
(2) Execute (EX): If an operand is not ready, monitor the common data bus and wait for the result (when an operation completes, its result is broadcast to every reservation station waiting for it). Execution starts once both operands are ready. This step resolves RAW hazards.

(3) Write result (WB): When the result has been computed, it is put on the common data bus and broadcast to every reservation station waiting for it and to the destination register (if there is one).

Data structures

(1) Instruction status table: shows which of the three steps each in-flight instruction has reached.
(2) Register result status table: shows, for each register, which reservation station will write it.
(3) Reservation station, with six fields:
Busy: whether the station is in use
Op: the operation to perform on operands S1 and S2
Vj, Vk: the operand values
Qj, Qk: the reservation stations that will produce the operand values; zero means the value is already in Vj or Vk, or is not needed
(4) Load buffer, with two fields:
Busy: whether the buffer is in use
Address: the memory address
(5) Store buffer, with four fields:
Busy: whether the buffer is in use
Address: the memory address
Vj: the value to be stored
Qj: the reservation station that will produce the value; zero means the value is already in Vj

Tomasulo example

Instruction sequence:

    LD    F6,34+R2
    LD    F2,45+R3
    MULTD F0,F2,F4
    SUBD  F8,F6,F2
    DIVD  F10,F0,F6
    ADDD  F6,F8,F2

Latency: load 1, add 2, multiply 10, and divide 40 clock cycles.

The slides step through cycles 0 to 16 and 55 to 57. The instruction status at cycle 57 summarizes the whole run:

| Instruction | j | k | Issue | Execution complete | Write result |
|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 3 | 4 |
| LD F2 | 45+ | R3 | 2 | 4 | 5 |
| MULTD F0 | F2 | F4 | 3 | 15 | 16 |
| SUBD F8 | F6 | F2 | 4 | 7 | 8 |
| DIVD F10 | F0 | F6 | 5 | 56 | 57 |
| ADDD F6 | F8 | F2 | 6 | 10 | 11 |

A representative snapshot of the reservation stations, taken at cycle 5:

| Time | Name | Busy | Op | Vj | Vk | Qj | Qk |
|---|---|---|---|---|---|---|---|
| 2 | Add1 | Yes | SUBD | M(34+R2) | M(45+R3) | | |
| 0 | Add2 | No | | | | | |
| | Add3 | No | | | | | |
| 10 | Mult1 | Yes | MULTD | M(45+R3) | R(F4) | | |
| 0 | Mult2 | Yes | DIVD | | M(34+R2) | Mult1 | |

Register result status at cycle 5: F0 = Mult1, F2 = M(45+R3), F6 = M(34+R2), F8 = Add1, F10 = Mult2.

Key events, cycle by cycle:
- Cycle 1: the first LD issues to Load1 (address 34+R2).
- Cycle 2: the second LD issues to Load2 (address 45+R3). Note: unlike the CDC 6600, multiple loads can be outstanding.
- Cycle 3: MULTD issues to Mult1 with Vk = R(F4) and Qj = Load2. Note: register names are removed ("renamed") in the reservation stations. Load1 completes; what is waiting for Load1?
- Cycle 4: Load1 writes its result. SUBD issues to Add1 with Vj = M(34+R2) and Qk = Load2. Load2 completes; what is waiting for it?
- Cycle 5: Load2 writes its result. DIVD issues to Mult2. SUBD (2 cycles) and MULTD (10 cycles) now have both operands.
- Cycle 6: ADDD issues to Add2 with Qj = Add1 and Vk = M(45+R3); F6 is now tagged Add2. Compare with the scoreboard, where ADDD does not issue until cycle 13.
- Cycle 7: Add1 (SUBD) completes; what is waiting for it?
- Cycle 8: SUBD writes its result, and ADDD (2 cycles) has both operands.
- Cycle 10: Add2 (ADDD) completes; what is waiting for it?
- Cycle 11: ADDD writes its result to F6. Compare with the scoreboard, where ADDD cannot write until cycle 22.
- Cycle 12: Note: all the quick instructions have already completed.
- Cycle 15: Mult1 (MULTD) completes; what is waiting for it?
- Cycle 16: MULTD writes its result, and DIVD (40 cycles) has both operands. Note: just waiting for the divide.
- Cycle 56: Mult2 (DIVD) completes; what is waiting for it?
- Cycle 57: DIVD writes its result. Again: in-order issue, out-of-order execution and completion.

Comparison with the scoreboard (cycle 62)

| Instruction | j | k | Issue | Read operands | Execution complete | Write result |
|---|---|---|---|---|---|---|
| LD F6 | 34+ | R2 | 1 | 2 | 3 | 4 |
| LD F2 | 45+ | R3 | 5 | 6 | 7 | 8 |
| MULTD F0 | F2 | F4 | 6 | 9 | 19 | 20 |
| SUBD F8 | F6 | F2 | 7 | 9 | 11 | 12 |
| DIVD F10 | F0 | F6 | 8 | 21 | 61 | 62 |
| ADDD F6 | F8 | F2 | 13 | 14 | 16 | 22 |

Why does it take longer on the scoreboard (CDC 6600)?

Tomasulo loop example

    Loop: LD    F0,0(R1)
          MULTD F4,F0,F2
          SD    F4,0(R1)
          SUBI  R1,R1,#8
          BNEZ  R1,Loop

Assumptions:
- Multiply takes 4 clocks.
- The first load takes 8 clocks (cache miss?), and the second load takes 4 clocks (hit).
- To be clear, clocks are shown for SUBI and BNEZ; in reality, the integer instructions run ahead.
- R1 starts at 80.

Instruction status at cycle 12, the last complete snapshot in these notes:

| Instruction | Iteration | Issue | Execution complete | Write result |
|---|---|---|---|---|
| LD F0,0(R1) | 1 | 1 | 9 | 10 |
| MULTD F4,F0,F2 | 1 | 2 | | |
| SD F4,0(R1) | 1 | 3 | | |
| LD F0,0(R1) | 2 | 6 | 10 | 11 |
| MULTD F4,F0,F2 | 2 | 7 | | |
| SD F4,0(R1) | 2 | 8 | | |

Key events, cycle by cycle:
- Cycle 1: the first LD issues to Load1 (address 80); F0 is tagged Load1.
- Cycle 2: the first MULTD issues to Mult1 with Vk = R(F2) and Qj = Load1; F4 is tagged Mult1.
- Cycle 3: the first SD issues to Store1 (address 80, waiting on Mult1). Note: Mult1 has no register names in its reservation station.
- Cycle 4: R1 becomes 72.
- Cycle 6: the second LD issues to Load2 (address 72); F0 is now tagged Load2. Note: F0 never sees the Load1 result.
- Cycle 7: the second MULTD issues to Mult2 with Vk = R(F2) and Qj = Load2; F4 is now tagged Mult2. Note: Mult2 has no register names in its reservation station.
- Cycle 8: the second SD issues to Store2 (address 72, waiting on Mult2).
- Cycle 9: Load1 completes; what is waiting for it? R1 becomes 64.
- Cycle 10: Load1 writes its result, and Mult1 now holds M(80) and R(F2) with 4 cycles to go. Load2 completes; what is waiting for it?
- Cycle 11: Load2 writes its result.
- Cycle 12: Mult1 has 2 cycles remaining and Mult2 has 3. Load3 is busy with address 64, and F0 is tagged Load3.
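The bookkeeping behind the tables above (operand capture at issue, tags in Qj/Qk, and broadcast on the common data bus) can be sketched in a few lines. This is a simplified illustration, not a cycle-accurate model: it has no timing, no load or store buffers, and uses `None` where the notes use zero for "no producer". The `Machine` class, its method names, and the register values in the demo are all hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

OPS = {"ADDD": operator.add, "SUBD": operator.sub,
       "MULTD": operator.mul, "DIVD": operator.truediv}


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None    # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None      # producing station, or None if the value is in vj
    qk: Optional[str] = None

    def ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None


class Machine:
    def __init__(self, regs: Dict[str, float], station_names: Iterable[str]):
        self.regs = dict(regs)
        self.reg_status: Dict[str, str] = {}   # register -> station that will write it
        self.stations = {n: ReservationStation(n) for n in station_names}

    def _operand(self, reg: str) -> Tuple[Optional[float], Optional[str]]:
        tag = self.reg_status.get(reg)
        if tag is None:
            return self.regs[reg], None        # value available: capture it now
        return None, tag                       # value pending: remember the producer

    def issue(self, station: str, op: str, dest: str, src1: str, src2: str) -> None:
        rs = self.stations[station]
        if rs.busy:
            raise RuntimeError("structural hazard: station is busy")
        rs.busy, rs.op = True, op
        rs.vj, rs.qj = self._operand(src1)
        rs.vk, rs.qk = self._operand(src2)
        self.reg_status[dest] = station        # renaming: dest now means "this station"

    def broadcast(self, tag: str, value: float) -> None:
        for rs in self.stations.values():      # waiting stations pick up the value
            if rs.qj == tag:
                rs.vj, rs.qj = value, None
            if rs.qk == tag:
                rs.vk, rs.qk = value, None
        for reg, producer in list(self.reg_status.items()):
            if producer == tag:                # only the latest writer updates the register
                self.regs[reg] = value
                del self.reg_status[reg]
        self.stations[tag] = ReservationStation(tag)   # free the station

    def execute(self, station: str) -> float:
        rs = self.stations[station]
        if not rs.ready():
            raise RuntimeError("operands not ready")
        result = OPS[rs.op](rs.vj, rs.vk)
        self.broadcast(station, result)
        return result


if __name__ == "__main__":
    m = Machine({"F0": 20.0, "F2": 2.0, "F6": 10.0, "F8": 1.0},
                ["Add1", "Add2", "Mult1"])
    m.issue("Mult1", "DIVD", "F10", "F0", "F6")   # captures the old F6
    m.issue("Add1", "ADDD", "F6", "F8", "F2")     # WAR on F6 is harmless
    m.issue("Add2", "SUBD", "F4", "F6", "F2")     # RAW on F6: waits for Add1
    print(m.stations["Add2"])
    m.execute("Add1")                             # runs before the older DIVD
    m.execute("Add2")
    m.execute("Mult1")
    print(m.regs)
```

Because the divide copied F6 into its reservation station at issue, the later ADDD can overwrite F6 first without changing the divide's result, which is the WAR case the reservation stations are described as resolving. A second writer to the same destination simply replaces the register's tag, so an earlier producer's broadcast no longer updates that register (the WAW case).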