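Hunan University Course Lab Report

Course: Computer Organization and Architecture
Lab project: perflab
Major and class:
Name:
Student ID:
Instructor:
Completion date: May 22, 2015
Department of Computer Science and Engineering

Lab title: Program performance tuning

Objective: kernel.c contains two functions that need to be optimized: rotate and smooth. Basic implementations, naive_rotate and naive_smooth, are provided as the baseline for comparing the improved versions. The task is to understand rotate and smooth and then optimize them. After each new optimized version of rotate or smooth is registered, it can be tested with the driver, which reports its CPE and speedup. Each student must write at least three optimized versions of each function and analyze their performance using the driver's results.

Environment: VMware virtual machine, Ubuntu 12.04, Linux terminal

Procedure and analysis:

Function source code:

rotate:

    void naive_rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j;
        for (i = 0; i < dim; i++)
            for (j = 0; j < dim; j++)
                dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
    }

rotate turns a bitmap image 90 degrees counterclockwise by exchanging the row and column index of every pixel. RIDX(i,j,n) is defined as ((i)*(n)+(j)). The main weakness is poor locality. src is read row by row with stride 1, but dst is written with a stride of dim, so consecutive writes land in different cache lines. The loop body also runs dim*dim times and recomputes the full index arithmetic on every iteration. Blocking the loops improves spatial locality, and loop unrolling reduces loop overhead.

smooth:

    void naive_smooth(int dim, pixel *src, pixel *dst)
    {
        int i, j;
        for (i = 0; i < dim; i++)
            for (j = 0; j < dim; j++)
                dst[RIDX(i, j, dim)] = avg(dim, i, j, src);
    }

smooth blurs the image by replacing each pixel with the average of the 3×3 window centered on it. Corner and edge pixels use fewer neighbours. The weaknesses are the large number of loop iterations and a call to avg for every pixel, and avg itself calls several other functions. The number of avg calls should be reduced and the loops unrolled.

Version 1

CPE analysis:

rotate:

    void rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j, ii, jj;
        /* visit the image in 4x4 blocks */
        for (ii = 0; ii < dim; ii += 4)
            for (jj = 0; jj < dim; jj += 4)
                for (i = ii; i < ii + 4; i++)
                    for (j = jj; j < jj + 4; j++)
                        dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)];
    }

Two extra for loops split the iteration space into 4×4 blocks. When the cache cannot hold the data that the naive loop touches, blocking raises the cache hit rate and so improves spatial locality. The measured CPE confirms this. At dim = 64 the two versions have similar CPE. As dim grows, the CPE of this version rises only slightly, while the CPE of the original rises sharply because it is limited by the cache capacity.

smooth:

    void smooth(int dim, pixel *src, pixel *dst)
    {
        pixel_sum rowsum[530][530];
        int i, j, snum;
        /* horizontal sums of up to 3 neighbours in every row */
        for (i = 0; i < dim; i++) {
            rowsum[i][0].red = (src[RIDX(i, 0, dim)].red + src[RIDX(i, 1, dim)].red);
            rowsum[i][0].blue = (src[RIDX(i, 0, dim)].blue + src[RIDX(i, 1, dim)].blue);
            rowsum[i][0].green = (src[RIDX(i, 0, dim)].green + src[RIDX(i, 1, dim)].green);
            rowsum[i][0].num = 2;
            for (j = 1; j < dim-1; j++) {
                rowsum[i][j].red = (src[RIDX(i, j-1, dim)].red + src[RIDX(i, j, dim)].red + src[RIDX(i, j+1, dim)].red);
                rowsum[i][j].blue = (src[RIDX(i, j-1, dim)].blue + src[RIDX(i, j, dim)].blue + src[RIDX(i, j+1, dim)].blue);
                rowsum[i][j].green = (src[RIDX(i, j-1, dim)].green + src[RIDX(i, j, dim)].green + src[RIDX(i, j+1, dim)].green);
                rowsum[i][j].num = 3;
            }
            rowsum[i][dim-1].red = (src[RIDX(i, dim-2, dim)].red + src[RIDX(i, dim-1, dim)].red);
            rowsum[i][dim-1].blue = (src[RIDX(i, dim-2, dim)].blue + src[RIDX(i, dim-1, dim)].blue);
            rowsum[i][dim-1].green = (src[RIDX(i, dim-2, dim)].green + src[RIDX(i, dim-1, dim)].green);
            rowsum[i][dim-1].num = 2;
        }
        /* row 0: combine the row sums of rows 0 and 1 */
        for (j = 0; j < dim; j++) {
            snum = rowsum[0][j].num + rowsum[1][j].num;
            dst[RIDX(0, j, dim)].red = (unsigned short)((rowsum[0][j].red + rowsum[1][j].red) / snum);
            dst[RIDX(0, j, dim)].blue = (unsigned short)((rowsum[0][j].blue + rowsum[1][j].blue) / snum);
            dst[RIDX(0, j, dim)].green = (unsigned short)((rowsum[0][j].green + rowsum[1][j].green) / snum);
        }
        for (i = 1; i …

… 512, the image exceeds the declared array size (rowsum[530][530]) and the program fails.

Version 2

CPE analysis:

rotate:

    void rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j;
        int temp;
        int it, jt;
        int im, jm;
        /* 32x32 blocks */
        for (jt = 0; jt < dim; jt += 32) {
            jm = jt + 32;
            for (it = 0; it < dim; it += 32) {
                im = it + 32;
                for (j = jt; j < jm; j++) {
                    temp = dim - 1 - j;
                    for (i = it; i …

    …
        /* top-left corner: 4 pixels */
        dst->red = (P1->red + (P1+1)->red + P2->red + (P2+1)->red) >> 2;
        dst->green = (P1->green + (P1+1)->green + P2->green + (P2+1)->green) >> 2;
        dst->blue = (P1->blue + (P1+1)->blue + P2->blue + (P2+1)->blue) >> 2;
        dst++;
        /* top edge: 6 pixels */
        for (i = 1; i /* … */) {
            dst->red = (P1->red + (P1+1)->red + (P1+2)->red + P2->red + (P2+1)->red + (P2+2)->red) / 6;
            dst->green = (P1->green + (P1+1)->green + (P1+2)->green + P2->green + (P2+1)->green + (P2+2)->green) / 6;
            dst->blue = (P1->blue + (P1+1)->blue + (P1+2)->blue + P2->blue + (P2+1)->blue + (P2+2)->blue) / 6;
            dst++;
            P1++;
            P2++;
        }
        /* top-right corner: 4 pixels */
        dst->red = (P1->red + (P1+1)->red + P2->red + (P2+1)->red) >> 2;
        dst->green = (P1->green + (P1+1)->green + P2->green + (P2+1)->green) >> 2;
        dst->blue = (P1->blue + (P1+1)->blue + P2->blue + (P2+1)->blue) >> 2;
        dst++;
        P1 = src;
        P2 = P1 + dim0;
        P3 = P2 + dim0;
        /* rows 1 .. dim-2 */
        for (i = 1; i /* … */) {
            /* left edge: 6 pixels */
            dst->red = (P1->red + (P1+1)->red + P2->red + (P2+1)->red + P3->red + (P3+1)->red) / 6;
            dst->green = (P1->green + (P1+1)->green + P2->green + (P2+1)->green + P3->green + (P3+1)->green) / 6;
            dst->blue = (P1->blue + (P1+1)->blue + P2->blue + (P2+1)->blue + P3->blue + (P3+1)->blue) / 6;
            dst++;
            dst1 = dst + 1;
            /* interior, two pixels per iteration: 9 pixels each */
            for (j = 1; j /* … */) {
                dst->red = (P1->red + (P1+1)->red + (P1+2)->red + P2->red + (P2+1)->red + (P2+2)->red + P3->red + (P3+1)->red + (P3+2)->red) / 9;
                dst->green = (P1->green + (P1+1)->green + (P1+2)->green + P2->green + (P2+1)->green + (P2+2)->green + P3->green + (P3+1)->green + (P3+2)->green) / 9;
                dst->blue = (P1->blue + (P1+1)->blue + (P1+2)->blue + P2->blue + (P2+1)->blue + (P2+2)->blue + P3->blue + (P3+1)->blue + (P3+2)->blue) / 9;
                dst1->red = ((P1+3)->red + (P1+1)->red + (P1+2)->red + (P2+3)->red + (P2+1)->red + (P2+2)->red + (P3+3)->red + (P3+1)->red + (P3+2)->red) / 9;
                dst1->green = ((P1+3)->green + (P1+1)->green + (P1+2)->green + (P2+3)->green + (P2+1)->green + (P2+2)->green + (P3+3)->green + (P3+1)->green + (P3+2)->green) / 9;
                dst1->blue = ((P1+3)->blue + (P1+1)->blue + (P1+2)->blue + (P2+3)->blue + (P2+1)->blue + (P2+2)->blue + (P3+3)->blue + (P3+1)->blue + (P3+2)->blue) / 9;
                dst += 2;
                dst1 += 2;
                P1 += 2;
                P2 += 2;
                P3 += 2;
            }
            /* remaining interior pixel */
            for (; j /* … */) {
                dst->red = (P1->red + (P1+1)->red + (P1+2)->red + P2->red + (P2+1)->red + (P2+2)->red + P3->red + (P3+1)->red + (P3+2)->red) / 9;
                dst->green = (P1->green + (P1+1)->green + (P1+2)->green + P2->green + (P2+1)->green + (P2+2)->green + P3->green + (P3+1)->green + (P3+2)->green) / 9;
                dst->blue = (P1->blue + (P1+1)->blue + (P1+2)->blue + P2->blue + (P2+1)->blue + (P2+2)->blue + P3->blue + (P3+1)->blue + (P3+2)->blue) / 9;
                dst++;
                P1++;
                P2++;
                P3++;
            }
            /* right edge: 6 pixels */
            dst->red = (P1->red + (P1+1)->red + P2->red + (P2+1)->red + P3->red + (P3+1)->red) / 6;
            dst->green = (P1->green + (P1+1)->green + P2->green + (P2+1)->green + P3->green + (P3+1)->green) / 6;
            dst->blue = (P1->blue + (P1+1)->blue + P2->blue + (P2+1)->blue + P3->blue + (P3+1)->blue) / 6;
            dst++;
            P1 += 2;
            P2 += 2;
            P3 += 2;
        }
        /* bottom-left corner: 4 pixels */
        dst->red = (P1->red + (P1+1)->red + P2->red + (P2+1)->red) >> 2;
        dst->green = (P1->green + (P1+1)->green + P2->green + (P2+1)->green) >> 2;
        dst->blue = (P1->blue + (P1+1)->blue + P2->blue + (P2+1)->blue) >> 2;
        dst++;
        /* bottom edge: 6 pixels */
        for (i = 1; i /* … */) {
            dst->red = (P1->red + (P1+1)->red + (P1+2)->red + P2->red + (P2+1)->red + (P2+2)->red) / 6;
            dst->green = (P1->green + (P1+1)->green + (P1+2)->green + P2->green + (P2+1)->green + (P2+2)->green) / 6;
            dst->blue = (P1->blue + (P1+1)->blue + (P1+2)->blue + P2->blue + (P2+1)->blue + (P2+2)->blue) / 6;
            dst++;
            P1++;
            P2++;
        }
        /* bottom-right corner: 4 pixels */
        dst->red = (P1->red + (P1+1)->red + P2->red + (P2+1)->red) >> 2;
        dst->green = (P1->green + (P1+1)->green + P2->green + (P2+1)->green) >> 2;
        dst->blue = (P1->blue + (P1+1)->blue + P2->blue + (P2+1)->blue) >> 2;
    }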
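The version-2 smooth code also speeds up the program by not calling avg. It sorts the pixels into three kinds of regions:

- the interior, where each output pixel is the average of 9 pixels;
- the 4 corners, averaged over 4 pixels;
- the 4 edges, averaged over 6 pixels.

Processing starts at the top-left corner and runs along the top edge to the top-right corner. The remaining rows are then handled in order. For each row the code first computes the left-edge pixel. A for loop then covers the interior of the row, unrolled by two so that each iteration writes both dst and dst1. The right-edge pixel comes last. The bottom-left corner, the bottom edge and the bottom-right corner are processed at the end.

Version 3

CPE analysis:

rotate:

    void rotate(int dim, pixel *src, pixel *dst)
    {
        int i, j;
        int dst_base = (dim-1)*dim;
        dst += dst_base;                 /* start at the last row of dst */
        for (i = 0; i < dim; i += 32) {  /* strips of 32 source rows */
            for (j = 0; j < dim; j++) {
                /* copy column j of the strip (32 pixels, stride dim) into row dim-1-j of dst */
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src += dim; dst++;
                *dst = *src; src++;
                src -= (dim<<5) - dim;   /* back to the top of the strip, one column to the right */
                dst -= 31 + dim;         /* one row up, back to column i */
            }
            dst += dst_base + dim;       /* back to the last row of dst ... */
            dst += 32;                   /* ... at the next 32 columns */
            src += (dim …

    … >> 2;
        dst[0].blue = (src[0].blue + src[1].blue + src[dim].blue + src[dim+1].blue) >> 2;
        dst[0].green = (src[0].green + src[1].green + src[dim].green + src[dim+1].green) >> 2;
        /* top-right corner */
        dst[dim-1].red = (src[dim-1].red + src[dim-2].red + src[dim*2-1].red + src[dim*2-2].red) >> 2;
        dst[dim-1].blue = (src[dim-1].blue + src[dim-2].blue + src[dim*2-1].blue + src[dim*2-2].blue) >> 2;
        dst[dim-1].green = (src[dim-1].green + src[dim-2].green + src[dim*2-1].green + src[dim*2-2].green) >> 2;
        /* bottom-left corner */
        dst[dim*(dim-1)].red = (src[dim*(dim-1)].red + src[dim*(dim-1)+1].red + src[dim*(dim-2)].red + src[dim*(dim-2)+1].red) >> 2;
        dst[dim*(dim-1)].blue = (src[dim*(dim-1)].blue + src[dim*(dim-1)+1].blue + src[dim*(dim-2)].blue + src[dim*(dim-2)+1].blue) >> 2;
        dst[dim*(dim-1)].green = (src[dim*(dim-1)].green + src[dim*(dim-1)+1].green + src[dim*(dim-2)].green + src[dim*(dim-2)+1].green) >> 2;
        /* bottom-right corner */
        dst[dim*dim-1].red = (src[dim*dim-1].red + src[dim*dim-2].red + src[dim*(dim-1)-1].red + src[dim*(dim-1)-2].red) >> 2;
        dst[dim*dim-1].blue = (src[dim*dim-1].blue + src[dim*dim-2].blue + src[dim*(dim-1)-1].blue + src[dim*(dim-1)-2].blue) >> 2;
        dst[dim*dim-1].green = (src[dim*dim-1].green + src[dim*dim-2].green + src[dim*(dim-1)-1].green + src[dim*(dim-1)-2].green) >> 2;
        /* top edge */
        for (j = 1; j < dim-1; j++) {
            dst[j].red = (src[j].red + src[j-1].red + src[j+1].red + src[j+dim].red + src[j+1+dim].red + src[j-1+dim].red) / 6;
            dst[j].green = (src[j].green + src[j-1].green + src[j+1].green + src[j+dim].green + src[j+1+dim].green + src[j-1+dim].green) / 6;
            dst[j].blue = (src[j].blue + src[j-1].blue + src[j+1].blue + src[j+dim].blue + src[j+1+dim].blue + src[j-1+dim].blue) / 6;
        }
        /* bottom edge */
        for (j = dim*(dim-1)+1; j < dim*dim-1; j++) {
            dst[j].red = (src[j].red + src[j-1].red + src[j+1].red + src[j-dim].red + src[j+1-dim].red + src[j-1-dim].red) / 6;
            dst[j].green = (src[j].green + src[j-1].green + src[j+1].green + src[j-dim].green + src[j+1-dim].green + src[j-1-dim].green) / 6;
            dst[j].blue = (src[j].blue + src[j-1].blue + src[j+1].blue + src[j-dim].blue + src[j+1-dim].blue + src[j-1-dim].blue) / 6;
        }
        /* left edge */
        for (j = dim; j < dim*(dim-1); j += dim) {
            dst[j].red = (src[j].red + src[j-dim].red + src[j+1].red + src[j+dim].red + src[j+1+dim].red + src[j-dim+1].red) / 6;
            dst[j].green = (src[j].green + src[j-dim].green + src[j+1].green + src[j+dim].green + src[j+1+dim].green + src[j-dim+1].green) / 6;
            dst[j].blue = (src[j].blue + src[j-dim].blue + src[j+1].blue + src[j+dim].blue + src[j+1+dim].blue + src[j-dim+1].blue) / 6;
        }
        /* right edge */
        for (j = dim+dim-1; j < dim*dim-1; j += d…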