![Matrix Multiplication.ppt_第1頁(yè)](http://file1.renrendoc.com/fileroot2/2020-1/13/b01ba191-70e0-4097-913b-40ae64f15579/b01ba191-70e0-4097-913b-40ae64f155791.gif)
![Matrix Multiplication.ppt_第2頁(yè)](http://file1.renrendoc.com/fileroot2/2020-1/13/b01ba191-70e0-4097-913b-40ae64f15579/b01ba191-70e0-4097-913b-40ae64f155792.gif)
![Matrix Multiplication.ppt_第3頁(yè)](http://file1.renrendoc.com/fileroot2/2020-1/13/b01ba191-70e0-4097-913b-40ae64f15579/b01ba191-70e0-4097-913b-40ae64f155793.gif)
![Matrix Multiplication.ppt_第4頁(yè)](http://file1.renrendoc.com/fileroot2/2020-1/13/b01ba191-70e0-4097-913b-40ae64f15579/b01ba191-70e0-4097-913b-40ae64f155794.gif)
![Matrix Multiplication.ppt_第5頁(yè)](http://file1.renrendoc.com/fileroot2/2020-1/13/b01ba191-70e0-4097-913b-40ae64f15579/b01ba191-70e0-4097-913b-40ae64f155795.gif)
版權(quán)說(shuō)明:本文檔由用戶(hù)提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡(jiǎn)介
1、Parallel Programmingin C with MPI and OpenMP,Michael J. Quinn,Chapter 11,Matrix Multiplication,Outline,Sequential algorithms Iterative, row-oriented Recursive, block-oriented Parallel algorithms Rowwise block striped decomposition Cannons algorithm,Iterative, Row-oriented Algorithm,Series of inner p
2、roduct (dot product) operations,Performance as n Increases,Reason:Matrix B Gets Too Big for Cache,Computing a row of C requires accessing every element of B,Block Matrix Multiplication,Replace scalar multiplication with matrix multiplication Replace scalar addition with matrix addition,Recurse Until
3、 B Small Enough,Comparing Sequential Performance,First Parallel Algorithm,Partitioning Divide matrices into rows Each primitive task has corresponding rows of three matrices Communication Each task must eventually see every row of B Organize tasks into a ring,First Parallel Algorithm (cont.),Agglome
4、ration and mapping Fixed number of tasks, each requiring same amount of computation Regular communication among tasks Strategy: Assign each process a contiguous group of rows,Communication of B,A,A,B,C,A,A,B,C,A,A,B,C,A,A,B,C,Communication of B,A,A,B,C,A,A,B,C,A,A,B,C,A,A,B,C,Communication of B,A,A,
5、B,C,A,A,B,C,A,A,B,C,A,A,B,C,Communication of B,A,A,B,C,A,A,B,C,A,A,B,C,A,A,B,C,Complexity Analysis,Algorithm has p iterations During each iteration a process multiplies(n / p) (n / p) block of A by (n / p) n block of B: (n3 / p2) Total computation time: (n3 / p) Each process ends up passing(p-1)n2/p
6、 = (n2) elements of B,Isoefficiency Analysis,Sequential algorithm: (n3) Parallel overhead: (pn2)Isoefficiency relation: n3 Cpn2 n Cp This system does not have good scalability,Weakness of Algorithm 1,Blocks of B being manipulated have p times more columns than rows Each process must access every ele
7、ment of matrix B Ratio of computations per communication is poor: only 2n / p,Parallel Algorithm 2(Cannons Algorithm),Associate a primitive task with each matrix element Agglomerate tasks responsible for a square (or nearly square) block of C Computation-to-communication ratio rises to n / p,Element
8、s of A and B Needed to Compute a Processs Portion of C,Algorithm 1,Cannons Algorithm,Blocks Must Be Aligned,Before,After,Blocks Need to Be Aligned,A00,B00,A01,B01,A02,B02,A03,B03,A10,B10,A11,B11,A12,B12,A13,B13,A20,B20,A21,B21,A22,B22,A23,B23,A30,B30,A31,B31,A32,B32,A33,B33,Each triangle represents
9、a matrix block Only same-color triangles should be multiplied,Rearrange Blocks,A00,B00,A01,B01,A02,B02,A03,B03,A10,B10,A11,B11,A12,B12,A13,B13,A20,B20,A21,B21,A22,B22,A23,B23,A30,B30,A31,B31,A32,B32,A33,B33,Block Aij cycles left i positions Block Bij cycles up j positions,Consider Process P1,2,B02,A
10、10,A11,A12,B12,A13,B22,B32,Step 1,Consider Process P1,2,B12,A11,A12,A13,B22,A10,B32,B02,Step 2,Consider Process P1,2,B22,A12,A13,A10,B32,A11,B02,B12,Step 3,Consider Process P1,2,B32,A13,A10,A11,B02,A12,B12,B22,Step 4,Complexity Analysis,Algorithm has p iterations During each iteration process multip
11、lies two (n / p ) (n / p ) matrices: (n3 / p 3/2) Computational complexity: (n3 / p) During each iteration process sends and receives two blocks of size (n / p ) (n / p ) Communication complexity: (n2/ p),Isoefficiency Analysis,Sequential algorithm: (n3) Parallel overhead: (pn2) Isoefficiency relati
12、on: n3 C pn2 n C p This system is highly scalable,Summary,Considered two sequential algorithms Iterative, row-oriented algorithm Recursive, block-oriented algorithm Second has better cache hit rate as n increases Developed two parallel algorithms First based on rowwise block striped decomposition Se
13、cond based on checkerboard block decomposition Second algorithm is scalable, while first is not,Parallel Programmingin C with MPI and OpenMP,Michael J. Quinn,Chapter 12,Solving Linear Systems,Outline,Terminology Back substitution Gaussian elimination Jacobi method Conjugate gradient method,Terminolo
14、gy,System of linear equations Solve Ax = b for x Special matrices Symmetrically banded Upper triangular Lower triangular Diagonally dominant Symmetric,Symmetrically Banded,4,2,-1,0,0,0,3,-4,5,6,0,0,1,6,3,2,4,0,0,2,-2,0,9,2,0,0,7,3,8,7,0,0,0,4,0,2,Semibandwidth 2,Upper Triangular,4,2,-1,5,9,2,0,-4,5,
15、6,0,-4,0,0,3,2,4,6,0,0,0,0,9,2,0,0,0,0,8,7,0,0,0,0,0,2,Lower Triangular,4,0,0,0,0,0,0,0,0,0,0,0,5,4,3,0,0,0,2,6,2,3,0,0,8,-2,0,1,8,0,-3,5,7,9,5,2,Diagonally Dominant,19,0,2,2,0,6,0,-15,2,0,-3,0,5,4,22,-1,0,4,2,3,2,13,0,-5,5,-2,0,1,16,0,-3,5,5,3,5,-32,Symmetric,3,0,2,2,0,6,0,7,4,3,-3,5,5,4,0,-1,0,4,2
16、,3,-1,9,0,-5,0,-3,0,0,5,5,6,5,4,-5,5,-3,Back Substitution,Used to solve upper triangular systemTx = b for x Methodology: one element of x can be immediately computed Use this value to simplify system, revealing another element that can be immediately computed Repeat,Back Substitution,1x0,+1x1,1x2,+4
17、x3,8,=, 2x1,3x2,+1x3,5,=,2x2, 3x3,0,=,2x3,4,=,Back Substitution,1x0,+1x1,1x2,+4x3,8,=, 2x1,3x2,+1x3,5,=,2x2, 3x3,0,=,2x3,4,=,x3 = 2,Back Substitution,1x0,+1x1,1x2,0,=, 2x1,3x2,3,=,2x2,6,=,2x3,4,=,Back Substitution,1x0,+1x1,1x2,0,=, 2x1,3x2,3,=,2x2,6,=,2x3,4,=,x2 = 3,Back Substitution,1x0,+1x1,3,=, 2
18、x1,12,=,2x2,6,=,2x3,4,=,Back Substitution,1x0,+1x1,3,=, 2x1,12,=,2x2,6,=,2x3,4,=,x1 = 6,Back Substitution,1x0,9,=, 2x1,12,=,2x2,6,=,2x3,4,=,Back Substitution,1x0,9,=, 2x1,12,=,2x2,6,=,2x3,4,=,x0 = 9,Pseudocode,for i n 1 down to 1 do x i b i / a i, i for j 0 to i 1 do b j b j x i a j, i endfor endfor
19、,Time complexity: (n2),Data Dependence Diagram,We cannot execute the outer loop in parallel. We can execute the inner loop in parallel.,Row-oriented Algorithm,Associate primitive task with each row of A and corresponding elements of x and b During iteration i task associated with row j computes new
20、value of bj Task i must compute xi and broadcast its value Agglomerate using rowwise interleaved striped decomposition,Interleaved Decompositions,Rowwise interleaved striped decomposition,Columnwise interleaved striped decomposition,Complexity Analysis,Each process performs about n / (2p) iterations
21、 of loop j in all A total of n -1 iterations in all Computational complexity: (n2/p) One broadcast per iteration Communication complexity: (n log p),Column-oriented Algorithm,Associate one primitive task per column of A and associated element of x Last task starts with vector b During iteration i ta
22、sk i computes xi, updates b, and sends b to task i -1 In other words, no computational concurrency Agglomerate tasks in interleaved fashion,Complexity Analysis,Since b always updated by a single process, computational complexity same as sequential algorithm: (n2) Since elements of b passed from one process to another each iteration, communication complexity is (n2),Comparison,Message- passing time dominates,Computation time dominates,Gaussian Elimination,Used to solve Ax = b when A is dense Reduces Ax = b to upper triangular system Tx = c Back su
溫馨提示
- 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶(hù)所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶(hù)上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶(hù)上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶(hù)因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。
最新文檔
- 2025年度貨物運(yùn)輸合同貨物追蹤與查詢(xún)服務(wù)協(xié)議
- 2025年度合伙制合同協(xié)議書(shū):生物制藥產(chǎn)業(yè)合作框架
- 2025年度智能停車(chē)場(chǎng)管理系統(tǒng)建設(shè)合同范本
- 2025年法絲龍運(yùn)動(dòng)面料項(xiàng)目可行性研究報(bào)告
- 2025年度離婚后共同財(cái)產(chǎn)分割與債務(wù)清償協(xié)議
- 辦公室物資申請(qǐng)書(shū)
- 2025年度城市更新改造工程施工合作合同
- 2025年小吊柜項(xiàng)目投資可行性研究分析報(bào)告
- 2025年度房地產(chǎn)項(xiàng)目借款延期及還款計(jì)劃協(xié)議
- 2025年度并購(gòu)項(xiàng)目股權(quán)投資保密合同
- 2024年包頭市水務(wù)(集團(tuán))有限公司招聘筆試沖刺題(帶答案解析)
- 知識(shí)庫(kù)管理規(guī)范大全
- 2024年贛州民晟城市運(yùn)營(yíng)服務(wù)有限公司招聘筆試參考題庫(kù)附帶答案詳解
- 領(lǐng)導(dǎo)干部報(bào)告?zhèn)€人事項(xiàng)
- 9這點(diǎn)挫折算什么(課件)-五年級(jí)上冊(cè)生命與健康
- 價(jià)格監(jiān)督檢查知識(shí)培訓(xùn)課件
- 駐場(chǎng)保潔方案
- 中國(guó)心理衛(wèi)生協(xié)會(huì)家庭教育指導(dǎo)師參考試題庫(kù)及答案
- 智能廣告投放技術(shù)方案
- 知識(shí)產(chǎn)權(quán)保護(hù)執(zhí)法
- 高質(zhì)量社區(qū)建設(shè)的路徑與探索
評(píng)論
0/150
提交評(píng)論