版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領(lǐng)
文檔簡介
Parallel and Distributed Systems
Instructor: Zhang Weizhe (張偉哲)
Computer Network and Information Security Technique Research Center, School of Computer Science and Technology, Harbin Institute of Technology

Chapter 14: Replication and Fault Tolerance

Outline
- Fault-tolerant services
- Replication services
- Highly available services
- Summary

Fault-tolerant services

Fault Tolerance Basic Concepts
Being fault tolerant is strongly related to what are called dependable systems. Dependability implies the following:
- Availability
- Reliability
- Safety
- Maintainability

Failure Models
[Table: Different types of failures.]

Failure Masking by Redundancy
[Figure 8-2. Triple modular redundancy.]

Flat Groups versus Hierarchical Groups
[Figure: (a) Communication in a flat group. (b) Communication in a simple hierarchical group.]

Agreement in Faulty Systems
[Figure: The Byzantine agreement problem for three nonfaulty processes and one faulty process. (a) Each process sends its value to the others. (b) The vectors that each process assembles based on (a). (c) The vectors that each process receives in step 3.]
[Figure: The same problem, now with two correct processes and one faulty process.]
With m faulty processes, agreement is possible only if there are at least 2m+1 correct processes, that is, 3m+1 processes in total.

RPC Semantics in the Presence of Failures
Five different classes of failures can occur in RPC systems:
- The client is unable to locate the server.
- The request message from the client to the server is lost.
- The server crashes after receiving a request.
- The reply message from the server to the client is lost.
- The client crashes after sending a request.

Basic Reliable-Multicasting Schemes
A simple solution to reliable multicasting when all receivers are known and are assumed not to fail.
[Figure: (a) Message transmission. (b) Reporting feedback.]

Nonhierarchical Feedback Control
Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of the others.

Hierarchical Feedback Control
The essence of hierarchical reliable multicasting: each local coordinator forwards the message to its children and later handles their retransmission requests.

Replication services

Replication Basic Concepts
Replication is a key technique for enhancing services.
Replication of data: maintaining copies of data at multiple computers.
Goals:
- Enhanced performance
- Increased availability
- Fault tolerance
Some potential requirements:
- Replication transparency
- Consistency: if one copy is modified, how and when the others are updated determines "the price of replication".

Replication Basic Concepts (continued)
Simple math: with two independent servers, each with a 5% chance of failing, availability is
1 - P(all failed) = 1 - 0.05 x 0.05 = 1 - 0.25% = 99.75%
Difference between replication and caching? A cache does not necessarily hold ALL the objects of interest.

Replication System Model
A basic architectural model:
- Replica manager: one replica manager per replica; it receives requests from front ends and applies operations to its replica atomically.
- Front end: one front end per client; it receives client requests and communicates with the replica managers by message passing.

An Operation Executed on a Replicated Object
- Request: the front end issues the request to one or more replica managers.
- Coordination: the replica managers coordinate in preparation for executing the request consistently (different orderings are possible).
- Execution: the replica managers execute the request (perhaps tentatively).
- Agreement: the replica managers reach consensus on the effect of the request.
- Response: one or more replica managers respond to the front end.

Passive (Primary-Backup) Replication
The architecture: one primary replica manager and one or more secondary (backup) replica managers. When the primary replica manager fails, one of the backups is promoted to act as the primary.

The Sequence of Events When a Client Issues a Request
- Request: the front end issues the request, containing a unique identifier, to the primary replica manager.
- Coordination: the primary takes each request atomically, in the order in which it receives it.
- Execution: the primary executes the request and stores the response.
- Agreement: if the request is an update, the primary sends the updated state, the response and the unique identifier to all the backups, and the backups send an acknowledgement.
- Response: the primary responds to the front end, which hands the response back to the client.

Active Replication
The architecture: the front end multicasts each request to the replica managers.

Active Replication Scheme
- Request: the front end attaches a unique identifier to the request and multicasts it to the group of replica managers, using a totally ordered, reliable multicast primitive.
- Coordination: the group communication system delivers the request to every correct replica manager in the same order.
- Execution: every replica manager executes the request.
- Agreement: no separate agreement step is needed.
- Response: each replica manager sends its response to the front end.

Active Replication Performance
- Achieves sequential consistency.
- Reliable multicast: all correct replica managers process the same set of requests.
- Total order: all correct replica managers process requests in the same order.
- FIFO order: maintained by each front end.
- No linearizability: the total order is not necessarily the same as the real-time order.

Highly available services

High Availability vs. Fault Tolerance
- Fault tolerance: "eager" consistency; all replicas reach agreement before control passes back to the client.
- High availability: "lazy" consistency; consistency is deferred until the next access, or agreement is reached after control passes back to the client. Examples: Gossip, Bayou, Coda.

The Gossip Architecture
- The front end connects to any replica manager to issue queries and updates.
- Replica managers exchange "gossip" messages periodically to maintain consistency.
- Two guarantees:
  - Each client obtains a consistent service over time.
  - Relaxed consistency between replicas: all replica managers eventually receive all updates, and they apply updates with ordering guarantees.

Queries and Updates in a Gossip Service
- Request: the front end sends the request to a replica manager. For a query, the client may be blocked; for an update, it is not blocked.
- Update response: the replica manager replies immediately.
- Coordination: the replica manager suspends the request until it can be applied; meanwhile it may receive gossip messages sent from other replica managers.
- Execution: the replica manager executes the request.
- Query response: the replica manager replies at this point.
- Agreement: replica managers exchange gossip messages containing the most recent updates applied at the replica. The exchange happens occasionally, or a replica manager that finds it has missed an update asks a particular replica manager to send it.

Gossip Messages
- A replica manager estimates which messages another replica manager has missed by using its timestamp table.
- Gossip messages are exchanged periodically, or when some other replica manager asks for them.
- The format of a gossip message m:
  - m.log: one or more updates from the source replica manager's log
  - m.ts: the replica timestamp of the source replica manager

Update Propagation
- How often should gossip messages be exchanged? Every few minutes, hours or days, depending on the requirements of the application.
- How should exchange partners be chosen?
  - Random
  - Deterministic: use a simple function of the replica manager's state to choose the partner
  - Topological: mesh, circle, tree

The Coda File System
- Limits of AFS: only read-only replicas.
- Objective of Coda: constant data availability.
- Coda extends AFS with read-write replicas:
  - Optimistic strategy to resolve conflicts
  - Disconnected operation

The Coda Architecture
- Venus/Vice: Vice is the replica manager; Venus is a hybrid of front end and replica manager.
- Volume storage group (VSG): the set of servers holding replicas of a file volume.
- Available volume storage group (AVSG): the subset of the VSG that is currently reachable; Venus keeps track of the AVSG of each file.
- Accessing a file: the file is serviced by any server in its AVSG.

Summary
- Replication for distributed systems: high performance, high availability, fault tolerance
- Replication for fault tolerance: primary-backup replication ...