10 Replication and Fault Tolerance in Distributed Systems.ppt

Parallel and Distributed Systems
Instructor: Zhang Weizhe (張偉哲)
Computer Network and Information Security Technique Research Center, School of Computer Science and Technology, Harbin Institute of Technology
Chapter 14: Replication and Fault Tolerance

Slide 3: Outline
  • Fault-tolerant services
  • Replication services
  • Highly available services
  • Summary

Slide 4: Fault Tolerance Basic Concepts
  • Being fault tolerant is strongly related to what are called dependable systems.
  • Dependability implies the following:
      • Availability
      • Reliability
      • Safety
      • Maintainability

Slide 5: Failure Models
  Different types of failures.

Slide 6: Failure Masking by Redundancy
  Figure 8-2. Triple modular redundancy.

Slide 7: Flat Groups versus Hierarchical Groups
  (a) Communication in a flat group. (b) Communication in a simple hierarchical group.

Slide 8: Agreement in Faulty Systems (1)
  The Byzantine agreement problem for three nonfaulty processes and one faulty process. (a) Each process sends its value to the others.

Slide 9: Agreement in Faulty Systems (2)
  The Byzantine agreement problem for three nonfaulty processes and one faulty process. (b) The vectors that each process assembles based on (a). (c) The vectors that each process receives in step 3.

Slide 10: Agreement in Faulty Systems (3)
  Now with two correct processes and one faulty process. Agreement can tolerate m faulty processes only if there are at least 2m+1 correct processes.

Slide 11: RPC Semantics in the Presence of Failures
  Five different classes of failures can occur in RPC systems:
  • The client is unable to locate the server.
  • The request message from the client to the server is lost.
  • The server crashes after receiving a request.
  • The reply message from the server to the client is lost.
  • The client crashes after sending a request.

Slide 12: Basic Reliable-Multicasting Schemes
  A simple solution to reliable multicasting when all receivers are known and are assumed not to fail. (a) Message transmission. (b) Reporting feedback.

Slide 13: Nonhierarchical Feedback Control
  Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of the others.

Slide 14: Hierarchical Feedback Control
  The essence of hierarchical reliable multicasting. Each local coordinator forwards the message to its children and later handles retransmission requests.

Slide 15: Outline
  • Fault-tolerant services
  • Replication services
  • Highly available services
  • Summary

Slide 16: Replication Basic Concepts
  • Replication is a key technology for enhancing services.
  • Replication of data: maintenance of copies of data at multiple computers.
  • Goals:
      • Enhanced performance
      • Increased availability
      • Fault tolerance
  • Some potential requirements:
      • Replication transparency
      • Consistency: if a copy is modified, how and when the others are updated determines "the price of replication".

Slide 17: Replication Basic Concepts
  • Simple math: with two independent servers, each with a 5% chance of failing, availability is 1 - P(all failed) = 1 - 0.05^2 = 1 - 0.25% = 99.75%.
  • Difference between replication and caches? Caches do not necessarily include all objects of interest.

Slide 18: Replication system model
  A basic architectural model:
  • Replica manager
      • One replica manager per replica
      • Receives the front end's requests and applies operations to its replicas atomically
  • Front end
      • One front end per client
      • Receives the client's requests and communicates with the replica managers by message passing

Slide 19: An operation executed on a replicated object
  • Request: the front end issues the request to one or more replica managers.
  • Coordination: the replica managers coordinate in preparation for executing the request consistently. Different orderings are possible.
  • Execution: the replica managers execute the request (perhaps tentatively).
  • Agreement: the replica managers reach consensus on the effect of the request.
  • Response: one or more replica managers respond to the front end.

Slide 20: Passive (primary-backup) replication
  The architecture:
  • One primary replica manager and one or more secondary replica managers (backups).
  • When the primary replica manager fails, one of the backups is promoted to act as the primary.

Slide 21: The sequence of events when a client issues a request
  • Request: the front end issues the request, containing a unique identifier, to the primary replica manager.
  • Coordination: the primary takes each request atomically, in the order in which it receives it.
  • Execution: the primary executes the request and stores the response.

Slide 22: The sequence of events when a client issues a request (2)
  • Agreement: if the request is an update, the primary sends the updated state, the response, and the unique identifier to all the backups. The backups send an acknowledgement.
  • Response: the primary responds to the front end, which hands the response back to the client.

Slide 23: Active replication
  The architecture:
  • The front end multicasts the request to the replica managers.

Slide 24: Active replication scheme
  • Request: the front end attaches a unique identifier to the request and multicasts it to the group of replica managers, using a totally ordered, reliable multicast primitive.
  • Coordination: the group communication system delivers the request to every correct replica manager in the same order.
  • Execution: every replica manager executes the request.
  • Agreement: none.
  • Response: each replica manager sends its response to the front end.

Slide 25: Active replication performance
  • Achieves sequential consistency.
      • Reliable multicast: all correct replica managers process the same set of requests.
      • Total order: all correct replica managers process requests in the same order.
      • FIFO order: maintained by each front end.
  • No linearizability: the total order is not the same as the real-time order.

Slide 26: Outline
  • Fault-tolerant services
  • Replication services
  • Highly available services
  • Summary

Slide 27: High availability vs. fault tolerance
  • Fault tolerance
      • "Eager" consistency: all replicas reach agreement before passing control to the client.
  • High availability
      • "Lazy" consistency: replicas need not reach consistency until the next access.
      • Agreement is reached after passing control to the client.
      • Examples: Gossip, Bayou, Coda

Slide 28: The gossip architecture
  • The architecture
      • The front end connects to any replica manager and issues queries and updates.
      • Replica managers exchange "gossip" messages periodically to maintain consistency.
  • Two guarantees
      • Each client obtains a consistent service over time.
      • Relaxed consistency between replicas: all replica managers eventually receive all updates, and they apply updates with ordering guarantees.

Slide 29: Queries and updates in a gossip service
  • Request: the front end sends the request to a replica manager.
      • Query: the client may be blocked.
      • Update: the client is not blocked.
  • Update response: the replica manager replies immediately.
  • Coordination: the replica manager suspends the request until it can be applied. In the meantime it may receive gossip messages sent by other replica managers.

Slide 30: Queries and updates in a gossip service (continued)
  • Execution: the replica manager executes the request.
  • Query response: the replica manager replies at this point.
  • Agreement: replica managers exchange gossip messages containing the most recent updates applied at the replica.
      • Exchanges happen occasionally.
      • A replica manager that finds it has missed an update asks the particular replica manager to send it.

Slide 31: Gossip messages
  • Exchanging gossip messages
      • A replica manager estimates which messages another replica manager has missed from its timestamp table.
      • Gossip messages are exchanged periodically, or when some other replica manager asks.
  • The format of a gossip message m
      • m.log: one or more updates from the source replica manager's log
      • m.ts: the replica timestamp of the source replica manager

Slide 32: Update propagation
  • How often should gossip messages be exchanged?
      • Every few minutes, hours, or days
      • Depends on the requirements of the application
  • How should exchange partners be chosen?
      • Random
      • Deterministic: use a simple function of the replica manager's state to choose the partner
      • Topological: mesh, circle, tree

Slide 33: The Coda file system
  • Limits of AFS: read-only replicas
  • The objective of Coda: constant data availability
  • Coda extends AFS with read-write replicas
      • Optimistic strategy to resolve conflicts
      • Disconnected operation

Slide 34: The Coda architecture
  • Venus/Vice
      • Vice: replica manager
      • Venus: hybrid of front end and replica manager
  • Volume storage group (VSG): the set of servers holding replicas of a file volume
  • Available volume storage group (AVSG): Vice knows the AVSG of each file
  • Accessing a file: the file is serviced by any server in the AVSG

Slide 35: Outline
  • Replication services
  • Fault-tolerant services
  • Highly available services
  • Summary

Slide 36: Summary
  • Replication for distributed systems: high performance, high availability, fault tolerance
  • Replication for fault tolerance: Primary-backup rep
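
The bound on the "Agreement in Faulty Systems (3)" slide (at least 2m+1 correct processes for m faulty ones) can be checked against the two configurations the slides show. The following is only a small sketch of that arithmetic; the function name `agreement_possible` is hypothetical and does not come from the slides, and it does not simulate the message exchange itself.

```python
def agreement_possible(total: int, faulty: int) -> bool:
    """Check the slide's bound: m faulty processes need at least 2m + 1 correct ones.

    `total` is the number of processes, `faulty` how many of them are faulty.
    This only evaluates the inequality; it does not run the agreement protocol.
    """
    if faulty < 0 or total < faulty:
        raise ValueError("need 0 <= faulty <= total")
    correct = total - faulty
    return correct >= 2 * faulty + 1


# Three nonfaulty processes and one faulty process (slides 8 and 9).
print(agreement_possible(4, 1))  # True
# Two correct processes and one faulty process (slide 10).
print(agreement_possible(3, 1))  # False
```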
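
The availability arithmetic on the second "Replication Basic Concepts" slide generalizes to any number of replicas, as long as failures are assumed to be independent. A minimal sketch of that calculation follows; the helper name `availability` is an illustrative choice, not something defined in the slides.

```python
def availability(p_fail: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent servers is up.

    Assumes each server fails independently with probability `p_fail`,
    so the service is unavailable only when all replicas have failed.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - p_fail ** replicas


# The slide's case: two servers, each with a 5% chance of failing.
print(f"{availability(0.05, 2):.2%}")  # 99.75%
```

Real failures are often correlated (shared power, network, or software bugs), so the result is best read as an upper bound under the independence assumption.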

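
The passive (primary-backup) sequence of events, request with a unique identifier, atomic handling at the primary, state transfer to the backups, and promotion on failure, can be sketched in a single process as follows. This is a simplified illustration under stated assumptions: the class names `ReplicaManager` and `Primary`, the `promote` helper, and the key-value state are all hypothetical, and network messages are replaced by direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaManager:
    state: dict = field(default_factory=dict)
    responses: dict = field(default_factory=dict)  # request id -> stored response

    def install(self, request_id, new_state, response):
        """Backup side of the agreement step: copy the state, then acknowledge."""
        self.state = dict(new_state)
        self.responses[request_id] = response
        return True  # stands in for the acknowledgement message


@dataclass
class Primary(ReplicaManager):
    backups: list = field(default_factory=list)

    def handle(self, request_id, key, value=None):
        """Handle one request; value=None means a read (a simplification)."""
        # Coordination: a repeated identifier gets the stored response back
        # instead of being executed a second time.
        if request_id in self.responses:
            return self.responses[request_id]
        # Execution: perform the operation and remember the response.
        if value is None:
            response = self.state.get(key)
        else:
            self.state[key] = value
            response = "ok"
            # Agreement: send updated state, response, and id to all backups.
            acks = [b.install(request_id, self.state, response) for b in self.backups]
            if not all(acks):
                raise RuntimeError("a backup failed to acknowledge")
        self.responses[request_id] = response
        return response  # Response: returned to the front end


def promote(backup, remaining):
    """Turn a backup into the new primary after the old primary fails."""
    return Primary(state=backup.state, responses=backup.responses, backups=remaining)


b1, b2 = ReplicaManager(), ReplicaManager()
primary = Primary(backups=[b1, b2])
primary.handle("req-1", "x", 42)
primary.handle("req-1", "x", 99)  # retransmission: not re-executed, x stays 42
new_primary = promote(b1, [b2])   # the old primary is assumed to have crashed
print(new_primary.handle("req-2", "x"))  # 42
```

Because the backups also store the response and identifier, a front end that retries an update against the new primary gets the original response rather than a second execution.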
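
The gossip slides describe requests being suspended until they can be applied, and gossip messages carrying `m.log` and `m.ts`. The sketch below shows one way this could work, assuming the timestamps are vector timestamps with one entry per replica manager (the slides do not say so explicitly). The class `GossipRM`, its method names, and the string update ids are hypothetical, and log truncation and the timestamp table are left out.

```python
def leq(a, b):
    """True if vector timestamp a is component-wise <= b."""
    return all(x <= y for x, y in zip(a, b))


def merge(a, b):
    """Component-wise maximum of two vector timestamps."""
    return tuple(max(x, y) for x, y in zip(a, b))


class GossipRM:
    def __init__(self, index, n):
        self.index = index            # position of this manager in the vectors
        self.replica_ts = (0,) * n    # updates accepted into the log
        self.value_ts = (0,) * n      # updates applied to the value
        self.value = {}
        self.log = {}                 # update id -> (prev, ts, key, val)
        self.executed = set()

    def update(self, uid, prev, key, val):
        """Accept an update from a front end and reply immediately with its timestamp."""
        if uid in self.log:
            return self.log[uid][1]
        i = self.index
        count = self.replica_ts[i] + 1
        self.replica_ts = self.replica_ts[:i] + (count,) + self.replica_ts[i + 1:]
        ts = prev[:i] + (count,) + prev[i + 1:]
        self.log[uid] = (prev, ts, key, val)
        self._apply_stable()
        return ts

    def query(self, prev, key):
        """Return (value, timestamp), or None if the query would have to wait."""
        if not leq(prev, self.value_ts):
            return None
        return self.value.get(key), self.value_ts

    def gossip_message(self):
        """Build (m.log, m.ts) for another replica manager."""
        return dict(self.log), self.replica_ts

    def receive_gossip(self, m_log, m_ts):
        for uid, record in m_log.items():
            self.log.setdefault(uid, record)
        self.replica_ts = merge(self.replica_ts, m_ts)
        self._apply_stable()

    def _apply_stable(self):
        """Apply every logged update whose dependencies (prev) are satisfied."""
        progress = True
        while progress:
            progress = False
            for uid, (prev, ts, key, val) in self.log.items():
                if uid not in self.executed and leq(prev, self.value_ts):
                    self.value[key] = val
                    self.value_ts = merge(self.value_ts, ts)
                    self.executed.add(uid)
                    progress = True


rm0, rm1 = GossipRM(0, 2), GossipRM(1, 2)
ts = rm0.update("u1", (0, 0), "x", 1)   # front end updates via rm0
before = rm1.query(ts, "x")             # rm1 has not seen the update yet
rm1.receive_gossip(*rm0.gossip_message())
after = rm1.query(ts, "x")
print(before, after)  # None (1, (1, 0))
```

The query at `rm1` returns `None` before the gossip exchange because the front end's timestamp is ahead of `rm1`'s value timestamp; a real replica manager would hold the query rather than reject it.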