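Hadoop Distributed File System
Baidu Systems Department

What is Hadoop
- Open source, written in Java
- A subproject of Lucene (the open source search engine) under the Apache open source organization
- map-reduce engine + HDFS (+ HBase)
- Hadoop should not be regarded as merely a distributed file system; it is in fact a complete infrastructure for distributed computing and storage.

What is HDFS
- HDFS (Hadoop Distributed File System) is designed as the underlying framework for running distributed applications on large clusters of commodity hardware, not as a distributed file system used purely for storage
- Suited to applications with large data sets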
- High reliability and high availability
- Supports the map-reduce programming model

Other GFS-like systems
- KFS (Kosmos Filesystem): an open source project from Kosmix, a vertical search engine startup; written in C++
- Only a file system, with no MapReduce layer
- Backing store for other open source projects:
  - Hadoop (provides a Map/Reduce implementation)
  - Hypertable (provides a Big-Table interface, Zvents Inc)

Disadvantage
- GFS supports inefficient re-write and efficient concurrent append, whereas HDFS currently supports neither rewrite nor append.
- HDFS only allows a file to be created once. The data must be written at creation time, and once the file has been created it can no longer be modified; HDFS strictly follows "one-writer-write-once & read-many".
- However, many applications now need append, for example continuously appending log records to a file in HDFS.

Our plan
- Implement single-client append and truncate: HDFS allows a file to be opened repeatedly for modification (append and truncate). Each time, only one client may modify the file, while multiple clients may read it concurrently during the modification.

Architecture
- Master/slave architecture: a single namenode and multiple datanodes
- Namenode
  - executes file system namespace operations like opening, closing, and renaming files and directories
  - determines the mapping of blocks to Datanodes
- Datanodes
  - Datanodes are responsible for serving read and write requests from the file system's clients.
  - Datanodes also perform block creation, deletion, and replication upon instruction from the Namenode.

Namenode
- Serves as both directory namespace manager and "inode table"
- Filename -> blocksequence (namespace), stored on disk and very precious
- Block -> machinelist ("inodes"), rebuilt every time the NameNode comes up

Namenode
Initiation:
- new FSNamesystem:
  - Load FS Image
  - Check and trigger safe mode if needed
  - Set the total number of blocks in the system
  - Record all blocks that are getting replicated
  - Start monitors
- Start http server
- Start RPC server
- Start Trash Emptier thread

Monitors
- SafeModeMonitor: periodically checks whether it is time to leave safe mode.
- PendingReplicationMonitor: a periodic thread that scans for blocks that never finished their replication request.
- HeartbeatMonitor: periodically checks if there are any expired heartbeats.
- LeaseMonitor: periodically checks for leases that have expired, and disposes of them.
- ReplicationMonitor: periodically looks at a few datanodes and computes any replication work that can be scheduled on them.
- DecommissionedMonitor: periodically checks if any of the nodes being decommissioned has finished moving all its datablocks to another replica.

Data Replication
- Stores each file as a sequence of blocks
- Blocks of a file are replicated for fault tolerance
- The replication factor can be specified at file creation time and can be changed later
- Files in HDFS are write-once and have strictly one writer at any time
- The Namenode makes all decisions regarding replication of blocks
- Namenode receives Heartbeat and Blockreport from datanodes
  - Heartbeat: I'm alive! (every 3 seconds)
  - Blockreport: all blocks on the datanode (every 1 hour)

HeartbeatMonitor
- Datanodes send heartbeats to the namenode (over TCP)
- If no heartbeat is received within one interval, the datanode is considered dead
- Only one datanode may be marked dead per check
- The number of blocks that need replication is updated
- The heartbeat response carries commands: the namenode checks whether there is block replication work or block deletion work for the datanode to do

ReplicationMonitor
- Computes the blocks that need replication; if there is no replication work, computes the blocks that need deletion
- Runs every 3 seconds by default
- Processes only 32% of the datanodes on each run
- If a datanode already carries a heavy replication load, it is skipped and given no new work (by default a datanode can handle only 2 replications at the same time)

SafeMode
- A special state of the namenode in which it accepts no operations on the namespace and makes no adjustment to replica counts.
- The namenode enters safe mode automatically at startup, receives heartbeats and block reports from the datanodes, and checks the list of blocks.
- A block is considered safe once its replica count reaches the configured minimum replication (dfs.replication.min).
- Once the namenode detects that the configured fraction of safe blocks (dfs.safemode.threshold.pct) has been reached, it stays in safe mode for a further period (configured through dfs.safemode.extension) so that the remaining datanodes can check in, and then leaves safe mode automatically.
- Safe mode can be entered or left manually through the setSafeMode command in DFSAdmin.
- Note: if the threshold is set to 0 or the namespace is empty, the namenode does not enter safe mode automatically at startup; if the threshold is greater than 1, safe mode can only be left manually.

SafeModeMonitor
- Checks whether the Namenode can leave safe mode
- Runs every 1 second by default
- If the Namenode can leave, it exits safe mode and this monitor stops

Lease
- Difference from a lock: a lease has a time limit
- When creating a file, a Client must first request a lease from the namenode. The purpose is to prevent a failed Client from holding the node server's resources indefinitely.
- If the namenode receives no lease renewal call from a Client for a certain period, it assumes the Client is "dead" and must release the resources the Client holds on that node.
- The namenode implements this mechanism with a class named leases. Each lease records the resource it covers (a file), the lease holder (the Client), and the time of the last lease renewal.
- A client shows the namenode that it is alive by calling renewLease periodically. If the namenode receives no such call from a client within a certain time, it considers that client dead.
- If a lease times out, the lease instance uses a thread to clean up its resources; that thread terminates when the lease is closed.

LeaseMonitor
- Checks whether there are any leases; leases are sorted by creation time
- Runs every 2 seconds by default
- Processes only the first lease on each run
- If the lease has timed out (1 hour), it is deleted

Filesystem Management
- Tracks several important tables:
  - valid fsname -> blocklist (kept on disk, logged)
  - Set of all valid blocks
  - block -> machinelist (kept in memory, rebuilt dynamically from reports)
  - machine -> blocklist
  - LRU cache of updated-heartbeat machines

    abstract class INode implements Comparable {
        protected byte[] name;
        protected INodeDirectory parent;
        protected long modificationTime;
    }

    public class INode {
        enum FileType { DIRECTORY, FILE }
        public static final FileType[] FILE_TYPES = { FileType.DIRECTORY, FileType.FILE };
        public static final INode DIRECTORY_INODE = new INode(FileType.DIRECTORY, null);
        private FileType fileType;
        private Block[] blocks;
    }

    class INodeDirectory extends INode {
        protected static final int DEFAULT_FILES_PER_DIRECTORY = 5;
        final static String ROOT_NAME = "";
        private List children;
    }

    class INodeFile extends INode {
        private BlockInfo[] blocks = null;
        protected short blockReplication;
        protected long preferredBlockSize;
    }

    class LocatedBlock implements Writable {
        private Block b;
        private long offset;  // offset of the first byte of the block in the file
        private DatanodeInfo[] locs;
    }

    public class DatanodeDescriptor extends DatanodeInfo {
        private volatile BlockInfo blockList = null;
        protected boolean isAlive = false;
        List replicateBlocks;
        List replicateTargetSets;
        List invalidateBlocks;
    }

    static class DatanodeImage implements Comparable {
        DatanodeDescriptor node;
    }

    class BlocksMap {
        static class BlockInfo extends Block {
            private INodeFile inode;
            private Object[] triplets;
        }
        private static class NodeIterator implements Iterator {
            private BlockInfo blockInfo;
            private int nextIdx = 0;
        }
    }

    ArrayList heartbeats = new ArrayList();
    private Map leases = new TreeMap();
    private SortedSet sortedLeases = new TreeSet();

Persistence of Filesystem Metadata
- EditLog
  - A transaction log: persistently records every change that occurs to file system metadata: OP_ADD, OP_RENAME, OP_DELETE, OP_MKDIR, OP_SET_REPLICATION, OP_DATANODE_ADD, OP_DATANODE_REMOVE (only part of the datanode information is persisted)
- FsImage
  - Stores the entire file system namespace, including the mapping of blocks to files and file system properties
- Checkpoint
  - Namenode startup
  - Periodic checkpointing (secondary namenode, HTTP)

checkpoint

    doCheckpoint() {
        doSetup();              // Do the required initialization of the merge work node.
        rollEditLog();          // Start logging transactions in a new edit file.
        getFSImage();           // Fetch fsimage.
        getFSEdits();           // Fetch edits.
        doMerge();              // Do the merge.
        putFSImage(token);      // Upload the new image into the NameNode.
        namenode.rollFsImage();
    }

    private void doMerge() throws IOException {
        fsImage.loadFSImage(srcImage);
        fsImage.getEditLog().loadFSEdits(editFile);
        fsImage.saveFSImage(destImage);
    }

    loadFSEdits(File edits)
        case OP_ADD:             unprotectedAddFile
        case OP_SET_REPLICATION: unprotectedSetReplication
        case OP_RENAME:          unprotectedRenameTo
        case OP_DELETE:          unprotectedDelete
        case OP_MKDIR:           unprotectedMkdir
        case OP_DATANODE_ADD
        case OP_DATANODE_REMOVE

Namenode
close:
- close namesystem
- stop PendingReplication daemon
- stop http server
- Interrupt Heartbeat daemon
- Interrupt Replication daemon
- Inter
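
The HeartbeatMonitor behaviour described earlier (a datanode is considered dead once its heartbeat has expired, and only one datanode is marked dead per check) can be illustrated with a minimal sketch. This is not the Hadoop code: the class name HeartbeatSketch, the method names, and the expiry interval passed to the constructor are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch of heartbeat expiry; names and interval are hypothetical. */
final class HeartbeatSketch {
    // Assumed expiry interval in milliseconds (the slides only give the 3-second send period).
    private final long expiryMillis;
    // Datanode id -> time of its most recent heartbeat, in milliseconds.
    private final Map<String, Long> lastHeartbeat = new LinkedHashMap<>();

    HeartbeatSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    /** Records a heartbeat from the given datanode. */
    void heartbeat(String datanode, long nowMillis) {
        lastHeartbeat.put(datanode, nowMillis);
    }

    /** One monitor pass: removes and returns at most one expired datanode, or null if none. */
    String checkOnce(long nowMillis) {
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > expiryMillis) {
                String dead = e.getKey();
                lastHeartbeat.remove(dead); // safe: we return before iterating further
                return dead;
            }
        }
        return null;
    }
}
```

Calling checkOnce repeatedly drains expired datanodes one per pass, which mirrors the "one datanode marked dead at a time" rule from the slides; a real monitor would also schedule re-replication of the dead node's blocks.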
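
The SafeMode exit rule described earlier (count a block as safe when it reaches dfs.replication.min, wait until the safe fraction reaches dfs.safemode.threshold.pct, then stay in safe mode for dfs.safemode.extension) can be sketched as follows. The class SafeModeSketch and its method names are hypothetical, and the sketch assumes each replica report carries the block's new replica count; it is not the FSNamesystem implementation.

```java
/** Illustrative sketch of the safe mode exit condition; all names are hypothetical. */
final class SafeModeSketch {
    private final int minReplication;   // corresponds to dfs.replication.min
    private final double thresholdPct;  // corresponds to dfs.safemode.threshold.pct
    private final long extensionMillis; // corresponds to dfs.safemode.extension
    private final long totalBlocks;     // blocks known from the loaded image
    private long safeBlocks = 0;
    private long reachedAt = -1;        // time the threshold was first reached, -1 if not yet

    SafeModeSketch(int minReplication, double thresholdPct, long extensionMillis, long totalBlocks) {
        this.minReplication = minReplication;
        this.thresholdPct = thresholdPct;
        this.extensionMillis = extensionMillis;
        this.totalBlocks = totalBlocks;
    }

    /** Threshold 0 or an empty namespace: safe mode is not entered at startup. */
    boolean entersOnStartup() {
        return thresholdPct > 0 && totalBlocks > 0;
    }

    /** Called when a block report raises some block's replica count to {@code replicas}. */
    void replicaReported(int replicas, long nowMillis) {
        if (replicas == minReplication) {
            safeBlocks++; // counted exactly once, at the moment the block reaches the minimum
        }
        if (reachedAt < 0 && thresholdReached()) {
            reachedAt = nowMillis;
        }
    }

    /** A threshold above 1 can never be reached, so only a manual exit is possible. */
    boolean thresholdReached() {
        return safeBlocks >= thresholdPct * totalBlocks;
    }

    /** True once the threshold was reached and the extension period has elapsed. */
    boolean canLeave(long nowMillis) {
        return reachedAt >= 0 && nowMillis - reachedAt >= extensionMillis;
    }
}
```

A SafeModeMonitor-style loop would simply poll canLeave about once per second and stop itself once it returns true.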
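
The Lease and LeaseMonitor slides describe a time-limited claim that the client keeps alive with renewLease, and a monitor that inspects only the first lease in sorted order on each pass. The following is a minimal sketch of that idea. The class LeaseSketch and its methods are hypothetical, and it sorts leases by last renewal time rather than by creation time, which is an assumption made so that the first lease is always the one closest to expiry.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Illustrative sketch of lease renewal and expiry; not the Hadoop implementation. */
final class LeaseSketch {
    static final long TIMEOUT_MILLIS = 60L * 60L * 1000L; // the 1-hour timeout from the slides

    static final class Lease {
        final String holder; // the client
        final String file;   // the resource covered by the lease
        long lastRenewed;    // time of the last renewal, in milliseconds

        Lease(String holder, String file, long nowMillis) {
            this.holder = holder;
            this.file = file;
            this.lastRenewed = nowMillis;
        }
    }

    private final Map<String, Lease> byHolder = new HashMap<>();
    // Oldest renewal first; the holder name breaks ties so distinct leases never compare equal.
    private final TreeSet<Lease> sorted = new TreeSet<>(
            Comparator.comparingLong((Lease l) -> l.lastRenewed).thenComparing(l -> l.holder));

    /** Grants a lease to the holder, replacing any lease it already has. */
    void create(String holder, String file, long nowMillis) {
        Lease old = byHolder.remove(holder);
        if (old != null) {
            sorted.remove(old);
        }
        Lease lease = new Lease(holder, file, nowMillis);
        byHolder.put(holder, lease);
        sorted.add(lease);
    }

    /** The client's periodic "I am alive" call. */
    void renewLease(String holder, long nowMillis) {
        Lease lease = byHolder.get(holder);
        if (lease == null) {
            return;
        }
        sorted.remove(lease); // remove before changing the sort key
        lease.lastRenewed = nowMillis;
        sorted.add(lease);
    }

    /** One monitor pass: inspects only the first lease and removes it if it has expired. */
    String checkOnce(long nowMillis) {
        if (sorted.isEmpty()) {
            return null;
        }
        Lease oldest = sorted.first();
        if (nowMillis - oldest.lastRenewed <= TIMEOUT_MILLIS) {
            return null;
        }
        sorted.remove(oldest);
        byHolder.remove(oldest.holder);
        return oldest.holder;
    }
}
```

Because the set is ordered, looking at only the first lease per pass is enough: if the oldest lease has not expired, no other lease has either.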
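
The checkpoint merge described earlier (load the image, replay the edit log over it, save the result) amounts to applying each logged operation, in order, to an in-memory copy of the namespace. The sketch below shows that replay under heavy simplification: the namespace is assumed to be a flat map from path to replication factor (0 for directories), the Edit record layout is hypothetical, deleting a path does not remove entries beneath it, and the datanode operations are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch of replaying an edit log over an image; not the FSImage code. */
final class CheckpointSketch {
    enum Op { OP_ADD, OP_RENAME, OP_DELETE, OP_MKDIR, OP_SET_REPLICATION }

    /** One logged change; arg holds the rename target or the replication factor. */
    static final class Edit {
        final Op op;
        final String path;
        final String arg;

        Edit(Op op, String path, String arg) {
            this.op = op;
            this.path = path;
            this.arg = arg;
        }
    }

    /** Returns a new image with the edits applied in order; the input image is not modified. */
    static Map<String, Integer> merge(Map<String, Integer> image, List<Edit> edits) {
        Map<String, Integer> merged = new TreeMap<>(image);
        for (Edit e : edits) {
            switch (e.op) {
                case OP_ADD:
                    merged.put(e.path, Integer.parseInt(e.arg));
                    break;
                case OP_MKDIR:
                    merged.put(e.path, 0);
                    break;
                case OP_DELETE:
                    merged.remove(e.path);
                    break;
                case OP_RENAME: {
                    Integer value = merged.remove(e.path);
                    if (value != null) {
                        merged.put(e.arg, value);
                    }
                    break;
                }
                case OP_SET_REPLICATION:
                    if (merged.containsKey(e.path)) {
                        merged.put(e.path, Integer.parseInt(e.arg));
                    }
                    break;
            }
        }
        return merged;
    }
}
```

Because the merged image already reflects every logged change, the edit log that was replayed can be discarded afterwards, which is what keeps namenode startup time bounded.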