© 2009 VMware Inc. All rights reserved

Serengeti: Virtualize Your Big Data Applications

Agenda
- Today's big data system
- Why virtualize Hadoop?
- Serengeti introduction
- Common questions about virtualization
- Serengeti solution
- Deep insight into Serengeti
- Summary
- Q&A

Today's Big Data System
[Diagram labels: ETL; Unstructured Data (HDFS); Real-Time Structured Database; Big SQL; Data Parallel Batch Processing; Real-Time Streams; Real-Time Processing (S4, Storm); Analytics]

Challenges of Using Hadoop on Physical Infrastructure
Deployment
- Difficult to deploy: it can take several people several days, or even months
- Difficult to tune cluster performance
Low efficiency
- Hadoop clusters are typically not 100% utilized across all hardware resources
- Difficult to share resources safely between different workloads
Single point of failure
- Single point of failure for the NameNode and JobTracker
- No HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? Get your Hadoop cluster in minutes
- 1/1000 of the human effort, with minimal Hadoop operations knowledge
- Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch
- Manual process (takes days): server preparation, OS installation, network configuration, Hadoop installation and configuration
- Automated by Serengeti on vSphere, following best practices

Why Virtualize Hadoop? Consolidate sprawling clusters
Clusters share servers with strong isolation
- Single hardware infrastructure
- Unified operations
- Optimized shared resources = higher utilization
- Elastic resources = faster on-demand access
- Cluster sprawl: single-purpose clusters for various business applications lead to cluster sprawl
- Cluster consolidation: simplify
- CAPEX down 30%
[Diagram labels: Finance, Portal, Hadoop, Hadoop Dev, Hadoop Prod, HBase, Virtualization Platform]

Why Virtualize Hadoop? Utilize all your resources to solve the priority problem
- More than 50% of resources sit idle while a high-priority job is burning up its own cluster
- Utilize all resources from the pool on demand
- Dynamic elastic scaling on a shared resource pool
- 3X faster to get analytic results

vSphere High Availability (HA): protection against unplanned downtime
Overview
- Protection against host and VM failures
- Automatic failure detection (host, guest OS)
- Automatic virtual machine restart in minutes, on any available host in the cluster
- OS and application independent; does not require complex configuration changes

High Availability for the Hadoop Stack
[Diagram labels: ZooKeeper (Coordination); Management Server; HDFS (Hadoop Distributed File System); HBase (Key-Value store); MapReduce (Job Scheduling/Execution System); Pig (Data Flow); Hive (SQL); BI Reporting; ETL Tools; RDBMS; JobTracker; NameNode; Hive MetaDB; HCatalog; HCatalog MDB; Server; HA]

vSphere Fault Tolerance (FT) provides continuous protection
Overview
- Single identical VMs running in lockstep on separate hosts
- Zero downtime, zero data loss failover for all virtual machines in case of hardware failures
- Integrated with VMware HA/DRS
- No complex clustering or specialized hardware required
- Single common mechanism for all applications and operating systems
Zero downtime for the NameNode, JobTracker, and other components in Hadoop clusters
[Diagram: App/OS virtual machines on VMware ESX hosts]

Serengeti
Easy and rapid deployment and management
- Open source project launched in June 2012; version 0.8 was released in April, and 0.9 is planned for June
- A toolkit that leverages virtualization to simplify Hadoop deployment and operations
- Deploy a cluster in 10 minutes, fully automated
- Customize Hadoop and HBase clusters
- Automated cluster operation
- Comes with ecosystem components
- Supports all popular Hadoop distributions

Demo: 10 minutes to a Hadoop cluster with Serengeti

Common questions about virtualization
- Local disk: Can local disks be used in a virtualized environment?
- Flexibility and scalability: How can resources be scheduled flexibly between clusters and different applications, as mentioned above?
- Data stability: In a virtual environment, how can we distribute data across hosts and racks?
- Data locality: Hadoop schedules compute tasks near the data to reduce network I/O for data reads and writes. Can a virtual environment achieve the same result?
- Performance: How does Hadoop perform in a virtual environment?

Can I use local disk easily?
Serengeti: Extend the Virtual Storage Architecture to Include Local Disk
Shared storage: SAN or NAS
- Easy to provision
- Automated cluster rebalancing
Hybrid storage
- SAN for boot images and other workloads
- Local disk for Hadoop and HDFS
[Diagram: hosts running Hadoop VMs alongside other VMs]

How to scale in and scale out flexibly? How to schedule resources flexibly between clusters and different applications?
Evolution of Hadoop on VMs: Data/Compute separation
Current Hadoop: combined storage/compute (Hadoop in a VM)
- VM lifecycle determined by the DataNode
- Limited elasticity
Separate storage
- Separate compute from data
- Remove the elasticity constraint imposed by the DataNode
- Elastic compute
- Raise utilization
Separate compute clusters
- Separate virtual compute
- Compute cluster per tenant
- Stronger VM-grade security and resource isolation
[Diagram labels: Compute, Storage, VM, T1, T2]

Serengeti Node Scale Out / Scale In
[Diagram: NameNode, JobTracker, and slave nodes; each host runs one D node and several C nodes]

Serengeti Ballooning Enhancement for Java Applications
[Diagram labels: JVM, Guest OS, Host]

How to keep data stable? How to access data locally if the data node and compute node are located in different VMs?
Distributed and Data/Compute Associated VM Placement
[Diagram: three layouts across Rack1 and Rack2: a combined DataNode and TaskTracker cluster; a data/compute separated cluster; and compute-only clusters (compute-only cluster 1, compute-only cluster 2) alongside an HDFS cluster. Labels: master, worker, host, NameNode, JobTracker, DataNode, TaskTracker]

Hadoop Topology Changes for Virtualization
[Diagram: the Hadoop topology awareness tree (D1-D2, R1-R4, H1-H12) compared with the Serengeti HVE tree, which adds a layer labeled N1-N8]

Hadoop Virtualization Extensions for Topology (HVE)
Extensions:
- Hadoop Network Topology Extension
- Task Scheduling Policy Extension
- Balancer Policy Extension
- Replica Choosing Policy Extension
- Replica Placement Policy Extension
- Replica Removal Policy Extension
Components: HDFS, MapReduce, Hadoop Common
JIRA issues: HADOOP-8468 (umbrella JIRA), HADOOP-8469, HADOOP-8470, HADOOP-8472, HDFS-3495, HDFS-3498, MAPREDUCE-4309, MAPREDUCE-4310

Is there significant performance degradation in a virtualized environment? Is there any performance data?
Virtualized Hadoop Performance
Native versus Virtual Platforms, 32 hosts, 16 disks/host
Source: http:/

Serengeti architecture diagram
[Diagram labels: UI Client (Flex UI); CLI Client (Spring Shell); REST API; Serengeti Web Service; Spring Batch steps (Update MetaDB step, VM Placement calculation, VM Provision step, Software Mgmt step, VHM step); Hibernate/DAO; vPostgres; VC adapter; Ironfan service; Thrift Service; Progress report; Chef server (REST API, Cookbook); RabbitMQ; VM runtime Manager; vCenter; Package repository; Virtualization Platform hosts running Hadoop nodes with Chef Client and HA kit]

Customizing your Hadoop/HBase cluster with Serengeti
- Choice of distros
- Storage configuration: choice of shared storage or local disk
- Resource configuration
- High availability option
- Number of nodes

    {
      "distro": "apache",
      "groups": [
        {
          "name": "master",
          "roles": ["hadoop_namenode", "hadoop_jobtracker"],
          "storage": {"type": "SHARED", "sizeGB": 20},
          "instance_type": "MEDIUM",
          "instance_num": 1,
          "ha": true
        },
        {
          "name": "worker",
          "roles": ["hadoop_datanode", "hadoop_tasktracker"],
          "instance_type": "SMALL",
          "instance_num": 5,
          "ha": false
        }
      ]
    }

One command to scale out your cluster with Serengeti

    cluster resize -name -nodegroup worker -instanceNum

Configure/reconfigure Hadoop with ease using Serengeti
Modify the Hadoop cluster configuration from Serengeti
- Use the "configuration" section of the JSON spec file
- Specify Hadoop attributes in core-site.xml, hdfs-site.xml, mapred-site.xml, hadoop-env.sh, perties
- Apply the new Hadoop configuration using the edited spec file

    "configuration": {
      "hadoop": {
        "core-site.xml": {
          // check for all settings at /common/docs/r1.0.0/core-default.html
        },
        "hdfs-site.xml": {
          // check for all settings at http:/hadoop./common/docs/r1.0.0/hdfs-default.html
        },
        "mapred-site.xml": {
          // check for all settings at /common/docs/r1.0.0/mapred-default.html
          "io.sort.mb": 300
        },
        "hadoop-env.sh": {
          // "HADOOP_HEAPSIZE": "",
          // "HADOOP_NAMENODE_OPTS": "",
          // "HADOOP_DATANODE_OPTS": ""
        }
      }
    }

    cluster config -name myHadoop -specFile /home/serengeti/myHadoop.json

Freedom of Choice and Open Source
[Diagram labels: Community Projects, Distributions]
- Flexibility to choose from major distributions

    cluster create -name myHadoop -distro apache

- Support for multiple projects
- Open architecture to welcome industry participation
- Contributing Hadoop Virtualization Extensions (HVE) to the open source community

HDFS2 with NameNode Federation and HA
Deploy a CDH4 Hadoop cluster
- NameNode Federation
- NameNode HA
- MapReduce v1
- HBase, Pig, Hive, and Hive Server
CDH4 configurations
- Scale out
- Elasticity
- JobTracker HA/FT
[Diagram labels: Active NameNode, Standby NameNode, ZooKeeper Group (ZK, ZK, ZK), Coordinate, NameNode Group1]
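
As a rough illustration of the cluster spec shown on the "Customizing your Hadoop/HBase cluster with Serengeti" slide, the sketch below assembles the same master/worker layout and a small "configuration" section in Python and serializes it to JSON. The field names (`distro`, `groups`, `instance_type`, `instance_num`, `ha`, `storage`) are copied from the slide text; the helper names `node_group`, `build_spec`, and `total_instances` are hypothetical, and the schema accepted by a real Serengeti release may differ from what the slide shows.

```python
import json


def node_group(name, roles, instance_type, instance_num, ha, storage=None):
    """Build one node-group entry using the field names shown on the slide."""
    group = {
        "name": name,
        "roles": list(roles),
        "instance_type": instance_type,
        "instance_num": instance_num,
        "ha": ha,
    }
    if storage is not None:
        group["storage"] = storage
    return group


def build_spec(distro, groups, hadoop_config=None):
    """Assemble a spec dict; the layout is an assumption based on the slides."""
    spec = {"distro": distro, "groups": list(groups)}
    if hadoop_config:
        spec["configuration"] = {"hadoop": hadoop_config}
    return spec


def total_instances(spec):
    """Count the VMs the spec asks for across all node groups."""
    return sum(group["instance_num"] for group in spec["groups"])


spec = build_spec(
    "apache",
    [
        node_group(
            "master",
            ["hadoop_namenode", "hadoop_jobtracker"],
            "MEDIUM",
            1,
            True,
            storage={"type": "SHARED", "sizeGB": 20},
        ),
        node_group(
            "worker",
            ["hadoop_datanode", "hadoop_tasktracker"],
            "SMALL",
            5,
            False,
        ),
    ],
    # io.sort.mb: 300 is the one concrete setting given on the slides.
    hadoop_config={"mapred-site.xml": {"io.sort.mb": 300}},
)

spec_text = json.dumps(spec, indent=2)
print(spec_text)
print("total instances:", total_instances(spec))
```

Generating the spec from code rather than editing it by hand makes it easy to vary the worker count or the Hadoop settings before passing the resulting file to the `cluster config` command quoted in the slides.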