Serengeti - Virtualize Your Big Data Applications

© 2009 VMware Inc. All rights reserved

Agenda
- Today's big data system
- Why virtualize Hadoop?
- Serengeti introduction
- Common questions about virtualization
- Serengeti solution
- Deep insight into Serengeti
- Summary
- Q&A

Today's Big Data System
(Diagram labels: ETL; unstructured data (HDFS); real-time structured database; big SQL; data-parallel batch processing; real-time streams; real-time processing (S4, Storm); analytics)

Challenges of Using Hadoop on Physical Infrastructure
Deployment
- Difficult to deploy: takes several people several days, or even months
- Difficult to tune cluster performance
Low efficiency
- Hadoop clusters are typically not 100% utilized across all hardware resources
- Difficult to share resources safely between different workloads
Single point of failure
- Single point of failure for the NameNode and JobTracker
- No HA for Hive, HCatalog, etc.

Why Virtualize Hadoop? - Get your Hadoop cluster in minutes
- Manual process, costs days: server preparation, OS installation, network configuration, Hadoop installation and configuration
- Automated by Serengeti on vSphere with best practices
- Fully automated process: 10 minutes to get a Hadoop/HBase cluster from scratch
- 1/1000 of the human effort, with minimal Hadoop operations knowledge

Why Virtualize Hadoop? - Consolidate sprawling clusters
- Cluster sprawl: single-purpose clusters for various business applications lead to cluster sprawl
- Cluster consolidation simplifies operations: clusters share servers with strong isolation
- Single hardware infrastructure, unified operations
- Optimized shared resources = higher utilization
- Elastic resources = faster on-demand access
- CAPEX down 30%
(Diagram labels: Hadoop Dev, Hadoop Prod, HBase, Finance Hadoop, Portal Hadoop on one virtualization platform)

Why Virtualize Hadoop? - Utilize all your resources to solve the priority problem
- 50%+ of resources sit idle while a high-priority job is burning up its own cluster
- Utilize all resources from the pool on demand
- Dynamic elastic scaling on a shared resource pool
- 3X faster to get analytic results

vSphere High Availability (HA) - protection against unplanned downtime
Overview
- Protection against host and VM failures
- Automatic failure detection (host, guest OS)
- Automatic virtual machine restart in minutes, on any available host in the cluster
- OS and application independent; does not require complex configuration changes

High Availability for the Hadoop Stack
(Diagram labels: ZooKeeper (coordination); management server; HDFS (Hadoop Distributed File System); HBase (key-value store); MapReduce (job scheduling/execution system); Pig (data flow); Hive (SQL); BI reporting; ETL tools; RDBMS; JobTracker; NameNode; Hive MetaDB; HCatalog; HCatalog MDB; server; HA)

vSphere Fault Tolerance (FT) provides continuous protection
Overview
- Single identical VMs running in lockstep on separate hosts
- Zero downtime, zero data loss failover for all virtual machines in case of hardware failures
- Integrated with VMware HA/DRS
- No complex clustering or specialized hardware required
- Single common mechanism for all applications and operating systems
- Zero downtime for the NameNode, JobTracker and other components in Hadoop clusters
(Diagram labels: App/OS virtual machines on two VMware ESX hosts)

Easy and rapid deployment and management
- Open source project launched in June 2012; 0.8 was released in April and 0.9 is due in June
- Toolkit that leverages virtualization to simplify Hadoop deployment and operations
- Deploy a cluster in 10 minutes, fully automated
- Customize Hadoop and HBase clusters
- Automated cluster operation
- Comes with ecosystem components
- Supports all popular Hadoop distributions

Demo: 10 minutes to a Hadoop cluster with Serengeti

Common questions about virtualization
- Local disk: Can local disk be used in a virtualized environment?
- Flexibility and scalability: How can resources be scheduled flexibly between clusters and different applications, as mentioned above?
- Data stability: In a virtual environment, how can data be distributed across hosts and racks?
- Data locality: Hadoop schedules compute tasks near the data to reduce network I/O for reads and writes. Can a virtual environment achieve the same result?
- Performance: How does Hadoop perform in a virtual environment?

Can I use local disk easily?
Extend the virtual storage architecture to include local disk
- Shared storage: SAN or NAS
  - Easy to provision
  - Automated cluster rebalancing
- Hybrid storage
  - SAN for boot images and other workloads
  - Local disk for Hadoop and HDFS
(Diagram labels: hosts running Hadoop VMs alongside other VMs, with Serengeti)

How to flexibly scale in and scale out? How to flexibly schedule resources between clusters and different applications?
Evolution of Hadoop on VMs: data/compute separation
- Current Hadoop: combined storage/compute (Hadoop in a VM)
  - VM lifecycle determined by the DataNode
  - Limited elasticity
- Separate storage: separate compute from data
  - Removes the elasticity constraint imposed by the DataNode
  - Elastic compute
  - Raises utilization
- Separate compute clusters (tenants T1, T2)
  - Separate virtual compute
  - Compute cluster per tenant
  - Stronger VM-grade security and resource isolation

Serengeti node scale out / scale in
(Diagram labels: NameNode and JobTracker; slave hosts each running one data node (D) and several compute nodes (C))
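The elasticity argument behind data/compute separation can be illustrated with a toy model. This is only a sketch: `NodeGroup`, `Cluster`, and the rule that a data-holding group is never shrunk are assumptions made for illustration, not Serengeti's actual logic.

```python
from dataclasses import dataclass, field


@dataclass
class NodeGroup:
    """A set of identical VMs. `holds_data` is True when the VMs run a DataNode."""
    name: str
    instances: int
    holds_data: bool


@dataclass
class Cluster:
    groups: dict = field(default_factory=dict)

    def add(self, group: NodeGroup) -> None:
        self.groups[group.name] = group

    def resize(self, name: str, instances: int) -> None:
        group = self.groups[name]
        # Assumption of this toy model: a group that stores HDFS blocks is
        # never shrunk, because its VMs' lifecycle is tied to the DataNode.
        if group.holds_data and instances < group.instances:
            raise ValueError(f"cannot shrink data-holding group {name!r}")
        group.instances = instances


# Combined layout: every worker VM runs DataNode + TaskTracker.
combined = Cluster()
combined.add(NodeGroup("worker", instances=4, holds_data=True))

# Separated layout: DataNodes and TaskTrackers live in different VMs.
separated = Cluster()
separated.add(NodeGroup("data", instances=4, holds_data=True))
separated.add(NodeGroup("compute", instances=4, holds_data=False))

separated.resize("compute", 1)   # scale in: allowed, no HDFS data is touched
separated.resize("compute", 8)   # scale out again on demand

try:
    combined.resize("worker", 1)
    combined_shrunk = True
except ValueError:
    combined_shrunk = False      # rejected: these VMs also hold HDFS blocks
```

In the separated layout only the stateless compute group changes size, which is what lets compute capacity follow demand while the HDFS layer stays put.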

20、CDHostCCCCDHostCCCCSerengeti Ballooning Enhancement for Java ApplicationJVMGuest OSHostJVMGuest OSHostGuest OSJVMHow to keep data stability?How to access data locallyif data node and computenodeare located in differentVM?DatanodeandtasktrackercombinedclusterDataComputeseparatedclustermasterHostworke

21、rHostworkerHostmasterHostData nodeHostTasktrackerData nodeHostTasktrackerTasktrackerTasktrackerData nodeHostComputeonly cluster1Computeonly cluster2HDFS clusterCompute OnlyclusterRack1Rack2Rack1Distributed and Data/Compute Associated VM PlacementRack2Rack1Job trackerJob trackerName nodeHostRack2Task

22、trackerTasktrackerData nodeHostHadoopTopologyChangesfor VirtualizationHadoop Topology Awareness Serengeti HVE/D1D2R1R2N1H1H2H3H4H5H6H7H8H9H10H11H12R3R43/D1D2R1R2H1H2H3H4H5H6H7H8H9H10H11H12R3R423N2N3N4N5N6N7N81 12 321 1234HADOOP-8468(UmbrellaJIRA)HADOOP-8469HDFS-3495HDFS-3498HadoopNetworkTopologyExte

23、nsionHadoop Virtualization Extensions for TopologyHVETaskScheduling PolicyExtensionBalancerPolicy ExtensionReplicaChoosing PolicyExtensionReplicaPlacement PolicyExtensionReplicaRemovalPolicyExtensionHDFSMapReduceHadoop CommonMAPREDUCE-4310MAPREDUCE-4309HADOOP-8470HADOOP-8472Is there significantperfo

24、rmancedegradationin virtualizationenvironment?Is there any performancedata?Virtualized Hadoop PerformanceNative versus Virtual Platforms, 32 hosts, 16 disks/hostSource: http:/ Todays big data system Why virtualize hadoop? Serengeti introduction Common questions about virtualization Serengeti solutio

25、n Deep insight into Serengeti Summary Q&ARestAPISpringBatchUpdateMetaDBstepVMPlacementcalculationVMProvisionstepSoftwareMgmtstepUI ClientFlex UISerengeti architecture diagramCLI ClientSpring ShellSerengetiWebServiceHibernate/DAOvPostgresVC adapterIronfanserviceThriftServiceProgressIronfanreportC

26、hefserverRestAPICookbookVHMstepRabbitMQVM runtimeManagerHostHostHostHostHostVirtualization PlatformHadoopNodeChefClientHA kitHadoopNodeHadoopNodePackagerepositoryvCenterCustomizing your Hadoop/HBase cluster with Serengeti Choiceof distros Storageconfiguration Choice of shared storage or Local disk R

27、esourceconfiguration High availabilityoption # of nodesdistro:apache,groups: name:master,roles:hadoop_namenode,hadoop_jobtracker”,storage: type: SHARED,sizeGB: 20,instance_type:MEDIUM,instance_num:1,ha:true,name:worker,roles:hadoop_datanode,hadoop_tasktracker,instance_type:SMALL,instance_num:5,ha:fa

28、lseOne command to scale out your cluster with Serengeticluster resize name -nodegroup worker instanceNum Configure/reconfigure Hadoop with ease by SerengetiModifyHadoop clusterconfigurationfromSerengeti Use the “configuration” section of the json spec file Specify Hadoop attributes in core-site.xml,

29、 hdfs-site.xml, mapred-site.xml,hadoop-env.sh, perties Apply new Hadoop configuration using the edited spec fileconfiguration:hadoop:core-site.xml: / check for all settings at /common/docs/r1.0.0/core-default.html,hdfs-site.xml:/ check for all settings at http:/hadoop

30、./common/docs/r1.0.0/hdfs-default.html,mapred-site.xml:/ check for all settings at /common/docs/r1.0.0/mapred-default.htmlio.sort.mb: 300,hadoop-env.sh:/ HADOOP_HEAPSIZE:,/ HADOOP_NAMENODE_OPTS:,/ HADOOP_DATANODE_OPTS:, cluster config -name myHadoop -specFile /home/s

31、erengeti/myHadoop.jsonFreedom of Choice and Open SourceCommunity ProjectsDistributions Flexibilityto choosefrom major distributionscluster create -name myHadoop -distro apache Supportfor multipleprojects Open architectureto welcomeindustryparticipation ContributingHadoop VirtualizationExtensions(HVE

32、)to opensourcecommunityHDFS2 with Namenode Federation and HADeploy CDH4 Hadoop cluster Name Node Federation Name Node HA MapReduce v1 HBase, Pig, Hive, and Hive ServerCDH4 configurationsScale outElasticityJobTracker HA/FTActiveNamenodeStandby NamenodeActiveNamenodeStandby NamenodeZookeeper GroupZKZKZKCoordinateNamenodeGroup1Coordinate
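To make the HVE topology slides above more concrete, here is a minimal sketch of how an extra node group layer changes locality distances. It assumes Hadoop's usual notion of network distance (hops from each node up to the closest common ancestor) and four-level paths of the form /datacenter/rack/nodegroup/host. The specific paths are hypothetical, since the slide does not show which host sits under which node group.

```python
def distance(path_a: str, path_b: str) -> int:
    """Hops from each node up to their closest common ancestor, summed."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


# Hypothetical layout: /datacenter/rack/nodegroup/host
reader = "/D1/R1/N1/H1"
replicas = ["/D1/R2/N3/H5", "/D1/R1/N2/H3", "/D1/R1/N1/H2"]

# Prefer the replica that shares the most topology levels with the reader.
nearest_first = sorted(replicas, key=lambda r: distance(reader, r))
```

With the node group layer in place, two VMs in the same node group are at distance 2, which is closer than two VMs that only share a rack (distance 4). A three-level tree cannot express that difference, which is why the topology needs the extra layer before placement and scheduling policies can use it.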

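The per-file entries in the "configuration" section of the spec correspond to Hadoop's XML property files. The sketch below shows one way such a mapping could be rendered into the standard `<configuration><property>` layout. `render_site_xml` is a hypothetical helper written for illustration; how Serengeti itself applies these settings is not shown in the slides.

```python
import xml.etree.ElementTree as ET


def render_site_xml(settings: dict) -> str:
    """Render a {name: value} mapping as a Hadoop *-site.xml document body."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Same shape as the "configuration" section of the spec, comments removed.
configuration = {
    "hadoop": {
        "core-site.xml": {},
        "hdfs-site.xml": {},
        "mapred-site.xml": {"io.sort.mb": 300},
    }
}

# One rendered XML string per Hadoop configuration file.
rendered = {
    filename: render_site_xml(settings)
    for filename, settings in configuration["hadoop"].items()
}
```

Keeping the settings keyed by file name means a single spec file can drive every Hadoop configuration file in the cluster.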