
Condor Project
Computer Sciences Department
University of Wisconsin-Madison

What's new in Condor? What's coming up?
Condor Week 2009

Release Situation
- Stable Series
  - Current: Condor v7.2.2 (April 14 2009)
  - Last Year: Condor v7.0.1 (Feb 27th 2008)
- Development Series
  - Current: Condor v7.3.0 (Feb 24 2009); v7.3.1 "any day"
  - Last Year: Condor v7.1.0 (April 1st 2008)
- How long is development taking?
  - v6.9 Series: 18 months
  - v7.1 Series: 12 months
  - v7.3 Series: plan says done in July '09

New Ports In 7.2.0 and Beyond
- Full ports: Debian 5.0 x86 & x86_64
- Also added condor_compile support for …

Big new goodies in v7.0
- Virtual Machine Universe
- Scalability Improvements
- GCB Improvements
- Privilege Separation
- New Quill
- "Crondor"

Big new goodies in v7.2
- Job Router
- Startd and Job Router hooks
- DAGMan tagging and splicing
- Green Computing started
- GLEXEC
- Concurrency Limits

Job Router
- Automated way to let jobs run on a wider array of resources
  - Transform jobs into different forms
  - Reroute jobs to different …

What is "job routing"?
[Figure: JobRouter, Routing Table (Site 1, Site 2), original (vanilla) job, routed (grid) job, final status.]

Original (vanilla) job:

    Universe = "vanilla"
    Executable = "sim"
    Arguments = "seed=345"
    Output = "stdout.345"
    Error = "stderr.345"
    ShouldTransferFiles = True
    WhenToTransferOutput = "ON_EXIT"

Routed (grid) job:

    Universe = "grid"
    GridType = "gt2"
    GridResource = "/jobmanager-condor"
    Executable = "sim"
    Arguments = "seed=345"
    Output = "stdout"
    Error = "stderr"
    ShouldTransferFiles = True
    WhenToTransferOutput = "ON_EXIT"

Routing is just site-level matchmaking
- With feedback from the job queue:
  - number of jobs currently routed to site X
  - number of idle jobs routed to site X
  - rate of recent success/failure at site X
- And with the power to modify the job ad:
  - change attribute values (e.g. Universe)
  - insert new attributes (e.g. GridResource)
  - add a "portal" grid proxy if …

Startd Job Hooks
- Users wanted to take advantage of Condor's resource management daemon (condor_startd) to run jobs, but they had their own scheduling system.
  - Specialized scheduling needs
  - Jobs live in their own database or other storage rather than a Condor job …

Job Router Hooks
- Truly transform jobs, not just reroute them
  - E.g. stuff a job into a virtual machine (either VM universe or Amazon EC2)
- Hooks invoked like startd …

Our solution
- Make a system of generic "hooks" that you can plug into:
  - A hook is a point during the life-cycle of a job where the Condor daemons will invoke an external program
  - Hook Condor to your existing job management system without modifying the Condor …

DAGMan Depth First E…

Category Example
[Figure: DAG with Setup and Cleanup nodes and three groups of one Big job plus three Small jobs each; category throttles Run = 2 and Run = 5.]

DAGMan Splicing
[Figure: nodes A and B with spliced nodes X+A, X+B, X+C, X+D, Y+A, Y+B, Y+C, Y+D, Z+A, Z+B, Z+C, Z+D.]

    # Example Use Case
    JOB A A.sub
    JOB B B.sub
    SPLICE X diamond.dag
    SPLICE Y diamond.dag
    SPLICE Z diamond.dag
    PARENT A CHILD X Y Z
    PARENT X Y Z CHILD B
    # Notice scoping of node!

- Splicing creates one "in memory" DAG. No subdags means no extra condor_…

Green Computing
- The startd has the ability to place a machine into a low power state (Standby, Hibernate, Soft-Off, etc.).
  - HIBERNATE, HIBERNATE_CHECK_INTERVAL
  - If all slots return non-zero, then the machine is powered down; otherwise, it continues running.
- Machine ClassAd contains all information required for a client to wake it up
  - Condor can wake it up; there is also a standalone tool.
  - This was NOT as easy as it should be.
- Machines in "Offline State"
  - Stored persistently to disk
  - Lots of other …

Concurrency Limits
- Limit job execution based on admin-defined consumable resources
  - E.g. licenses
- Can have many different limits
- Jobs say what resources they need
- Negotiator enforces limits

Concurrency Example
- Negotiator config file
  - MATLAB_LIMIT = 5
  - NFS_LIMIT = 20
- Job submit file
  - concurrency_limits = matlab,nfs:3
  - This requests 1 Matlab token and 3 NFS tokens
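The token accounting in the Concurrency Example can be sketched as a small Python model. This is an illustration under stated assumptions, not the negotiator's code: the helper names `parse_limits`, `can_start` and `claim` are hypothetical, a name without `:N` is assumed to cost one token, names are assumed to match case-insensitively (so `matlab` pairs with `MATLAB_LIMIT`), and a limit with no configured value is assumed to block the job.

```python
from collections import Counter

# Hypothetical negotiator limits mirroring the slide's config:
#   MATLAB_LIMIT = 5
#   NFS_LIMIT = 20
LIMITS = {"matlab": 5, "nfs": 20}


def parse_limits(spec: str) -> Counter:
    """Parse a concurrency_limits value such as "matlab,nfs:3".

    Assumptions: a name without ":N" requests one token, and names are
    compared case-insensitively.
    """
    wanted = Counter()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, count = item.partition(":")
        wanted[name.strip().lower()] += int(count) if count else 1
    return wanted


def can_start(spec: str, in_use: Counter, limits: dict = LIMITS) -> bool:
    """True if the job's tokens fit under every configured limit."""
    for name, count in parse_limits(spec).items():
        if name not in limits:
            return False  # assumption: an unconfigured limit blocks the job
        if in_use[name] + count > limits[name]:
            return False
    return True


def claim(spec: str, in_use: Counter) -> None:
    """Record the tokens held by a job that was just matched."""
    in_use.update(parse_limits(spec))


# Usage: keep matching identical jobs until some limit is exhausted.
in_use = Counter()
started = 0
while can_start("matlab,nfs:3", in_use):
    claim("matlab,nfs:3", in_use)
    started += 1
print(started, dict(in_use))  # prints: 5 {'matlab': 5, 'nfs': 15}
```

Here the Matlab limit is the binding one: the NFS limit alone would admit six such jobs (18 of 20 tokens), but only five Matlab tokens exist.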

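The node scoping in the DAGMan Splicing example can be illustrated with a toy Python expansion of the top-level DAG into one in-memory graph. Two assumptions are labeled loudly: `diamond.dag` is taken to be the classic four-node diamond (A before B and C, both before D), which the slide's node names suggest but do not spell out; and a PARENT/CHILD line that names a splice is modeled as attaching to the splice's initial (parentless) or final (childless) nodes. The functions `splice` and `build_example` are hypothetical, not DAGMan's parser.

```python
# Hypothetical contents of diamond.dag (the classic diamond):
#   PARENT A CHILD B C
#   PARENT B C CHILD D
DIAMOND_NODES = ["A", "B", "C", "D"]
DIAMOND_EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]


def splice(prefix, nodes, edges):
    """Copy a sub-DAG, renaming each node to '<prefix>+<node>'.

    Returns (nodes, edges, initial, final): initial nodes have no parent
    and final nodes have no child inside the splice.
    """
    renamed = {n: f"{prefix}+{n}" for n in nodes}
    new_edges = [(renamed[p], renamed[c]) for p, c in edges]
    has_parent = {c for _, c in edges}
    has_child = {p for p, _ in edges}
    initial = [renamed[n] for n in nodes if n not in has_parent]
    final = [renamed[n] for n in nodes if n not in has_child]
    return list(renamed.values()), new_edges, initial, final


def build_example():
    """Expand the slide's top-level DAG into a single graph."""
    nodes = ["A", "B"]                          # JOB A A.sub / JOB B B.sub
    edges = []
    for prefix in ("X", "Y", "Z"):              # SPLICE X|Y|Z diamond.dag
        s_nodes, s_edges, initial, final = splice(prefix, DIAMOND_NODES, DIAMOND_EDGES)
        nodes += s_nodes
        edges += s_edges
        edges += [("A", n) for n in initial]    # PARENT A CHILD X Y Z
        edges += [(n, "B") for n in final]      # PARENT X Y Z CHILD B
    return nodes, edges


nodes, edges = build_example()
print(len(nodes), len(edges))                   # prints: 14 18
print(sorted(p for p, c in edges if c == "B"))  # prints: ['X+D', 'Y+D', 'Z+D']
```

The top-level node A and the spliced node X+A coexist because every spliced name carries its splice prefix; this is the scoping the slide's comment points at.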
Other goodies in v7.2
- ALLOW/DENY_CLIENT
- Job queue backup on local disk
- PREEMPTION_REQUIREMENTS and RANK can reference additional attributes in the negotiator about group resource usage
- Start on dynamic provisioning in the startd
- $$()

Dynamic Slot Partitioning
- Divide slots into chunks sized for matched jobs
- Readvertise remaining resources
- Partitionable resources are cpus, memory, and disk
- See Matt Farrellee's …

Dynamic Partitioning Caveats
- Cannot preempt original slot or group of sub-slots
  - Potential starvation of jobs with large resource requirements
- Partitioning happens once per slot each negotiation cycle
  - Scheduling of large slots may be …

New Variable Substitution
- $$(Foo) in submit file
  - Existing feature
  - Attribute Foo from machine ad substituted
- $$([Memory * 0.9]) in submit file
  - New feature
  - Expression is evaluated and then …

More Info For Preemption
- New attributes for these preemption expressions in the negotiator
  - PREEMPTION_REQUIREMENTS
  - PREEMPTION_RANK
- Used for controlling preemption due to user …

Right then.
What about v7.3.x and beyond?

Terms of License
Any and all dates in these slides are relative from a date hereby unspecified in the event of a likely situation involving a frequent condition. Viewing, use, reproduction, display, modification and redistribution of these slides, with or without modification, in source and binary forms, is permitted only after a deposit by said user into PayPal accounts registered to Todd Tannenbaum.

Some tasty dishes cooking in the Condor kitchen
Special guest, Julia Child!

Already served (leftovers)
- CCB: Condor Connection Broker
  - Dan Bradley's presentation
- Bring checkpoint/restart to Vanilla Jobs
  - Pete Keller's presentation re DMTCP
- Asynch notification of events to fill a hole in Condor's web service API
  - Jungha Woo's presentation
- Grid Universe improvements
  - Xin Zhao's …

Data "Drinks"
Wando Fishbowl Anyone?

Condor + Hadoop FS!
- Lots of hard work by Faisal Khan
- Motivation
  - Condor + HDFS = 2 + 2 = 5!
  - A synergy exists (next slide)
    - Hadoop as distributed storage system
    - Condor as cluster management system
  - Large number of distributed disks in a compute cluster
    - Managing disk as a …

Condor + HDFS
- Dhruba Borthakur's talk
- Synergy
  - Condor knows a lot about its cluster
    - Capability of individual machines in terms of available memory, CPU load, disk space, etc.
    - Availability of JRE (Java Universe)
  - Condor can easily automate housekeeping jobs
    - e.g. rebalancing data blocks
    - Implementing user file quota
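Returning to the Dynamic Slot Partitioning and Dynamic Partitioning Caveats slides in the v7.2 list, the carve-and-readvertise bookkeeping can be modeled in a few lines of Python. This is a toy sketch, not the startd's algorithm: the `Resources` class, the MB units, first-fit matching in queue order, and the names `negotiation_cycle` and `simulate` are all assumptions. It does follow the slides on two points: each matched job gets a chunk sized to its request, and at most one chunk is carved per slot per negotiation cycle.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Resources:
    """cpus, memory and disk; MB units for memory and disk are assumed."""
    cpus: int
    memory: int
    disk: int

    def fits(self, request: "Resources") -> bool:
        return (request.cpus <= self.cpus and request.memory <= self.memory
                and request.disk <= self.disk)

    def minus(self, request: "Resources") -> "Resources":
        return Resources(self.cpus - request.cpus,
                         self.memory - request.memory,
                         self.disk - request.disk)


def negotiation_cycle(free: Resources, queue: List[Resources]
                      ) -> Tuple[Optional[Resources], Resources]:
    """Carve at most one dynamic slot from the partitionable slot.

    The first queued job that fits gets a chunk of exactly its request;
    the remainder is what the slot would re-advertise.
    """
    for job in queue:
        if free.fits(job):
            return job, free.minus(job)
    return None, free


def simulate(free: Resources, queue: List[Resources], cycles: int):
    """Run several cycles, stopping early once nothing fits."""
    queue = list(queue)
    carved = []
    for _ in range(cycles):
        job, free = negotiation_cycle(free, queue)
        if job is None:
            break
        queue.remove(job)
        carved.append(job)
    return carved, free, queue


small = Resources(cpus=1, memory=2048, disk=1000)
big = Resources(cpus=4, memory=8192, disk=1000)
carved, left, waiting = simulate(Resources(4, 8192, 100000), [small, small, big], cycles=5)
print(len(carved), left)  # prints: 2 Resources(cpus=2, memory=4096, disk=98000)
```

The 4-CPU job is left waiting: it cannot fit in the re-advertised remainder, and since the original slot cannot be preempted it must wait for the small sub-slots to finish, which is the starvation caveat in miniature.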

Condor + HDFS
- Synergy
  - Failover
    - High availability daemon in Condor
  - ClassAds
    - Let clients know the current IP of name server H…

condor_hdfs daemon
- Main integration point of HDFS within Condor
- Configures HDFS cluster based on existing condor_config files
- Runs under condor_master and can be controlled by existing Condor utilities
- Publishes interesting parameters to the Collector, e.g. IP address, node type, disk activity
- Currently deployed at UW-M…

Condor + HDFS: Next Steps
- FileNode Failover
- Block placement policies & management
- Thinking about how Condor can steer jobs to the data
  - Via a ClassAd function used in the RANK expression?
- Integrate with File Transfer Mechanism

More Job Sandbox Options
- Condor's File Transfer mechanism
  - Currently moves files between submit and execute hosts (shadow and starter).
  - Next: Files can have URLs
    - HTTP
    - HDFS
- How about Condor's SPOOL? Need to schedule movement? New Stork
  - Mehmet Balman's …

Virtual Meatchine Dishes

Virtual Machine Sandboxing
- We have the Virtual Machine Universe
  - Great for provisioning
  - Nitin Narkhede's presentation
- Now we are exploring different mechanisms to run a job inside a VM.
- Benefits
  - Isolate the job from the execute host.
  - Stage custom execution environments.
  - Sandbox and control the job …

One way to do it: via the Condor Job Router
- Hard work by Varghese Mathew
- Ordinary Jobs & VM Universe Jobs.
- Job Router transforms a job into a new form.
- Job Router hook picks them up, sets them up inside a VM job, and submits the VM job.
- On completion, Job Router hook extracts output from the VM and returns to original …

Different Flavors
- Script Inside VM
- Starter Inside VM
- Personal Condor Inside VM
- VM joins the pool as an execute node
- All different ways to bind a job to a specific virtual …

Speaking of VM Universe
- Adding VM Universe support for
  - VMWare Server 2.x
  - KVM
    - Done via libvirt
    - Future VM systems added to libvirt should be easy to add in the future
  - VMWare ESX, ESXi
- Thank you, community, for the contributions!

"Lightweight Jobs" S…

Fast, quick, light jobs
- Options to put a Condor job on a diet
- Diet ideas:
  - Leave the luggage at home! No job file sandbox, everything in the job ad.
  - Don't pay for strong semantic guarantees if you don't need 'em. Define expectations on entry, update, completion.
- Want to honor scheduling policy, …

Some small side dishes
Julia, a spy who really knew her …

- Non-blocking communication via threads
  - Refer to Dan/Igor's talk
  - Especially all the security session round trips
  - The USCMS scalability testbed needs 70 collectors to support 20k dynamic machines; replaced with 1 collector w/ threading code. 70:1, baby!
- Configuration knob management
  - Think about:config in Firefox
  - Hard-coded configurations now possible
  - Nested …

Mmmm, tasty Condor W…

Scheduling Dessert
Pabst and Jack, a dessert favorite!

Back to Green Computing
- The startd has the ability to place a machine into a low power state (Standby, Hibernate, Soft-Off, etc.).
- Machine ClassAd contains all information required for a client to wake it up
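The power-down rule from the Green Computing slides ("if all slots return non-zero, then the machine is powered down; otherwise, it continues running", re-checked every HIBERNATE_CHECK_INTERVAL) can be sketched as a small Python loop. Assumptions, not Condor behavior: per-slot HIBERNATE results are modeled as ACPI sleep-state numbers, a disagreement between slots is resolved by choosing the lightest requested state, and `decide`, `check_loop` and the injectable `sleep` argument are hypothetical names.

```python
import time
from typing import Callable, List, Optional

# ACPI sleep states; treating HIBERNATE results as these numbers is an assumption.
ACPI_STATES = {1: "S1 (standby)", 3: "S3 (suspend to RAM)",
               4: "S4 (hibernate)", 5: "S5 (soft-off)"}


def decide(slot_values: List[int]) -> Optional[int]:
    """Return the state to enter, or None to keep the machine running.

    Per the slide: power down only if every slot returns non-zero.
    Assumption: if slots disagree, use the lightest requested state.
    """
    if not slot_values or any(v == 0 for v in slot_values):
        return None
    return min(slot_values)


def check_loop(evaluate_slots: Callable[[], List[int]], check_interval: float,
               max_checks: int,
               sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Re-evaluate every check_interval seconds (cf. HIBERNATE_CHECK_INTERVAL)."""
    for _ in range(max_checks):
        state = decide(evaluate_slots())
        if state is not None:
            return state
        sleep(check_interval)
    return None


# Usage: the first check sees a busy slot (0); the second sees both ready.
readings = iter([[0, 4], [4, 4]])
state = check_loop(lambda: next(readings), check_interval=300, max_checks=5,
                   sleep=lambda seconds: None)  # skip real waiting in the demo
print(ACPI_STATES[state])  # prints: S4 (hibernate)
```

Injecting `sleep` keeps the demo instant; a real daemon would wait the full interval between checks.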
