IBM Platform LSF Family: Introduction to Installation and Configuration, V1.0

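Version V1.0, by Ma Xuejie (馬雪潔)

Contents

1 Cluster structure
    LSF only (command-line submission)
    LSF + PAC (web submission)
    LSF + PM (PM submission)
2 LSF installation and basic configuration examples
    2.1.7 Starting and stopping the LSF daemons (three methods)
    2.6.4 Setting general limits
3 LSF command-line application integration examples
    CFD++ integration (spooling file)
    CFD++ installation and license
    Adding the CFD++ job starter
    Adding the CFD application profile
    CFD++ command-line submission script example
    GAUSSIAN integration (spooling file)
    Abaqus script integration (bsub command)
    Platform MPI jobs
    Intel MPI jobs
4 Installing PAC
5 Integrating applications with PAC
    5.2 CFD++ interface and back-end script after integration
6 Installing License Scheduler
    6.2.2 Mapping license features
7 Frequently asked questions
8 Using the man pages
9 After-sales technical support

1 Cluster structure

Larger clusters are normally designed with dedicated login nodes. Users can ssh only to a login node; they cannot ssh directly to any master node or compute node. Passwordless ssh trust between compute nodes is configured for every user so that parallel jobs can run. LSF is also installed on the login node, configured either as an LSF static client or as a host with MXJ set to 0, that is, a client that runs no jobs. The cluster's web node sits on the same network segment as the office LAN. To use floating clients, the master node's network interface must [text missing in source].

LSF only (command-line submission)

[Figure: LSF master node (expandable to 3); desktops running the LSF floating client; login node running the LSF static client; job-submission scripts and workflow scripts that call bsub.]

Users are isolated from the compute resources. The bsub calls in the scripts and workflows dispatch jobs to the cluster's compute nodes.

LSF + PAC (web submission)

Users submit jobs through the portal.

[Figure: management network and cluster compute network.]

LSF + PM (PM submission)

[Figure: LSF master node, Process Manager Server, Process Manager Clients, login node (web portal).]

2 LSF installation and basic configuration examples

Preparation before installation

NIS ready; NFS/GPFS ready.

LSF installation steps

Install as root, with NIS and NFS/GPFS ready.

Obtain the LSF and PAC installation packages:

    lsf8.3_linux2.6-glibc2.3-x86_64.tar.Z
    lsf8.3_lsfinstall_linux_x86_64.tar.Z
    pac8.3_standard_linux-x64.tar.Z

Unpack the lsfinstall installation scripts. Put the packages under /root/lsf.

2.2.3 First add the cluster administrator account lsfadmin, then set the installation parameters:

    LSF_TOP="/opt/lsf"                  (installation directory)
    LSF_ADMINS="lsfadmin"               (create the lsfadmin user first)
    LSF_CLUSTER_NAME="platform"         (cluster name, chosen freely)
    LSF_MASTER_LIST="s2 s3"             (LSF management nodes)
    LSF_ENTITLEMENT_FILE="/root/lsf/platform_hpc_std_entitlement.dat"   (location of the license file)
    LSF_TARDIR="/root/lsf/"             (location of the installation packages)

Run the installation.

Configure start at boot with hostsetup and rhostsetup.

Test the installation. In the conf directory under the installation directory:

    Add a line that sources profile.lsf to /etc/profile.
    Set LSF_RSH="ssh".

2.1.7 Starting and stopping the LSF daemons (three methods)

    [root@S2 conf]# lsfstartup / lsfshutdown

(the source prints the stop command as "lsfstop")

or

    lsadmin limstartup / limshutdown
    lsadmin resstartup / resshutdown
    badmin hstartup / hshutdown

or

    lsf_daemons start / stop

Check the cluster:

    [root@S2 conf]# lsid
    IBM Platform LSF Express 8.3 for IBM Platform HPC, May 10
    Copyright Platform Computing Inc., an IBM Company, 1992.
    US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
    My cluster name is platform
    My master name is s2

    [root@S2 conf]# lsload
    HOST_NAME  status  r15s  r1m  r15m  ut  pg   ls  it  tmp    swp  mem
    s2         ok      0.0   0.0  0.0   1%  0.0  1   0   151G   20G  61G
    s4         ok      0.0   0.0  0.0   2%  0.0  1   2   183G   20G  62G
    s6         ok      0.0   0.0  0.0   3%  0.0  1   2   3734M  2G   30G
    s5         ok      0.0   0.0  0.0   5%  0.0  1   2   3468M  2G   30G

(The digits of the ls, it and tmp columns run together in the source; the column split shown here is one possible reading.)

Submit a test job:

    bsub sleep 100000

To allow root to submit jobs, set LSF_ROOT_REX=local and restart the LSF daemons.

Reconfiguring after changing configuration files

    After changing lsf.* files:  lsadmin reconfig
    After changing lsb.* files:  badmin reconfig

Some parameters require a restart of the master scheduler or of other daemons:

    badmin mbdrestart; lsadmin limrestart; lsadmin resrestart; badmin hrestart

Logs and debugging

The logs are under the log directory. LSF runs three main processes on every node, and two more on the master node:

    Master:  lim, res, sbatchd, mbatchd, mbschd
    Compute: lim, res, sbatchd

To turn on debugging from the command line, run lim -2 directly on the node to find out why lim does not start.

Configuration files

Directory /etc/init.d: [entry missing in source]

Directory /apps/platform/8.3/lsf/conf:

    lsf.conf                                 LSF configuration file
    lsf.cluster.cluster83                    cluster configuration file
    lsf.shared                               shared resource definitions
    ./lsbatch/cluster83/configdir/lsb.*      batch scheduler configuration files
    lsb.users                                LSF users and user groups
    lsb.queues                               LSF queues (for example HOSTS=hostGroupC, PRIORITY=40)
    lsb.modules                              LSF scheduler modules

Frequently used commands

    bsub             submit a job
    bjobs            show job information
    bhist            show job history
    lshosts          show static host resources
    bhosts, lsload   show host status and resource information
    bqueues          show queue configuration
    blimits          show limit information
    lsid             show cluster version and master node
    bmod             modify bsub options of a submitted job

Resource-based scheduling policy

    bsub -R "((type==LINUX2.4 && r1m < 2.0)||(type==AIX && r1m < 1.0))"

or define the requirement in a queue (lsb.queues) or in an application profile (lsb.applications):

    RES_REQ=select[((type==LINUX2.4 && r1m < 2.0)||(type==AIX && r1m < 1.0))]

Further examples:

    bsub -R "select[type==any && swap>=300 && mem>500] order[swap:mem] rusage[swap=300,mem=500]" job1
    bsub -R "rusage[mem=500:app_lic_v2=1 || mem=400:app_lic_v1.5=1]" job1
    bsub -R "select[type==any && swp>=300 && mem>500] order[mem]" job1

Configuring fairshare scheduling

Adding a round-robin queue

Modify lsb.queues and add the following:

    Begin Queue
    QUEUE_NAME = roundRobin
    FAIRSHARE  = USER_SHARES[[default,1]]
    #USERS     = userGroupA    Define your own user group
    End Queue

Run badmin reconfig to enable the change. Run bqueues -l to check the queue's configuration.

Adding a hierarchical fairshare policy

Add the following queue to define a hierarchical share policy:

    Begin Queue
    QUEUE_NAME = hierarchicalShare
    PRIORITY   = 40
    USERS      = userGroupB userGroupC
    FAIRSHARE  = USER_SHARES[[userGroupB,7] [userGroupC,3]]
    End Queue

Cross-queue fairshare policy

Add the following queues to lsb.queues, paying attention to the host group and user group definitions.

    Begin Queue
    QUEUE_NAME       = verilog
    DESCRIPTION      = master queue definition cross-queue
    PRIORITY         = 50
    FAIRSHARE        = USER_SHARES[[user1,100] [default,1]]
    FAIRSHARE_QUEUES = normal short
    HOSTS            = hostGroupC    # resource contention
    #RES_REQ         = rusage[verilog = 1]
    End Queue

    Begin Queue
    QUEUE_NAME  = short
    DESCRIPTION = short jobs
    PRIORITY    = 70    # highest
    RUNLIMIT    = 5 10
    End Queue

    Begin Queue
    QUEUE_NAME  = normal
    DESCRIPTION = default queue
    PRIORITY    = 40    # lowest
    HOSTS       = hostGroupC
    End Queue

Enable the configuration with badmin reconfig. Submit jobs and watch how the users' dynamic priority changes in the queue:

    bqueues -rl normal

Configuring preemptive scheduling

The most basic slot preemption:

    Begin Queue
    QUEUE_NAME = short
    PRIORITY   = 70
    HOSTS      = hostGroupC    # potential conflict
    PREEMPTION = PREEMPTIVE[normal]
    End Queue

    Begin Queue
    QUEUE_NAME = normal
    PRIORITY   = 40
    HOSTS      = hostGroupC    # potential conflict
    PREEMPTION = PREEMPTABLE[short]
    End Queue

Submit jobs to both queues and check the pending reason of the preempted jobs.

Configuring global limit policies

Limiting the number of jobs a user can run

Add to lsb.users:

    Begin User
    USER_NAME   MAX_JOBS   JL/P
    user1       4          -
    user2       2          1
    user3       2          -
    groupA      8          -
    groupB@     1          1
    Default     2          -
    End User

(The "-" placeholders were lost in the source. Rows with a single surviving value are shown with that value under MAX_JOBS.)

Limiting the number of jobs a host can run

In lsb.hosts:

    Begin Host
    HOST_NAME   MXJ   JL/U
    host1       4     2
    host2       2     1
    host3       !     -
    End Host

Limiting running jobs per queue

Add to lsb.queues:

    Begin Queue
    QUEUE_NAME = myQueue
    HJOB_LIMIT = 2
    PJOB_LIMIT = 1
    UJOB_LIMIT = 4
    USERS      = userGroupA
    End Queue

2.6.4 Setting general limits

Example general limits defined in lsb.resources:

    Begin Limit
    USERS   QUEUES   HOSTS   SLOTS   MEM   SWP
    user1   -        hostB   -       -     20%
    user2   normal   hostA   -       20    -
    End Limit

    Begin Limit
    NAME     = limit1
    USERS    = user1
    PER_HOST = hostA hostC
    TMP      = 30%
    SWP      = 50%
    MEM      = 10%
    End Limit

    Begin Limit
    PER_USER   QUEUES   HOSTS     SLOTS   MEM   SWP   TMP   JOBS
    groupA     -        hgroup1   -       -     -     -     2
    user2      normal   -         -       200   -     -     -
    -          short    -         -       -     -     -     200
    End Limit

Enable the configuration with badmin reconfig.

Configuring the submission control script esub

The global esub script is called when a job is submitted. It can be invoked automatically or explicitly, and it controls how user jobs are submitted.

Create the file esub.project under $LSF_SERVERDIR and make it executable with chmod:

    #!/bin/sh
    # Reject any job that is submitted without a project name.
    if [ "_$LSB_SUB_PARM_FILE" != "_" ]; then
        . $LSB_SUB_PARM_FILE
        if [ "_$LSB_SUB_PROJECT_NAME" = "_" ]; then
            echo "You must specify a project!" >&2
            exit $LSB_SUB_ABORT_VALUE
        fi
    fi
    exit 0

Define LSB_ESUB_METHOD="project" in lsf.conf.

Configuring resource management: elim examples

Reporting free space in the home directory

Create the elim file elim.home under $LSF_SERVERDIR and make it executable with chmod.

    #!/bin/sh
    while true; do
        # free space of /home in GB
        home=`df -k /home | tail -1 | awk '{printf "%4.1f", $4/(1024*1024)}'`
        echo 1 home $home
        sleep 30
    done

Reporting the number of root processes

Create elim.root under $LSF_SERVERDIR and make it executable with chmod.

    #!/bin/sh
    while true; do
        root=`ps -ef | grep -v grep | grep -c ^root`
        echo 1 rootprocs $root
        sleep 30
    done

Reporting the number of application licenses

    #!/bin/sh
    lic_X=0; num=0
    while true; do
        # only want the master to gather lic_X
        if [ "$LSF_MASTER" = "Y" ]; then
            lic_X=`lmstat -a -c lic_X.dat | grep ...` >&2
        fi
        # only want training8, training1 to gather simpton licenses
        if [ "`hostname`" = "training8" -o "`hostname`" = "training1" ]; then
            num=`lmstat -a -c simpton_lic.dat | grep ...` >&2
        fi
        # all hosts including master, will gather the following
        root=`ps -efw | grep -v grep | grep -c root` >&2
        tmp=`df -k /var/tmp | grep var | awk '{print $4/1024}'` >&2
        if [ "$LSF_MASTER" = "Y" ]; then
            echo 4 lic_X $lic_X simpton $num rtprc $root tmp $tmp
        else
            echo 3 simpton $num rtprc $root tmp $tmp
        fi
        sleep 60
    done

Testing an elim script

Run ./elim.root directly and check that its output is correct.

Adding the resource definition and the resource map

Add the rootprocs definition to lsf.shared, and add the mapping between the resource and the hosts to the ResourceMap section of the lsf.cluster file.

Enable the configuration: lsadmin reconfig; badmin reconfig

Check the resource values: lsload -l

3 LSF command-line application integration examples

This section shows different integration methods for several applications. Spooling files and bsub command lines are interchangeable.

CFD++ integration (spooling file)

CFD++ installation and license

Installation path: ln36204

License server: ln36204

Start the license server:

    [hpcadmin@mn3650 jessi]$ ssh ln36204

Confirm that the license server is running normally. [command missing in source]

Integrating license management with an elim

Only one instance of this elim needs to run in the whole cluster, so the script is placed on the head node only. On the head node, change to $LSF_SERVERDIR and add the file elim.lic:

    [root@mn3650 jessi]# cd $LSF_SERVERDIR
    [root@mn3650 etc]# pwd

The script is interleaved with other text in the source. Its recoverable content is:

    #!/bin/sh
    while true
    do
        totallicences=`/gpfs/software/cfdpp/mbin/lmutil lmstat -a -c /gpfs/software/cfdpp/mbin/Metacomp.lic | /bin/grep "Users of CFD++_SOLV_Ser" | /bin/cut -d' ' -f7`
        usedlicences=`/gpfs/software/cfdpp/mbin/lmutil lmstat -a -c /gpfs/software/cfdpp/mbin/Metacomp.lic | /bin/grep "Users of CFD++_SOLV_Ser" | /bin/cut -d' ' -f13`
        cfd_lic=$((${totallicences} - ${usedlicences}))
        echo "1 cfd_lic ${cfd_lic}"
        sleep [interval illegible in source]
    done

Modify the configuration files as follows.

Add the following line to lsf.shared:

    cfd_lic   Numeric   30   Y   (CFD++ License)

Add the following line to the ResourceMap section of the lsf.cluster file:

    RESOURCENAME   LOCATION
    cfd_lic        [location illegible in source]

Then reconfigure:

    [root@mn3650 etc]# lsadmin reconfig; badmin reconfig

Adding the CFD++ job starter

The opening sentence of this subsection is illegible in the source; it states a case in which the job starter does not need to be added. The job starter is an executable file (the application profile in this guide references it as /opt/lsf/jobstarter/cfd_starter). Its recoverable content is:

    it calls the mpirun shipped with CFD++ (path ending in /cfdpp/hpmpi/bin/mpirun);
    it selects the solver with
        case "$PRESSION" in
            SINGLE_PRESSION) ;;
            DOUBLE_PRESSION) ;;
        esac
      (the branch bodies are not legible);
    it builds the command line
        CMD="$* -hostfile $LSB_DJOB_HOSTFILE $CFD_CMD"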
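The license arithmetic in elim.lic above can be sketched in isolation. This is a minimal sketch, assuming the usual FlexNet lmstat summary line ("Users of FEATURE:  (Total of N licenses issued;  Total of M licenses in use)"); the function names free_licenses and elim_line and the sample line are illustrative assumptions, not part of the original script.

```shell
#!/bin/sh
# Sketch: derive a free-license count from one lmstat summary line and
# format it as an ELIM report line ("<number of resources> <name> <value>").

# free_licenses LINE -> prints issued minus in-use
free_licenses() {
    # Splitting on single spaces, field 7 is the issued total and
    # field 13 the in-use total (the double spaces create empty fields).
    total=$(printf '%s\n' "$1" | cut -d' ' -f7)
    used=$(printf '%s\n' "$1" | cut -d' ' -f13)
    echo $((total - used))
}

# elim_line NAME VALUE -> prints an ELIM line reporting a single resource
elim_line() {
    echo "1 $1 $2"
}

# Hypothetical sample line; a real elim would read it from "lmutil lmstat -a -c <license file>".
sample='Users of CFD++_SOLV_Ser:  (Total of 30 licenses issued;  Total of 7 licenses in use)'
elim_line cfd_lic "$(free_licenses "$sample")"
```

Keeping the parsing in a function makes it possible to test the field positions against a captured lmstat line before the loop is installed under $LSF_SERVERDIR.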

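Adding the CFD application profile

Add the following configuration:

    Begin Application
    NAME        = cfd
    JOB_STARTER = /opt/lsf/jobstarter/cfd_starter
    RES_REQ     = "rusage[cfd_lic=1]"
    End Application

Run badmin reconfig to make the file take effect, then use bapp -l cfd to check that it worked:

    [root@mn3650 bin]# bapp -l cfd
    APPLICATION NAME: cfd
     -- No description provided.
    STATISTICS:
       NJOBS   PEND   RUN   SSUSP   USUSP   RSV
          12     12     0       0       0     0
    PARAMETERS:
    JOB_STARTER: /opt/lsf/jobstarter/cfd_starter
    RES_REQ: "rusage[cfd_lic=1]"

CFD++ command-line submission script example

The script is only partly legible in the source. The recoverable directives are:

    #BSUB -n 12
    #BSUB -app cfd

Submit the job with bsub < cfd.sh.

GAUSSIAN integration (spooling file)

Because of the Gaussian license, a single job can only run on a single host.

Spooling file used to submit the job:

    #!/bin/sh
    #BSUB -q qchem
    #BSUB -n 4
    #BSUB -R "span[hosts=1]"
    #BSUB -cwd .

Wrapper script (only the beginning survives in the source):

    #!/bin/sh
    JOBNAME=`basename "$JOB" .com`
    export g03root=/gpfs/software/Gaussian
    export GAUSS_SCRDIR=/tmp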
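Spooling files like the ones in this section differ only in a few #BSUB values, so they can be generated. The following is a minimal sketch; the function name make_spool and the command passed to it are hypothetical, and the fixed span[hosts=1] directive simply mirrors the single-host Gaussian case above.

```shell
#!/bin/sh
# make_spool QUEUE NCPU COMMAND... -> prints a spooling file on stdout
make_spool() {
    queue=$1
    ncpu=$2
    shift 2
    # The here-document expands $queue, $ncpu and $* when the function runs.
    cat <<EOF
#!/bin/sh
#BSUB -q $queue
#BSUB -n $ncpu
#BSUB -R "span[hosts=1]"
#BSUB -cwd .
$*
EOF
}

# Hypothetical wrapper script and input file names.
make_spool qchem 4 ./run_gaussian.sh input.com
```

The output can be redirected to a file and submitted in the same way as the hand-written files, for example make_spool qchem 4 ./run_gaussian.sh input.com > job.sh followed by bsub < job.sh.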

Abaqus script integration (bsub command)

The wrapper script sets the following (its beginning is missing in the source):

    export ABAQUS_CMD="/gpfs/software/Abaqus/Commands/abaqus"
    export LM_LICENSE_FILE="/gpfs/software/Abaqus/License/abq612.lic"
    # number of CPUs; must match the value given to -n on the bsub command line
    export NCPU=16
    # job name for the input file
    export JOB_NAME=abaqus_job3
    ${ABAQUS_CMD} job=$JOB_NAME cpus=$NCPU input=\"$INP_INPUT_FILE\"

2) Submit through LSF: change to the directory that holds the input data and run the bsub command.

Amber jobs (blaunch integration, with accounting)

For Intel MPI, write the mpdboot.lsf script, make it executable, and place it under $LSF_SERVERDIR.

Write the job submission script:

    $ cat new.sh
    #!/bin/sh
    #BSUB -q small
    #BSUB -n 128
    #BSUB -J IMPI
    #BSUB -x
    #export PATH=/gpfs01/software/intel/impi/24/intel64/bin:$PATH
    export I_MPI_DEVICE=ssm
    #export I_MPI_FABRICS=shm:ofa
    #export I_MPI_FAST_STARTUP=1
    #export I_MPI_DEVICE=rdssm
    mpdallexit

Submit the job. [command missing in source]

Platform MPI jobs

Installing Platform MPI

Confirm that the user has passwordless ssh access. Install Platform MPI into a shared directory:

    sh platform_mpi00320r.x64.sh -installdir=/opt/pmpi -norpm

(the installer file name has lost characters in the source)

If the C compiler is missing, run:

    yum install gcc

Verifying the installation outside LSF

Set the environment variables:

    export MPI_REMSH="ssh -x"
    export MPI_ROOT=/opt/pmpi/opt/ibm/platform_mpi/

Compile the hello world sample program and run it:

    [root@server3 help]# /opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -f ../help/hosts
    warning: MPI_ROOT /opt/pmpi/opt/ibm/platform_mpi/ != mpirun path /opt/pmpi/opt/ibm/platform_mpi
    Hello world! I'm 1 of 4 on server3
    Hello world! I'm 0 of 4 on server3
    Hello world! I'm 3 of 4 on computer007
    Hello world! I'm 2 of 4 on computer007
    [root@server3 help]# cat ../help/hosts
    -h server3 -np 2 /opt/pmpi/opt/ibm/platform_mpi/help/helloworld
    -h computer007 -np 2 /opt/pmpi/opt/ibm/platform_mpi/help/helloworld

Submitting through LSF

    export MPI_REMSH=blaunch
    $ mpirun -np 4 -IBV ~/helloworld
    $ mpirun -np 32 -IBV ~/helloworld
    $ mpirun -np 4 -TCP ~/helloworld

or

    [root@server3 conf]# bsub -o %J.out -e .%J.err -n 4 /opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld
    Job <210> is submitted to default queue <normal>.
    [root@server3 conf]# bjobs
    JOBID   USER   STAT   QUEUE    FROM_HOST   EXEC_HOST   JOB_NAME     SUBMIT_TIME
    210     root   PEND   normal   server3                 *elloworld   May 9 10:55
    [root@server3 conf]# cat 210.out
    Sender: LSF System <jessi@computer007>
    Subject: Job 210: </opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld> in cluster <jessi_cluster> Done

    Job </opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld> was submitted from host <server3> by user <root> in cluster <jessi_cluster>.
    Job was executed on host(s) <4*computer007>, in queue <normal>, as user <root> in cluster <jessi_cluster>.
    </root> was used as the home directory.
    </opt/lsf/conf> was used as the working directory.
    Started at Thu May 9 18:49:06
    Results reported at Thu May 9 18:49:07

    Your job looked like:
    # LSBATCH: User input
    /opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld

    Successfully completed.

    Resource usage summary:
        CPU time :               0.23 sec.
        Max Memory :             2 MB
        Average Memory :         2.00 MB
        Total Requested Memory : -
        Delta Memory :           -
        (Delta: the difference between total requested memory and actual max usage.)
        Max Swap :               36 MB
        Max Processes :          1
        Max Threads :            1

    The output (if any) follows:
    Hello world! I'm 2 of 4 on computer007
    Hello world! I'm 0 of 4 on computer007
    Hello world! I'm 1 of 4 on computer007
    Hello world! I'm 3 of 4 on computer007

    PS: Read file <.210.err> for stderr output of this job.

or, with more parameters:

    $ /opt/platform_mpi/bin/mpirun -np 120 -ibv -hostlist "cn2 cn2 cn2 cn2 cn2 cn2 cn2 cn2 cn2 cn2" /data/hello_world

(the host names in the list appear to have lost characters in the source)

To run MPI jobs without submitting them through LSF, set the MPI_USELSF environment variable to n.

Open MPI jobs

Download the Open MPI package and configure it:

    ./configure LIBS=-ldl --with-lsf=yes --prefix=/usr/local/ompi/

Open MPI versions above 1.3.2 are already tightly integrated with LSF blaunch.

Submit an Open MPI job. [command missing in source]

Intel MPI jobs

Express edition, method without accounting

If job accounting is required, use the blaunch integration method instead.

    export PATH=/gpfs/software/intel/composerxe/bin/:/gpfs/software/intel/mpi_41_0_024/include:/gpfs/software/intel/mpi_41_0_024/bin64:/gpfs/software/intel/composerxe/mkl:$PATH
    source /gpfs/software/intel/composerxe/bin/compilervars.sh intel64
    source /gpfs/software/intel/composerxe/mkl/bin/mklvars.sh intel64

MPI test program

    #include "mpi.h"
    #include <stdio.h>
    #include <math.h>

    int main(int argc, char **argv)
    {
        int myid, numprocs;
        int namelen;
        char processor_name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Get_processor_name(processor_name, &namelen);
        /* each rank reports its rank, the total rank count and its host */
        fprintf(stderr, "Hello World! Process %d of %d on %s\n",
                myid, numprocs, processor_name);
        MPI_Finalize();
        return 0;
    }

Run from the command line over TCP. [command missing in source]
Run from the command line over the IB network. [command missing in source]
Run from the command line in debug mode. [command missing in source]

LSF submission script bsub_intelmpi_ib.sh (only these lines survive in the source):

    #!/bin/sh
    #BSUB -cwd .
    #BSUB -R "span[ptile=4]"

Submit the job: bsub < bsub_intelmpi_ib.sh

Express edition, blaunch method with accounting

The mpdboot.lsf script (Python 2; it uses popen2 and print statements):

    #!/usr/bin/env python
    """
    mpdboot for LSF
    [-f | --hostfile hostfile]
    [-i | --ifhn=alternate_interface_hostname_of_ip_address
     -f | --hostfile hostfile]
    [-h]
    """
    import re
    import string
    import time
    import sys
    import getopt
    from time import ctime
    from os import environ, path
    from sys import argv, exit, stdout
    from popen2 import Popen4
    from socket import gethostname, gethostbyname

    def mpdboot():
        # change me
        MPI_ROOTDIR = "/opt/intel/impi/25"
        #
        mpdCmd = "%s/bin/mpd" % MPI_ROOTDIR
        mpdtraceCmd = "%s/bin/mpdtrace" % MPI_ROOTDIR
        mpdtraceCmd2 = "%s/bin/mpdtrace -l" % MPI_ROOTDIR
        nHosts = 1
        host = ""
        ip = ""
        localHost = ""
        localIp = ""
        found = False
        MAX_WAIT = 5
        t1 = 0
        hostList = ""
        hostTab = {}
        cols = []
        hostArr = []
        hostfile = environ.get('LSB_DJOB_HOSTFILE')
        binDir = environ.get('LSF_BINDIR')
        if environ.get('LSB_MCPU_HOSTS') == None \
           or hostfile == None \
           or binDir == None:
            print "not running in LSF"
            exit(1)
        rshCmd = binDir + "/blaunch"
        p = re.compile("\S+_\d+\s+\(\d+\.\d+\.\d+\.\d+")
        #
        try:
            opts, args = getopt.getopt(sys.argv[1:], "hf:i:", ["help", "hostfile=", "ifhn="])
        except getopt.GetoptError, err:
            print str(err)
            usage()
            sys.exit(1)
        fileName = None
        ifhn = None
        for o, a in opts:
            if o == "-v":
                version()
                sys.exit()
            elif o in ("-h", "--help"):
                usage()
                sys.exit()
            elif o in ("-f", "--hostfile"):
                fileName = a
            elif o in ("-i", "--ifhn"):
                ifhn = a
            else:
                print "option %s unrecognized" % o
                usage()
                sys.exit(1)
        if fileName == None:
            if ifhn != None:
                print "--ifhn requires a host file containing 'hostname ifhn=alternate_interface_hostname_of_ip_address'\n"
                sys.exit(1)
            # use LSB_DJOB_HOSTFILE
            fileName = hostfile
        localHost = gethostname()
        localIp = gethostbyname(localHost)
        pifhn = re.compile("\w+\s+\ifhn=\d+\.\d+\.\d+\.\d+")
        #pifhn = re.compile("\S+\ifhn=\d+\.\d+\.\d+\.\d+")
        try:
            # check the hostfile
            machinefile = open(fileName, "r")
            for line in machinefile:
                if not line or line[0] == '#':
                    continue
                line = re.split('#', line)[0]
                line = line.strip()
                if not line:
                    continue
                if not pifhn.match(line):
                    # should not have --ifhn option
                    if ifhn != None:
                        print "host file %s not valid for --ifhn" % (fileName)
                        print "host file should contain 'hostname ifhn=ip_address'"
                        sys.exit(1)
                    host = re.split(r'\s+', line)[0]
                    if cmp(localHost, host) == 0 \
                       or cmp(localIp, gethostbyname(host)) == 0:
                        continue
                    hostTab[host] = None
                else:
                    # multiple blaunches
                    cols = re.split(r'\s+\ifhn=', line)
                    host = cols[0]
                    ip = cols[1]
                    if cmp(localHost, host) == 0 \
                       or cmp(localIp, gethostbyname(host)) == 0:
                        continue
                    hostTab[host] = ip
                #print "line: %s" % (line)
            machinefile.close()
        except IOError, err:
            print str(err)
            exit(1)
        # launch a mpd on localhost
        if ifhn != None:
            #cmd = mpdCmd + " --ifhn=%s" % (ifhn)
            cmd = "%s -n %s %s --ifhn=%s" % (rshCmd, localHost, mpdCmd, ifhn)
        else:
            #cmd = mpdCmd
            cmd = "%s -n %s %s" % (rshCmd, localHost, mpdCmd)
        print "Starting an mpd on localhost:", cmd
        Popen4(cmd, 0)
        # wait til 5 seconds at max
        while t1 < MAX_WAIT:
            time.sleep(1)
            trace = Popen4(mpdtraceCmd2, 0)
            # hostname_portnumber (IP address)
            line = trace.fromchild.readline()
            if not p.match(line):
                t1 += 1
                continue
            strings = re.split('\s+', line)
            (basehost, baseport) = re.split('_', strings[0])
            #print "host:", basehost, "port:", baseport
            found = True
            host = ""
            break
        if not found:
            print "Cannot start mpd on localhost"
            sys.exit(1)
        else:
            print "Done starting an mpd on localhost"
        # launch mpd on the rest of hosts
        for host, ip in hostTab.items():
            nHosts += 1
        if nHosts < 2:
            sys.exit(0)
        print "Constructing an mpd ring ..."
        if ifhn != None:
            for host, ip in hostTab.items():
                #print "host: %s ifhn %s\n" % (host, ip)
                cmd = "%s %s %s -h %s -p %s --ifhn=%s" % (rshCmd, host, mpdCmd, basehost, baseport, ip)
                #print "cmd:", cmd
                Popen4(cmd, 0)
        else:
            for host, ip in hostTab.items():
                #print "host: %s ifhn %s\n" % (host, ip)
                hostArr.append(host + " ")
            hostList = string.join(hostArr)
            print "hostList: %s" % (hostList)
            cmd = "%s -z \'%s\' %s -h %s -p %s" % (rshCmd, hostList, mpdCmd, basehost, baseport)
            print "cmd:", cmd
            Popen4(cmd, 0)
        # wait till all mpds are started
        MAX_TIMEOUT = 300 + 0.1 * (nHosts)
        t1 = 0
        started = False
        while t1 < MAX_TIMEOUT:
            time.sleep(1)
            trace = Popen4(mpdtraceCmd, 0)
            if len(trace.fromchild.readlines()) < nHosts:
                t1 += 1
                continue
            started = True
            break
        if not started:
            print "Failed to construct an mpd ring"
            exit(1)
        print "Done constructing an mpd ring at ", ctime()

    def usage():
        print __doc__

    if __name__ == '__main__':
        mpdboot()

Job submission script, spooling file cpi.sh:

    #LSBATCH: User input
    #BSUB -n 2
    #BSUB -P I210105G
    ##BSUB -W 00:33
    #BSUB -J IMPI
    #BSUB -R 'span[ptile=1]'
    #BSUB -x
    #BSUB -m "iquadcore01! rhel55"
    #BSUB -app djob
    #export LSB_DEBUG_CMD="LC_TRACE LC_EXEC LC_HPC"
    #export LSB_CMD_LOG_MASK=LOG_DEBUG3
    export PATH=/opt/intel/impi/25/bin:$PATH
    #. /usr/share/modules/init/bash
    #module purge
    set -x
    mpiexec -np $LSB_DJOB_NUMPROC /tmp/cpi 10000
    mpdallexit

Submit the job. [command missing in source]

3.7.3 Standard edition, PAM integration method

    [iquadcore01]186% env | grep MPI

3.7.3.1 Configure the intelmpi resource as described in the HPC documentation

Add the intelmpi resource to the lsf.shared file, and add the intelmpi resource to the lsf.cluster file for each host.

External resources in lsf.shared:

    Begin Resource
    RESOURCENAME   TYPE      INTERVAL   INCREASING   DESCRIPTION
    ...
    intelmpi       Boolean   ()         ()           (Intel MPI)
    ...
    End Resource

Add the intelmpi resource name under the RESOURCES column of the Host section of lsf.cluster.cluster_name. Verify with the following command:

    [iquadcore01]189% lshosts
    HOST_NAME    type     model      cpuf    ncpus   maxmem   maxswp   server   RESOURCES
    saspm01      X86_64   PC6000     116.1   2       3008M    3074M    Yes      (intelmpi mpich2 mg openmpi)
    iquadcore0   X86_64   Intel_EM   60.0    8       7974M    4094M    Yes      (intelmpi mg)

(2) Change the installation path in intelmpi_wrapper

    [saspm01]189% sudo vi `which intelmpi_wrapper`
    # Define top directory for Intel MPI
    MPI_TOPDIR="/scratch/intel/impi/06"
    # Define MPI commands used in the script
    MPIEXEC_CMD="$MPI_TOPDIR/bin64/mpiexec"
    MPDEXIT_CMD="$MPI_TOPDIR/bin64/mpdallexit"
    MPDBOOT_CMD="$MPI_TOPDIR/bin64/mpdboot"
    # Check Intel MPI version. Must be 1.0.2 or higher.
    checkMPIversion

Verify that MPI works outside LSF

The command at prompt 195 is missing in the source; its output lists the contents of p.hosts:

    [iquadcore01]195%
    iquadcore01
    iquadcore01
    iquadcore01
    saspm01
    saspm01
    saspm01
    [iquadcore01]196% mpiexec -machinefile p.hosts -n 4 ./test
    Hello world: rank 0 of 4 running on iquadcore01
    Hello world: rank 1 of 4 running on iquadcore01
    Hello world: rank 2 of 4 running on iquadcore01
    Hello world: rank 3 of 4 running on saspm01
    [iquadcore01]197% mpdtrace -l
    iquadcore01_42093 (00)
    saspm01_36768 (5)

(the IP addresses in parentheses are garbled in the source)

3.7.3.3 Submitting an LSF job with the PAM method

    [iquadcore01]200% bsub -I -a intelmpi -n 4 -m "iquadcore01 saspm01!" mpirun.lsf ./test
    Job <3814> is submitted to queue <hpc_linux>.
    <<Waiting for dispatch ...>>
    <<Starting on saspm01>>
    Hello world: rank 0 of 4 running on saspm01
    Hello world: rank 1 of 4 running on saspm01
    Hello world: rank 2 of 4 running on iquadcore01
    Hello world: rank 3 of 4 running on iquadcore01

    TID     HOST_NAME   COMMAND_LINE   STATUS   TERMINATION_TIME
    00000   iquadcore   ./test         Done     03/16/20:00:49
    00001   iquadcore   ./test         Done     03/16/20:00:49
    00002   saspm01     ./test         Done     03/16/20:00:39
    00003   saspm01     ./test         Done     03/16/20:00:39
    [iquadcore01]201%

Note that no "-np 4" is needed after "bsub -n 4 mpirun.lsf".

3.7.3.4 Debugging

Append the debug options to the submission command. The source prints them as "passDpass3-Tsdebug", with the separators lost:

    bsub -I -a intelmpi -n 4 mpirun.lsf ./test passDpass3TSdebug

4 Installing PAC

1) Check the installation file, for example pac8.3_standard_linux-x64.tar.Z. The license is included in the package. The file is located in the NFS shared directory /apps/platform/8.3/pac.
2) Unpack pac8.3_standard_linux-x64.tar.Z and set the following in the installation script:

    export PAC_TOP="/apps/platform/8.3/pac"
    export MYSQL_JDBC_DRIVER_JAR="/usr/share/java/mysql-connector-java-5.1.12.jar"

3) Install MySQL, both client and server, and confirm that the mysql service starts normally (yum install mysql* -y).
4) The service can be controlled with service mysqld status/start/stop (not required).
5) Edit /opt/lsf/conf/lsbatch/cluster1/configdir/lsb.params, add ENABLE_EVENT_STREAM=y, and run badmin reconfig.
6) Run pacinstall.sh to install. Before running it, confirm that the LSF environment has been sourced.
7) Source the PAC environment. Add the source command to the end of /etc/profile so that the environment is sourced automatically at login. [command missing in source]
8) Start the portal with:

    pmcadmin start
    perfadmin start all

9) Check that everything started normally with:

    # pmcadmin list
    # perfadmin list

10) Access the portal at http://hostipaddress:8080
11) Log in as an administrator or as a user (NIS users).
12) For VNC configuration, see the PAC administrator documentation.

5 Integrating applications with PAC

The idea behind PAC integration: configure and design the XML submission page, then process the environment variables passed from the XML file in the corresponding script file. The script finally builds the job submission logic (at the end of /opt/pac/gui/conf/application/published/app.cmd):

    JOB_RESULT=`/bin/sh -c "bsub -q $SUB_QUEUES $JOB_NAME_OPT $CWD_OPT ${PROJECT_NAME_OPT} ${CWD_DIR} ${QUEUE_OPT} $NCPU_OPT $LSF_RESREQ $RUNHOST_OPT $APP_PARAMS $EXTRA_PARAMS $OUTPUT_OPT $NASTRAN_CMD $INPUT_OPT $MEMORYARCH_OPT $NASTRAN_PARAMS ${NASTRAN_OPTIONS} ${MPI_OPTIONS} 2>&1"`

5.1 Gaussian interface integration procedure

Log in as lsfadmin at http://hostipaddress:8080/platform/

[Screenshot: IBM Platform HPC 3.2 portal, Application Templates page, listing built-in templates such as NWCHEM, MATLAB, LS-DYNA, HMMER, FLUENT, ECLIPSE, ClustalW, CMG_STARS, CMG_IMEX and CMG_GEM.]

Select an existing template and click Save As to create the GAUSSIAN template, then open the Modify page to edit the GAUSSIAN template.

Select the program parameters section and click Add:

[Screenshot: GAUSSIAN template (Type: Custom, Application: GAUSSIAN) with the Add dialog open.]

[Screenshot: form editor with Add, Delete and Edit buttons, the input file field INPUT_FILE, and the buttons Add Local File, Add Server File, Submit Test Job, Save As and Close.]

Select Drop Down List, then click Next.

[Screenshot: "Select a type for the field" dialog; Drop Down List lets users select a single item from a predefined list.]

Set the environment variable and the drop-down list values that represent the two Gaussian versions. Click OK on the drop-down page. After saving, the form looks as follows:

[Screenshot: resulting GAUSSIAN submission form.]

The editor can delete or hide options that are not needed, and can configure the default value of the drop-down list and similar settings, as shown below:

[Screenshot: field properties with Default Value G03, Hide Field, Enable Dependencies, and Help Text "Please input the G03 Version"; template GAUSSIAN, Type Custom, Application GAUSSIAN, tabs Submission Form and Submission Script.]

Back-end script: click Submission Script in the interface, or edit the file directly. Only fragments of the file survive in the source:

    [ms@mn3650 ~]$ cat /usr/share/pmc/gui/conf/application/published/GAUSSIAN/GAUSSIAN.cmd
    #!/bin/sh
    # number of tasks per host
    SPAN="span[hosts...            (truncated in source)
    #LSF_RESREQ="select[type==any]"
    LANG=C
    # Source COMMON functions
    . ${GUI_CONFDIR}/app...        (truncated in source)
    # check BSUB parameters and create final bsub options
    if [ "x$JOB_NAME ...           (truncated in source)

When the template is complete, submit a test job. Click Add Server File or Add Local File to add the .com file, then click Submit Test Job to run the job and check how the test job behaves. Because of the execute permissions set on Gaussian, accounts without that permission cannot run it; submit as a user in the gaussian user group.
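The "check BSUB parameters and create final bsub options" step of a submission script can be sketched as follows. This is a minimal sketch and not the PAC-generated code: the helper names opt and build_bsub_opts and the variables QUEUE and NCPU are assumptions made for illustration; only the `"x$JOB_NAME"` test style is taken from the fragment above.

```shell
#!/bin/sh
# opt FLAG VALUE -> prints "FLAG VALUE" when VALUE is non-empty, nothing otherwise
opt() {
    if [ "x$2" != "x" ]; then
        printf '%s %s' "$1" "$2"
    fi
}

# build_bsub_opts -> prints the bsub options for the form fields that were filled in.
# The unquoted substitutions rely on word splitting, so values must not contain spaces.
build_bsub_opts() {
    set -- $(opt -J "$JOB_NAME") $(opt -q "$QUEUE") $(opt -n "$NCPU")
    printf '%s\n' "$*"
}

# Hypothetical form values.
JOB_NAME=g03test
QUEUE=qchem
NCPU=4
build_bsub_opts
```

Empty fields simply drop out of the option string, so the same script serves forms where, for example, the job name is optional.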
