
Open Forum: Open Science
Debbie Bard
Using containers and supercomputers to solve the mysteries of the Universe

Agenda
- What's a supercomputer?
- Containers for supercomputing: Shifter, containers for HPC
- Awesome science: the nature of the Universe, developing new technologies
- Containerizing open science: reproducible science

Supercomputing for Open Science
- NERSC is the most widely used computing center in the DoE Office of Science: 6000+ users, 750+ codes, 2000+ papers/year
- Science domains: Biology, Energy, Environment; Computing; Materials, Chemistry, Geophysics; Particle Physics, Cosmology; Nuclear Physics; Fusion Energy,

Plasma Physics

(Photos: NERSC Cori cabinet; NERSC Mendel cluster cabinet)

What's a supercomputer? It's all about the connections
- Edison, a Cray XC30: 2.5PF, 357TB RAM, 5000 nodes, 130k cores
- Cori, a Cray XC40: a data-intensive partition (32-core Haswells, 128GB) and a compute-intensive partition (68-core KNLs, 90GB); 10k nodes, 700k cores
- 10PB project file system (GPFS), 38PB scratch file system (Lustre), 1.5PB Burst Buffer (flash)

Supercomputing file systems
- Not your grandmother's FS: compute nodes talk to IO nodes, which talk to storage servers
- Scale-out file system with 100s of OSSs, accessed over the high-speed interconnect
- High aggregate bandwidth, but works best for large IO/transfer sizes
- Global, coherent namespace: easy for scientists to use, but hard to scale up metadata
- How do you distribute PBs of files and data to hundreds of thousands of compute cores, with no latency?

Who uses a supercomputer?
- Cori: ~1000 jobs running simultaneously on (1600*32) cores
- Everything from 1000+ node jobs to single-core jobs
- Time-insensitive simulations alongside real-time experimental data analysis: a complex scheduling problem!
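A back-of-envelope division shows why the file systems above favor large, streaming transfers: even an enormous aggregate bandwidth looks thin once it is split across every core. This is an illustrative sketch only; the aggregate-bandwidth figure is an assumption, not a number from the slides.

```python
# Back-of-envelope only: the aggregate bandwidth below is an assumed round
# number for illustration, not a quoted spec of Cori's file systems.
AGGREGATE_BW_GB_S = 700       # assumed aggregate file-system bandwidth (GB/s)
CORES = 700_000               # Cori-scale core count (from the slides)

# If every core streamed data at once, each would get roughly 1 MB/s:
per_core_mb_s = AGGREGATE_BW_GB_S * 1024 / CORES
print(f"{per_core_mb_s:.2f} MB/s per core")
```

With so little bandwidth per core, many small, random reads are far worse than a few large sequential ones.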

(Figure: job size on Cori, in # cores)

The traditional idea of supercomputer usage is a gigantic, whole-machine simulation that runs for days or weeks and produces a huge dataset, or a single number: for example, a 20,000-year climate simulation or a calculation of the structure of an atom. The reality is much more diverse and unruly.

Supercomputing issues
- A screamingly fast interconnect, no local disk, and a custom compute environment designed to accelerate parallel apps: but not everything can adapt easily to this environment.
- Portability: the custom Cray SUSE Linux-based environment makes it hard to use standard Linux-based code and libraries, and scientists often run at multiple sites, wherever they can get the cycles (e.g. LHC Grid Computing).
- Scalability: slow start-up times for shared libraries (e.g. Python code), and the distributed file system doesn't deal well with lots of small files.
- Our users want to run complex software stacks on multiple platforms.
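The slow shared-library start-up can be made concrete with a rough counting model: every Python import probes each `sys.path` entry for several candidate file names before it finds the module, and on a parallel file system each probe is a metadata operation against central servers. The numbers below are illustrative assumptions, not NERSC measurements.

```python
import sys

# Rough, illustrative model (not a NERSC measurement): each `import foo`
# probes every sys.path entry for several candidate names
# (foo.py, foo/__init__.py, compiled variants, ...) until the module is found.
def metadata_ops(n_imports, path_entries, candidates_per_entry=4):
    """Worst-case number of stat()/open() probes for one Python process."""
    return n_imports * path_entries * candidates_per_entry

# Assumed: a scientific stack pulling in ~500 modules over a 20-entry sys.path.
per_process = metadata_ops(500, 20)      # 40,000 probes per process
per_job = per_process * 130_000          # one process per core at Edison scale
print(per_process, per_job)
```

Billions of near-simultaneous metadata probes against a handful of central servers is exactly the access pattern a parallel file system handles worst.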

Containers for HPC!
Why not simply use Docker?
- The underlying custom OS
- The highly-optimized interconnect
- Security issues: if you can start a Docker container, you can start it as root, and map in other volumes with root access!

Shifter
- Shifter enables the collaborative nature of Docker for science and large-scale systems
- It provides Docker functionality and direct compatibility, customized for the needs of HPC systems
- Shifter directly imports Docker images
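In practice, importing Docker images means users build with ordinary Docker tooling and push to a registry, and Shifter pulls from there. A minimal sketch of such an image; the base image, packages, and file names are hypothetical, not from the talk:

```dockerfile
# Hypothetical analysis image; package choices are illustrative only.
FROM python:3.6-slim

# A small scientific Python stack for the (hypothetical) analysis code.
RUN pip install --no-cache-dir numpy scipy astropy

# Bake the analysis code into the image so every node sees the same version.
COPY analyze.py /opt/analysis/analyze.py

CMD ["python", "/opt/analysis/analyze.py"]
```

The same image then runs on a laptop under Docker and on the supercomputer under Shifter.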

Containers on supercomputers
- Shifter uses a loop mount of the image file, which moves metadata operations (like file lookup) to the compute node rather than relying on the central metadata servers of the parallel file system.
- This gives much faster shared-library performance: high performance at huge scale.

Awesome Science: Containerizing the Universe

Dark Energy Survey
- What is the Universe made of? How and why is it expanding?
- Astronomy data analysis: light from some of these galaxies was emitted 13 billion years ago.
- Measuring the expansion history of the universe to

understand the nature of Dark Energy.
- Data analysis code: identify objects (stars, galaxies, quasars, asteroids etc.) in images, calibrate, and measure their properties.
- Why containers? A complicated software stack that runs on everything from laptops to supercomputers, with Python-based code and lots of imports.

LHC ATLAS computing stack
- What is the Universe made of? Why does anything have mass?
- A billion proton-proton collisions per second, and multiple GB of data per second.
- CVMFS software repository: 3.5TB, 50M inodes. A spectacularly complex software stack is required to analyse data from particle collisions.
- Why containers? Un-tarring the stack on each compute node is not

efficient and doesn't scale (30 min/job). Deduplicating files into a squashfs image brings the stack down to 315GB, and it scales up to thousands of nodes.

LHC ATLAS stack start-up time:

  # Cores   Average start-up time
       24    2s
      240   11s
    2,400   15s
   24,000   24s

LCLS: Linac Coherent Light Source
- How does photosynthesis happen? How do drugs dock with proteins in our cells? Why do jet engines fail?
- Super-intense femtosecond x-ray pulses
- The Superfacility concept: scientists using the LCLS at SLAC need real-time feedback on their running experiments, so they take advantage of NERSC supercomputers.
- Why containers? A complex Python-based analysis environment, and an LCLS-driven workflow with data and analysis code coming in from outside NERSC: a security concern.

Containerizing Open Science

Scientific Reproducibility
- Post-experiment data analysis: everyone agrees this is essential (federally mandated!), but no one knows how to do it properly or coherently. (/ostp_public_access_memo_2013.pdf)
- Algorithms: need to run the scripts that produced the results
- Environment: need to replicate the OS, software libraries, and compiler versions
- Data: large volumes, databases, calibration data, metadata

Containers forever
Ince, Hatton & Graham-Cumming, Nature 482, 485 (2012): "Scientific communication relies on evidence that cannot be entirely included in publications, but the rise of computational science has added a new layer of inaccessibility. Although it is now accepted that data should be made available on request, the current regulations regarding the availability of software are inconsistent. We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail."
Containers offer the possibility of encapsulating analysis code and compute environment, ensuring reproducibility of both algorithm and environment, and enabling reproduction of results on any compute system.

Containers forever? Discussion points (in case you can't think of anything to talk about)
- Make this publishable: DOIs for DockerHub images, as for GitHub repos. Link GitHub/Docker repos?
- How to link data to containers?
- How to maintain containers over the long term? Long-term data access efforts in many areas of science are thinking 20 years ahead. Are containers viable in this timeframe?

Backup Slides

Shifter != Docker
- The user runs as the user in the container, not as root.
- The image is modified at container construction time: it modifies /etc, /var and /opt; replaces /etc/passwd, /etc/group and other files for site/security needs; adds /var/hostsfile to identify the other nodes in the calculation (like $PBS_NODEFILE); injects some support software in /opt/udiImage; and adds mount points for parallel filesystems.
- Your homedir can stay the same inside and outside of the container (site configurable).
- The image is read-only on the computational platform: to modify your image, push an update using Docker.
- Shifter only uses mount namespaces, not network or process namespaces. Allows your ap…
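To make the workflow concrete, here is a hedged sketch of how a Shifter job is commonly launched under SLURM at sites like NERSC. The image name, account, resource numbers, and paths are placeholders, and the exact directives are site-configurable.

```bash
#!/bin/bash
# Sketch of a Shifter batch job under SLURM; names and numbers are placeholders.
#SBATCH --nodes=2
#SBATCH --time=00:30:00
#SBATCH --image=docker:myrepo/analysis:latest   # Shifter imports this Docker image

# Each rank starts inside the container; the homedir and parallel
# filesystems are mounted in by Shifter's site configuration.
srun -n 64 shifter python /opt/analysis/analyze.py
```

The user's workflow is thus unchanged from ordinary SLURM usage, with the container substituting for the custom Cray environment.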
