Building a Chemical Informatics Grid (Indiana University teaching slides)
Acknowledgments
- CICC researchers and developers who contributed to this presentation: Prof. Geoffrey Fox, Prof. David Wild, Prof. Mookie Baik, Prof. Gary Wiggins, Dr. Jungkee Kim, Dr. Rajarshi Guha, Sima Patel, Smitha Ajay, Xiao Dong
- Thanks also to Prof. Peter Murray Rust and the WWMM group at Cambridge University
- More info: and /wiki.

Chemical Informatics and the Grid
An overview of the basic problem and solution

Chemical Informatics as a Grid Application
- Chemical Informatics is the application of information technology to problems in chemistry.
  - Example problems: managing data in large scale drug discovery and molecular modeling
- Building Blocks: Chemical Informatics Resources:
  - Chemical databases maintained by various groups
    - NIH PubChem, NIH DTP
  - Application codes (both commercial and open source)
    - Data mining, clustering
    - Quantum chemistry and molecular modeling
  - Visualization tools
  - Web resources: journal articles, etc.
- A Chemical Informatics Grid will need to integrate these into a common, loosely coupled, distributed computing environment.

Problem: Connecting It Together
- The problem is defining an architecture for tying all of these pieces into a distributed computing system.
  - A "Grid"
- How can I combine application codes, web resources, and databases to solve a particular problem that interests me?
  - Specifically, how do I build a runtime environment that can connect the distributed services I need to solve an interesting problem?
- For academic and government researchers, how can I do all of this in an open fashion?
  - Data and services can come from anywhere
  - That is, I must avoid proprietary infrastructure.

NIH Roadmap for Medical Research
- The NIH recognizes chemical and biological information management as critical to medical research.
- Federally funded high throughput screening centers.
  - 100-200 HTS assays per year on small molecules.
  - 100,000's of small molecules analyzed
  - Data published, publicly available through NIH PubChem online database.
- What do you do with all of this data?

High-Throughput Screening
Testing perhaps millions of compounds in a corporate collection to see if any show activity against a certain disease protein

High-Throughput Screening
- Traditionally, small numbers of compounds were tested for a particular project or therapeutic area
- About 10 years ago, technology developed that enabled large numbers of compounds to be assayed quickly
- High-throughput screening can now test 100,000 compounds a day for activity against a protein target
- Maybe tens of thousands of these compounds will show some activity for the protein
- The chemist needs to intelligently select the 2-3 classes of compounds that show the most promise for being drugs to follow up

Informatics Implications
- Need to be able to store chemical structure and biological data for millions of data points
  - Computational representation of 2D structure
- Need to be able to organize thousands of active compounds into meaningful groups
  - Group similar structures together and relate to activity
- Need to learn as much information as possible (data mining)
  - Apply statistical methods to the structures and related information
- Need to use molecular modeling to gain direct chemical insight into reactions.

The Solution, Part I: Web Services
- Web Services provide the means for wrapping databases, applications, web scavengers, etc., with programming interfaces.
  - WSDL definitions define how to write clients to talk with databases, applications, etc.
  - Web Service messaging through SOAP
  - Discovery services such as UDDI, MDS, and so on.
- Many toolkits available
  - Axis, .NET, gSOAP, SOAP::Lite, etc.
- Web Services can be combined with each other into workflows
  - Workflow == use case scenario
  - More about this later.

Basic Architectures: Servlets/CGI and Web Services
[Figure: in the Servlet/CGI architecture, a browser talks to a web server over HTTP GET/POST and the web server reaches the database through JDBC. In the Web Services architecture, browsers (through a web server) and GUI clients use SOAP to call WSDL-described services, and a web server behind the WSDL interface reaches the database through JDBC.]

Solution Part II: Grid Resources
- Many Grid tools provide powerful backend services
  - Globus: uniform, secure access to computing resources (like TeraGrid)
    - File management, resource allocation management, etc.
  - Condor: job scheduling on computer clusters and collections
  - SRB: data grid access
  - OGSA-DAI: uniform Grid interface to databases.
- These have Web Service as well as other interfaces (or equivalently, protocols).

Solution, Part III: Domain Specific Tools and Standards --> More Services
- For Chemical Informatics, we have a number of tools and standards.
  - Chemical string representations
    - SMILES, InChI
  - Chemistry Markup Language
    - XML language for describing, exchanging data.
    - JUMBO 5: a CML parser and library
  - Glue Tools and Applications
    - Chemistry Development Kit (CDK)
    - OpenBabel
- These are the basis for building interoperable Chemical Informatics Web Services
- Analogous situations exist for other domains
  - Astronomy, Geosciences, Biology/Bioinformatics

Solution Part IV: Workflows
- Workflow engines allow you to connect services together into interesting composite applications.
- This allows you to directly encode your scientific use case scenario as a graph of interacting services.
- There are many workflow tools
  - We'll briefly cover these later.
- General guidance is to build web services first and then use workflow tools on top of these services.
  - Don't get married to a particular workflow technology yet, unless someone pays you.

Solution Part V: User Interfaces
- Web Services allow you to cleanly separate user interfaces from backend services.
  - Model-view-controller pattern for web applications
- Client environments include
  - Grid and web service scripting environments
  - Desktop tools like Taverna and Kepler
  - Portlet-based Web portal systems
- Typically, desktop tools like Taverna are used by power users to define interesting workflows.
- Portals are for running canned workflows.

Next steps
- Next we will review the online database resources that are available to us.
- Databases come in two varieties
  - Journal databases
  - Data databases
- As we will discuss, it is useful to build services and workflows for automatically interacting with both types.

Online Chemical Journal and Data Resources

MEDLINE: Online Journal Database
- MEDLINE (Medical Literature Analysis and Retrieval System Online) is an international literature database of life sciences and biomedical information.
- It covers the fields of medicine, nursing, dentistry, veterinary medicine, and health care.
- MEDLINE covers much of the literature in biology and biochemistry, and fields with no direct medical connection, such as molecular evolution.
- It is accessed via PubMed.
/wiki/Medline

PubMed: Journal Search Engine
- PubMed is a free search engine offered by the United States National Library of Medicine as part of the Entrez information retrieval system.
- The PubMed service allows searching the MEDLINE database.
  - MEDLINE covers over 4,800 journals published in the United States and more than 70 other countries, primarily from 1966 to the present.
- In addition to MEDLINE, PubMed also offers access to:
  - OLDMEDLINE for pre-1966 citations.
  - Citations to articles that are out-of-scope (e.g., general science and chemistry) from certain MEDLINE journals
  - In-process citations which provide a record for an article before it is indexed with MeSH and added to MEDLINE
  - Citations that precede the date that a journal was selected for MEDLINE indexing
  - Some life science journals
/entrez/query/static/overview.html

PubChem: Chemical Database
- PubChem is a database of chemical molecules.
- The system is maintained by the National Center for Biotechnology Information (NCBI), which belongs to the United States National Institutes of Health (NIH).
- PubChem can be accessed for free through a web user interface.
  - And Web Services for programmatic access
- PubChem contains mostly small molecules with a molecular mass below 500.
- Anyone can contribute
  - The database is free to use, but it is not curated, so the value of the information on a specific compound could be questionable.
- NIH funded HTS results are (intended to be) available through PubChem.

NIH DTP Database
- Part of NIH's Developmental Therapeutics Program.
- Screens up to 3,000 compounds per year for potential anticancer activity.
- Utilizes 59 different human tumor cell lines, representing leukemia, melanoma and cancers of the lung, colon, brain, ovary, breast, prostate, and kidney.
- DTP screening results are part of PubChem and also available as a separate database.

Example screening results.
- Positive results (red bar to right of vertical line) indicate greater than average toxicity of cell line to tested agent.
/docs/compare/compare.html

DTP and COMPARE
- COMPARE is an algorithm for mining DTP result data to find and rank order compounds with similar DTP screening results.
- Why COMPARE?
  - Discovered compounds may be less toxic to humans but just as effective against cancer cell lines.
  - May be much easier/safer to manufacture.
  - May be a guide to deeper understanding of experiments
/docs/compare/compare_methodology.html

Many Other Online Databases
- Complementary protein information
- Indiana University: Varuna project
  - Discussed in this presentation
- University of Michigan: Binding MOAD
  - "Mother of All Databases"
  - Largest curated database of protein-ligand complexes
  - Subset of protein databank
  - Prof. Heather Carlson
- University of Michigan: PDBBind
  - Provides a collection of experimentally measured binding affinity data (Kd, Ki, and IC50) exclusively for the protein-ligand complexes available in the Protein Data Bank (PDB)
  - Dr. Shaomeng Wang

The Point Is...
- All of these databases can be accessed online with human-usable interfaces.
  - But that's not so important for our purposes
- More importantly, many of them are beginning to define Web Service interfaces that let other programs interact with them.
  - Plenty of tools and libraries can simulate browsers, so you can also build your own service.
- This allows us to remotely analyze databases with clustering and other applications without modifying the databases themselves.
- Can be combined with text mining tools and web robots to find out who else is working in the area.

Encoding chemistry

Chemical Machine Languages
- Interestingly, chemistry has defined three simple languages for encoding chemical information.
  - InChI, SMILES, CML
  - Can generate these by hand or automatically
- InChIs and SMILES can represent molecules as a single string/character array.
  - Useful as keys for databases and for search queries in Google.
  - You can convert between SMILES and InChIs
    - OpenBabel, OELib, JOELib
- CML is an XML format, and more verbose, but benefits from XML community tools

SMILES: Simplified Molecular Input Line Entry Specification
- Language for describing the structure of chemical molecules using ASCII strings.
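To make the "single string as a database key" idea concrete, here is a minimal sketch in Python. It is not a SMILES parser or canonicalizer (real work should go through a toolkit such as OpenBabel or CDK); it only tokenizes atoms so that heavy-atom counts can be compared. The names ATOM, ELEMENT and heavy_atoms are made up for this illustration, and the input strings are the seven caffeine SMILES quoted on the "InChI Compared to SMILES" slide of this presentation.

```python
import re
from collections import Counter

# Bracket atoms such as [CH3] or [n+], then two-letter halogens, then the
# one-letter organic subset (upper case = aliphatic, lower case = aromatic).
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom (simplified: one- or two-letter
# aliphatic symbols, or a one-letter aromatic symbol).
ELEMENT = re.compile(r"[A-Z][a-z]?|[a-z]")


def heavy_atoms(smiles: str) -> Counter:
    """Count atoms written explicitly in a SMILES string (hydrogens inside
    brackets such as [CH3] are ignored, ring digits and bonds are skipped)."""
    counts = Counter()
    for token in ATOM.findall(smiles):
        if token.startswith("["):
            token = ELEMENT.search(token).group(0)
        counts[token.capitalize()] += 1
    return counts


# The seven caffeine SMILES from the slide: all different as strings.
caffeine = [
    "[c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]",
    "CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12",
    "Cn1cnc2n(C)c(=O)n(C)c(=O)c12",
    "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2",
    "O=C1C2=C(N=CN2C)N(C(=O)N1C)C",
    "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
]

distinct_keys = len(set(caffeine))              # 7 different database keys
formulas = [heavy_atoms(s) for s in caffeine]   # each is C8 N4 O2
```

Used naively as dictionary or database keys, these seven strings would create seven separate records for one molecule, which is why a canonical form (or InChI) matters when strings are used as identifiers.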

daylight/dayhtml/doc/theory/theory.smiles.html

InChI: International Chemical Identifier
- IUPAC and NIST standard similar to SMILES
- Encodes structural information about compounds
- Based on an open standard and algorithms.
wwmm.ch.cam.ac.uk/inchifaq/

InChI in Public Chemistry Databases
- US National Institute of Standards and Technology (NIST): 150,000 structures
- NIH/NCBI/PubChem project: >3.2 million structures
- Thomson ISI: 2+ million structures
- US National Cancer Institute (NCI) Database: 23+ million structures
- US Environmental Protection Agency (EPA) DSSTox Database: 1450 structures
- Kyoto Encyclopaedia of Genes and Genomes (KEGG) database: 9584 structures
- University of California at San Francisco ZINC: >3.3 million structures
- BRENDA enzyme information system (University of Cologne): 36,000 structures
- Chemical Entities of Biological Interest (ChEBI) database of the European Bioinformatics Institute: 5000 structures
- University of California Carcinogenic Potency Project: 1447 structures
- Compendium of Pesticide Common Names: 1437 (2019-03-03) structures

Journals and Software Using InChI
- Journals
  - Nature Chemical Biology.
  - Beilstein Journal of Organic Chemistry
- Software
  - ACD/Labs ACD/ChemSketch.
  - ChemAxon Marvin.
  - SciTegic Pipeline Pilot.
  - CACTVS Chemoinformatics Toolkit by Xemistry, GmbH.
wwmm.ch.cam.ac.uk/inchifaq/

Chemistry Markup Language
- CML is an XML markup language for encoding chemical information.
  - Developed by Peter Murray Rust, Henry Rzepa and others.
  - Actually dates from the SGML days before XML
- More verbose than InChI and SMILES
  - But inherits XML schema, namespaces, parsers, XPATH, language binding tools like XMLBeans, etc.
- Not limited to structural information
- Has OpenBabel support.
cml.sourceforge/, cml.sourceforge/wiki/index.php/Main_Page

InChI Compared to SMILES
- SMILES is proprietary and different algorithms can give different results.
- Seven different unique SMILES for caffeine on Web sites:
    [c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]
    CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12
    Cn1cnc2n(C)c(=O)n(C)c(=O)c12
    Cn1cnc2c1c(=O)n(C)c(=O)n2C
    N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2
    O=C1C2=C(N=CN2C)N(C(=O)N1C)C
    CN1C=NC2=C1C(=O)N(C)C(=O)N2C
- On the other hand, some claim SMILES are more intuitive for human readers.
wwmm.ch.cam.ac.uk/inchifaq/

A CML Example
medicalcomputing/xml_biosciences.html

Clustering Techniques, Computing Requirements, and Clustering Services
Computational techniques for organizing data

The Story So Far
- We've discussed managing screening assay output as the key problem we face
  - Must sift through mountains of data in PubChem and DTP to find interesting compounds.
- NIH funded High Throughput Screening will make this very important in the near future.
- We now need a way to organize and analyze the data.

Clustering and Data Analysis
- Clustering is a technique that can be applied to large data sets to find similarities
  - Popular technique in chemical informatics
- Data sets are segmented into groups (clusters) in which members of the same cluster are similar to each other.
- Clustering is distinct from classification:
  - There are no pre-determined characteristics used to define the membership of a cluster,
  - Although items in the same cluster are likely to have many characteristics in common.
- Clustering can be applied to chemical structures, for example, in the screening of combinatorial or Markush compound libraries in the quest for new active pharmaceuticals.
- We also note that these techniques are fairly primitive
  - More interesting clustering techniques exist but apparently are not well known by the chemical informatics community.

Non-Hierarchical Clustering
- Clusters form around centroids, the number of which can be specified by the user.
- All clusters rank equally and there is no particular relationship between them.
digitalchemistry.co.uk/prod_clustering.html

Hierarchical Clustering
- Clusters are arranged in hierarchies
  - Smaller clusters are contained within larger ones; the bottom of the hierarchy consists of individual objects in "singleton" clusters, while the top of it consists of one cluster containing all the objects in the dataset.
- Such hierarchies can be built either from the bottom up (agglomerative) or the top downwards (divisive)
digitalchemistry.co.uk/prod_clustering.html

Fingerprinting and Dictionaries -- What Is Your Parameter Space?
- Clustering algorithms require a parameter space
  - Clusters defined along coordinate axes.
- Coordinate axes defined by a dictionary of chemical structures.
- Use binary on/off for fingerprinting a particular compound against a dictionary.
digitalchemistry.co.uk/prod_fingerprint.html

Cluster Analysis and Chemical Informatics
- Used for organizing datasets into chemical series, to build predictive models, or to select representative compounds
- Clustering Methods
  - Jarvis-Patrick and variants
    - O(N^2), single partition
  - Ward's method
    - Hierarchical, regarded as best, but at least O(N^2)
  - K-means
    - < O(N^2), requires set no. of clusters, a little "messy"
  - Sphere-exclusion (Butina)
    - Fast, simple, similar to JP
  - Kohonen network
    - Clusters arranged in 2D grid, ideal for visualization

Limitations of Ward's method for large datasets (>1m)
- Best algorithms have O(N^2) time requirement (RNN)
- Requires random access to fingerprints
  - hence substantial memory requirements (O(N))
- Problem of selection of best partition
  - can select desired number of clusters
- Easily hit 4 GB memory addressing limit on 32 bit machines
  - Approximately 2m compounds

Scaling up clustering methods
- Parallelization
  - Clustering algorithms can be adapted for multiple processors
  - Some algorithms more appropriate than others for particular architectures
  - Ward's has been parallelized for shared memory machines, but overhead considerable
- New methods and algorithms
  - Divisive ("bisecting") K-means method
  - Hierarchical Divisive
  - Approx. O(N log N)

Divisive K-means Clustering
- New hierarchical divisive method
  - Hierarchy built from top down, instead of bottom up
  - Divide complete dataset into two clusters
  - Continue dividing until all items are singletons
  - Each binary division done using K-means method
- Originally proposed for document clustering
  - "Bisecting K-means"
  - Steinbach, Karypis and Kumar (Univ. Minnesota)
  - /~karypis/publications/Papers/PDF/doccluster.pdf
- Found to be more effective than agglomerative methods
- Forms more uniformly-sized clusters at given level

BCI Divkmeans
- Several options for detailed operation
  - Selection of next cluster for division
    - size, variance, diameter
    - affects selection of partitions from hierarchy, not shape of hierarchy
  - Options within each K-means division step
    - distance measure
    - choice of seeds
    - batch-mode or continuous update of centroids
    - termination criterion
- Have developed parallel version for Linux clusters/grids in conjunction with BCI
- For more information, see Barnard and Engels talks at: cisrg.shef.ac.uk/shef2019/conference.htm
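The divisive (bisecting) K-means idea described in these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BCI Divkmeans implementation: the function names are hypothetical, the distance is squared Euclidean, seeds are chosen deterministically (first point and the point farthest from it), the next cluster to divide is simply the largest one, centroids are updated in batch mode, and division stops at k clusters rather than continuing down to singletons. Points could be 0/1 fingerprint vectors; plain numbers are used here to keep the example readable.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def two_means(points: Sequence[Point], max_iter: int = 50):
    """One binary division: plain K-means with K=2 (batch centroid updates)."""
    seed_a = points[0]
    seed_b = max(points, key=lambda p: sq_dist(p, seed_a))
    centers = [seed_a, seed_b]
    left: List[Point] = list(points)
    right: List[Point] = []
    for _ in range(max_iter):
        left = [p for p in points
                if sq_dist(p, centers[0]) <= sq_dist(p, centers[1])]
        right = [p for p in points
                 if sq_dist(p, centers[0]) > sq_dist(p, centers[1])]
        if not right:          # all points coincide, nothing to split
            break
        new_centers = [centroid(left), centroid(right)]
        if new_centers == centers:   # termination: centroids stopped moving
            break
        centers = new_centers
    return left, right


def bisecting_kmeans(points: Sequence[Point], k: int) -> List[List[Point]]:
    """Top-down clustering: keep splitting the largest cluster in two."""
    clusters: List[List[Point]] = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        biggest = clusters[0]
        if len(biggest) < 2:
            break
        left, right = two_means(biggest)
        if not right:
            break
        clusters = [left, right] + clusters[1:]
    return clusters


data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (9.0,), (9.1,), (9.2,)]
result = sorted(sorted(c) for c in bisecting_kmeans(data, 3))
```

Swapping the "largest cluster" rule for variance or diameter, or changing the seeding, corresponds to the options listed on the BCI Divkmeans slide.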

[Figure: comparative execution times on NCI subsets, 2.2 GHz Intel Celeron processor. Times shown: 7h 27m, 3h 06m, 2h 25m, 44m.]

Divisive K-means: Conclusions
- Much faster than Ward's, speed comparable to K-means, suitable for very large datasets (millions)
  - Time requirements approximately O(N log N)
  - Current implementation can cluster 1m compounds in under a week on a low-power desktop PC
  - Cluster 1m compounds in a few hours with a 4-node parallel Linux cluster
- Better balance of cluster sizes than Ward's or K-means
- Visual inspection of clusters suggests better assembly of compound series than other methods
- Better clustering of actives together than previously-studied methods
- Memory requirements minimal
- Experiments using AVIDD cluster and TeraGrid forthcoming (50+ nodes)

Conclusions
- Effective exploitation of large volumes and diverse sources of chemical information is a critical problem to solve, with a potentially huge impact on the drug discovery process
- Most information needs of chemists and drug discovery scientists are conceptually straightforward, but complex to implement
- All of the technology is now in place to implement many of these information need "use-cases": the four level model using service-oriented architectures together with smart clients looks like a neat way of doing this
- In conjunction with grid computing, rapid and effective organization and visualization of large chemical datasets is feasible in a web service environment
- Some pieces are missing:
  - Chemical structure search of journals (wait for InChI)
  - Automated patent searching
  - Effective dataset organization
  - Effective interfaces, especially visualization of large numbers of 2D structures

Divisive K-Means as a Web Service
- The previous exercise was intended to show that Divisive K-Means is a classic example of a Grid application.
  - Needs to be parallelized
  - Should run on TeraGrid
- How do you make this into a service?
- We'll go on a small tour before getting back to our problem.

Wrapping Science Applications as Services
- Science Grid services typically must wrap legacy applications written in C or Fortran.
- You must handle such problems as
  - Specifying several input and output files
    - These may need to be staged in
  - Launching executables and monitoring their progress.
  - Specifying environment variables
- Often these also have shell scripts to do some miscellaneous tasks.
- How do you convert this to WSDL?
  - Or (equivalently) how do you automatically generate the XML job description for WS-GRAM?

Generic Service Toolkit (GFAC) (G. Kandaswamy, IU and RENCI)
- The Generic Service Toolkit can "wrap" any command-line application as an application service. Given a set of input parameters, it runs the application, monitors the application and returns the results.
- Requires no modification to program code.
- Also has web user interface generating tools.
  - When a user accesses an application service, the user is presented with a graphical user interface (GUI) to that service.
  - The GUI contains a list of operations that the user is allowed to invoke on that service.
  - After choosing an operation, the user is presented with a GUI for that operation, which allows the user to specify all the input parameters to that operation.
  - The user can then invoke the operation on the service and get the output results.
/gfac/

OPAL (S. Krishnan, SDSC)
- Features include scheduling (using Globus and Condor/SGE), security (using GSI-based certificates), and persistent state management.
- The WSDL defines operations to do the following:
  - getAppMetadata: includes usage information, arbitrary application-specific metadata specified as an array of other elements, e.g. description of the various options that are passed to the application binary.
  - launchJob: runs job with specified input and returns a Job ID.
  - queryStatus: returns status code, message, and URL of the working directory
  - getOutputs: returns the outputs from a job that is identified by a Job ID.
    - URLs for the standard output and error
    - Array of structures representing the output file names and URLs
  - getOutputAsBase64ByName: This operation returns the contents of an output file as Base64 binary.
  - destroy: This operation destroys a running job identified by a Job ID.
  - launchJobBlocking: This operation requires the list of arguments as a string, and an array of structures representing the input files.
/gridsphere/gridsphere?cid=nbcrws

Our Solution: Apache Ant Services
- We've found using Apache Ant to be very useful for wrapping services.
  - Can call executables, set environment variables.
  - Lots of useful built-in shell-like tasks.
  - Extensible (write your own tasks).
- Develop build scripts to run your application
- You can easily call Ant from other Java programs.
  - So just write a wrapper service
- We use both blocking (hold connection until return) and non-blocking versions (suitable for long running codes).
  - In the non-blocking case, a "Context" web service is used for callbacks.

Flow Chart of SMILES to ClusterPartitioned of BCI Web Service
[Figure: flow chart with the nodes SMILE String, Makebits (with Dictionary (Default)), Fingerprint (*.scn), DivKmeans, Cluster Hierarchy (*.dkm), Optclus (best level), RNNclus, Extracted Cluster Hierarchy (*.clu), One Column Process, Merge Process, New SMILE String. Stage labels: Generating Fingerprints; Clustering Fingerprints; Generating the best levels; SMILES to DKM; Extracting individual cluster partitions.]

BCI Clustering Service Methods
Service Method | Description | Input | Output
makebitsGenerate | Generate fingerprints from a SMILES structure | SMI string | Fingerprint string
divkmGenerate | Cluster fingerprints with Divkmeans | SCN string | Clustered Hierarchy
smile2dkm | Makebits + divkm | SMI string | Clustered Hierarchy
optclusGenerate | Generate the best levels in a hierarchy | DKM string | Best partition cluster level
rnnclusGenerate | Extract individual cluster partitions | DKM string | Indiv. cluster partitions
smile2ClusterPartitioned | Generate a new SMILES structure w/ extra col. | SMI string | New SMILES structure

A Library of Chemical Informatics Web Services

All Services Great and Small
- Like most Grids, a Chemical Informatics Grid will have the classic styles:
  - Data Grid Services: these provide access to data sources like PubChem, etc.
  - Execution Grid Services: used for running cluster analysis programs, molecular modeling codes, etc., on TeraGrid and similar places.
- But we also need many additional services
  - Handling format conversions (InChI <-> SMILES)
  - Shipping and manipulating tabular data
  - Determining toxicity of compounds
  - Generating batch 2D images
- So one of our core activities is "build lots of services"

VOTables: Handling Tabular Data
- Developed by the Virtual Observatory community for encoding astronomy data.
- The VOTable format is an XML representation of the tabular data (data coming from BCI, NIH DTP databases, and so on).
- VOTables-compatible tools have been built
  - We just inherit them.
- SAVOT and JAVOT Java parser APIs for VOTable allow us to easily build VOTable-based applications
  - Web Services
  - Spreadsheet plotting applications.
    - VOPlot and TopCat are two

Document Structure of VOTable
Compound Name | Cluster Number
Acemetacin | 1
Candesartan | 1
Acenocoumarol | 2
Dicumarol | 2
Phenprocoumon | 2
Trioxsalen | 2
Warfarin | 2

    <?xml version="1.0"?>
    <VOTABLE version="1.1"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="ivoa/xml/VOTable/VOTable/v1.1">
      <RESOURCE>
        <TABLE name="results">
          <FIELD name="CompoundName" ID="col1" datatype="char" arraysize="*"/>
          <FIELD name="ClusterNumber" ID="col2" datatype="int"/>
          <DATA>
            <TABLEDATA>
              <TR><TD>Acemetacin</TD><TD>1</TD></TR>
              <TR><TD>Candesartan</TD><TD>1</TD></TR>
              <TR><TD>Acenocoumarol</TD><TD>2</TD></TR>
              <TR><TD>Dicumarol</TD><TD>2</TD></TR>
              <TR><TD>Phenprocoumon</TD><TD>2</TD></TR>
              <TR><TD>Trioxsalen</TD><TD>2</TD></TR>
              <TR><TD>Warfarin</TD><TD>2</TD></TR>
            </TABLEDATA>
          </DATA>
        </TABLE>
      </RESOURCE>
    </VOTABLE>
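As a rough illustration of how such a table can be produced and read back programmatically, the following Python sketch uses only the standard library. It is not the SAVOT/JAVOT API and not the VOTable Generator Service from these slides; the function names build_votable and read_votable are hypothetical, and the XML namespace and schema attributes of a full VOTable document are deliberately left out to keep the example short.

```python
import xml.etree.ElementTree as ET


def build_votable(rows):
    """Serialize (compound name, cluster number) pairs as a minimal VOTable."""
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", name="results")
    # Column descriptions, mirroring the two FIELD elements in the slide
    ET.SubElement(table, "FIELD", name="CompoundName", ID="col1",
                  datatype="char", arraysize="*")
    ET.SubElement(table, "FIELD", name="ClusterNumber", ID="col2",
                  datatype="int")
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for name, cluster in rows:
        tr = ET.SubElement(tabledata, "TR")
        ET.SubElement(tr, "TD").text = name
        ET.SubElement(tr, "TD").text = str(cluster)
    return ET.tostring(votable, encoding="unicode")


def read_votable(xml_text):
    """Parse the two-column table back into (name, cluster) pairs."""
    root = ET.fromstring(xml_text)
    rows = []
    for tr in root.iter("TR"):
        name, cluster = [td.text for td in tr.findall("TD")]
        rows.append((name, int(cluster)))
    return rows


rows = [("Acemetacin", 1), ("Candesartan", 1), ("Warfarin", 2)]
document = build_votable(rows)
round_trip = read_votable(document)

# Because the table is plain XML, a subset is an ordinary list operation
cluster_two = [name for name, cluster in round_trip if cluster == 2]
```

The last line hints at why the XML form is convenient: filtering, unions and intersections of tables become ordinary operations on parsed rows.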

[Figure: a Taverna client calls the WSDL-described VOTable Generator Service (operation retrieveVOTableDocument) hosted on a Tomcat server; the service reads mrtd1.txt and produces votable.xml, which is then opened in VOPlot.]
- mrtd1.txt: SMILES representation of chemical compounds along with their properties
- votable.xml: XML representation of the mrtd1.txt file
- VOPlot application from the generated votable.xml file: graph plotted on Mass (X-axis) and PSA (Y-axis)

Other Uses for VOTables
- VOTables is a useful intermediate format for exchanging data between databases.
  - Simple example: exchange data between VARUNA databases.
  - Each student in the Baik group maintains his/her own copy (sandbox purposes).
  - Often need to import/export individual data sets.
- It is also good for storing intermediate results in workflows.
- Value is not the format, but the fact that the XML can be manipulated programmatically.
  - Unions, subset, intersection operations

More Services: WWMM Services
Services | Descriptions | Input | Output
InChIGoogle | Search an InChI structure through Google | inchiBasic, type | Search result in HTML format
InChIServer | Generate InChI | version, format | An InChI structure
OpenBabelServer | Transform a chemical format to another using OpenBabel | format, inputData, outputData, options | Converted chemical structure string
CMLRSSServer | Generate CMLRSS feed from CML data | mol, title, description, link, source | Converted CMLRSS feed of CML data
CDK-Based
