9th R Conference (Beijing), public talk slides, Software Tools A: Mark Chen, Deep Dive in Database

Deep dive the In-database R
Building intelligent applications using SQL Server R Services: an End to End Walkthrough
Mark Chen
eMail:
Advanced Analytics TSP - Global Black Belt - Greater China Region

You'll learn an end-to-end solution for predictive modeling using SQL Server R Services.
This session is based on a well-known public data set, the New York City taxi dataset.
You will see a combination of R code, SQL Server data, and custom SQL functions to build a classification model that indicates the probability that the driver will get a tip on a particular taxi trip.
You'll also learn how to deploy your R model to SQL Server and use server data to generate scores based on the model.

Introduction to Microsoft R Server
Building intelligent applications using SQL Server R Services: an End to End Walkthrough

SQL Server 2016: Everything built-in
  • Consistent experience from on-premises to cloud
  • Industry leader
  • A fraction of the cost. Self-service BI per user: Microsoft $120, Tableau $480, Oracle $2,230
  • In-memory across all workloads (built-in)
  • R + in-memory at massive scale
  • Least vulnerable 6 years in a row
  • #1 performance on TPC-H; Oracle is #5
[Figure: bar chart comparing SQL Server, Oracle, MySQL and SAP HANA, plus a ranking panel showing SQL Server at #1, #2 and #3; the individual values are not recoverable from the extracted text]

Typical advanced analytics lifecycle
  • Preparation: Ingest, Transform, Explore
  • Modeling: Model
  • Operationalization: Deploy, Score, Visualize, Measure
Data scientists should be creating and testing models.
Data scientist average wage: 350 K per year.

But the reality is different
Data scientist focus time, in the order shown on the slide: Preparation 80%, Modeling 5%, Operationalization 15%.
[Diagram labels: Decisions, Operationalize]

Making Advanced Analytics a team sport
[Diagram labels: Preparation, Model]

Product naming
  • Revolution R Open: free and open source R distribution, enhanced and distributed by Revolution Analytics
  • Revolution R Enterprise: secure, scalable and supported distribution of R, with commercially licensed components created by Revolution Analytics
  • Microsoft R Open: free and open source R distribution, enhanced and distributed by Revolution Analytics
  • Microsoft R Server: secure, scalable and supported distribution of R, with proprietary components created by Revolution Analytics

Microsoft R Server
  • Microsoft R Server for Redhat Linux
  • Microsoft R Server for SUSE Linux
  • Microsoft R Server for Teradata DB
  • Microsoft R Server for Hadoop
  • Microsoft SQL Server 2016 Enterprise Edition R Services includes allowance for R Server for Windows

[Diagram: Microsoft R Application Network, open source components and commercial elements. Capabilities listed: Data manipulation, Descriptive Stats, Visualisation, Decision Trees, Decision Forests, Gradient Boosted Decision Trees, Naive Bayes, Stepwise Regression, GLMs, Linear & Logistic Regression, K-Means, Simulation, Prediction, Combination. Component labels: R Open, R Connector, R+CRAN, DistributedR, ScaleR, ConnectR, DeployR, R Server]

Example: fraud analytics deployed to BI tool
The size of each circle indicates credit card balance, and the darkness of the circle shows the prediction of fraud.

Example: Market Basket Analysis in HTML tool

Example: integration with Excel
User-selected and user-entered parameters are used to run various R models and produce different R model outputs.

Demo
New York City Taxi Web App

Microsoft R Server Provided:
  • Big Data Scale
  • Parallelized Computation
  • No RAM Limitations

First Two Key Innovations
Transparent parallelization to accelerate and scale computation:
  • Parallelized Algorithms
  • Parallelized Data Access
  • "Chunked" Processing To Alleviate RAM Limits
[Diagram labels: Data Sources, R Packages for Data Access]

Third Key Innovation
Remote execution slashes movement and reduces copying.
[Diagram labels: Data Sources, R Packages for Data Access, Parallelized Algorithms, Parallelized Data Access, Remote Execution, Remote Execution Context]

SQL Server 2016
  • Agility: in-database analytics shrinks analysis time and enables agile response and reaction.
  • Code portability across platforms, no model recoding: scripts and models can be executed on a variety of platforms.
  • Choice
  • Reduce or eliminate data movement
  • Hybrid memory & disk scalability
  • Model on-prem, score in the cloud, or vice versa
  • Cost effective: included in SQL Server 2016 Enterprise Edition
  • No dedicated hardware/platform requirement
  • Leverage existing R or SQL skills

Persona: Data Scientist / R Developer
Working from my R IDE on my workstation, I can execute an R script that runs near-database, and get the results back.
[Diagram: Data Scientist Workstation (R IDE, Revolution R Open, Revolution R Enterprise) and SQL Server 2016 (Revolution R Open, Revolution R Enterprise, AA Extensions). Numbered steps 1 to 3 are labelled R Script, Execution near-db, Results]
    sqlCompute <- RxInSqlServer()
    rxSetComputeContext(sqlCompute)
    linModObj <- rxLinMod()

Application
The stored procedure contains R code and executes near-db.
[Diagram: Application calls a System Stored Procedure in SQL Server 2016 (Revolution R Open, Revolution R Enterprise, AA Extensions). Results: scores, plots. Steps numbered 1 to 3]
    exec sp_execute_external_script @language = N'R', @script = N'- R code -'

[Diagram: Prepare, Model, Operationalize cycle around SQL 2016, with R & ScaleR Models]

[Diagram: Microsoft R Server with Teradata. Teradata Database: VAMPs, Data, Database Nodes, Storage, Parse Engine, External Stored Procedure, Table Operators. Desktops & Servers: R Workstation, R Server. Data Sources: Machine Data, New Data Sources, Data Suppliers, Structured Data, IBM Mainframe. Analytics Consumers: Business Analysts (Alteryx, Tableau, QlikView, Cognos, Microstrategy, Datameer etc.), Power Analysts (R Studio, DevelopR + Microsoft R for Windows), Line of Business users (Analytic Apps, Rules Engines, etc.). Other labels: WebService with DeployR, Scored Data, Client Server(s), Master Data]
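The "chunked" processing idea mentioned among the key innovations can be illustrated in plain R. This is a minimal sketch, not the ScaleR implementation: it assumes a small synthetic data frame standing in for a table too large for RAM, and the names `dat`, `chunk_size`, `xtx` and `xty` are hypothetical. It fits a linear model by accumulating the cross-products X'X and X'y one chunk at a time, so only one chunk needs to be in memory at once.

```r
# Synthetic data standing in for a table too large to load at once (assumption).
set.seed(42)
n  <- 10000
x1 <- rnorm(n)
x2 <- runif(n)
y  <- 1.5 + 2 * x1 - 3 * x2 + rnorm(n, sd = 0.1)
dat <- data.frame(y, x1, x2)

chunk_size <- 1000                      # rows processed per pass (hypothetical)
starts <- seq(1, n, by = chunk_size)

# Running sums of X'X and X'y; these are all that must be kept between chunks.
xtx <- matrix(0, 3, 3)
xty <- matrix(0, 3, 1)

for (s in starts) {
  chunk <- dat[s:min(s + chunk_size - 1, n), ]
  X <- cbind(1, chunk$x1, chunk$x2)     # design matrix with intercept column
  xtx <- xtx + crossprod(X)             # adds t(X) %*% X for this chunk
  xty <- xty + crossprod(X, chunk$y)    # adds t(X) %*% y for this chunk
}

# Solve the normal equations once all chunks have been seen.
beta_chunked <- drop(solve(xtx, xty))
names(beta_chunked) <- c("(Intercept)", "x1", "x2")

# Reference fit on the full data, for comparison only.
beta_full <- coef(lm(y ~ x1 + x2, data = dat))
print(rbind(chunked = beta_chunked, full = beta_full))
```

Because the per-chunk sums are independent of one another, the same accumulation could in principle be done by several workers in parallel and added together at the end, which is the intuition behind pairing chunked processing with parallelized algorithms.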

[Diagram: R model execution in Hadoop and in Teradata. Labels: Data, Hadoop, R Models, Execution in-Teradata, Execution in Hadoop, Teradata Database, R Model Execution. Components: R + CRAN, RevoR, DistributedR, ConnectR, ScaleR]

Microsoft R Server platforms
  • Microsoft R Server for Hadoop
  • Microsoft R Server for Linux
  • Microsoft R Server for Windows
  • Microsoft R Server for Teradata DB

Billion row logistic regression with "a few variables", 4 iterations
  • R on Teradata 114 (6 nodes, 72 cores)
  • Scales linearly as # of rows increases
  • Scales linearly with number of nodes
  • 6 node test system is 3 x faster than 2 node test system
  • Running locally on 4 core AWS Server with 15GB RAM: 1699 secs!
  • 5+ hours to 40 seconds run time
[Figure: run time chart; axes: rows, minutes]

Two approaches
  • Moving the data out to "beside" R server
  • Move the algorithm to run Microsoft R Server inside Teradata database
[Diagram labels: Data Nodes; Corporate Applications; Desktops & Servers; Microsoft R Enterprise; Hadoop cluster on Linux with MapReduce and HPA or LASR]
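To make "move the algorithm to the data" concrete for an iterative method such as logistic regression, here is a minimal base R sketch. It is an illustration under stated assumptions, not how Microsoft R Server or Teradata implement it: the data are synthetic, the six partitions only mimic database nodes, and `node_step` is a hypothetical helper. Each partition computes small partial sums for one iteratively reweighted least squares (IRLS) step; only those sums, not the rows, are combined centrally.

```r
# Synthetic binary-outcome data (assumption), split into 6 partitions
# that stand in for 6 database nodes.
set.seed(1)
n <- 6000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
node_id <- rep(1:6, length.out = n)
partitions <- split(data.frame(x, y), node_id)

# Work done "on each node": partial sums for one IRLS step.
node_step <- function(part, beta) {
  X   <- cbind(1, part$x)
  eta <- drop(X %*% beta)               # linear predictor
  mu  <- plogis(eta)                    # fitted probabilities
  w   <- mu * (1 - mu)                  # IRLS weights
  z   <- eta + (part$y - mu) / w        # working response
  list(xtwx = crossprod(X, w * X),      # t(X) %*% W %*% X
       xtwz = crossprod(X, w * z))      # t(X) %*% W %*% z
}

beta <- c(0, 0)
for (iter in 1:25) {
  parts <- lapply(partitions, node_step, beta = beta)
  # Central step: add up the small matrices returned by the nodes.
  xtwx <- Reduce(`+`, lapply(parts, `[[`, "xtwx"))
  xtwz <- Reduce(`+`, lapply(parts, `[[`, "xtwz"))
  beta_new  <- drop(solve(xtwx, xtwz))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

# Reference fit on the pooled data, for comparison only.
beta_glm <- unname(coef(glm(y ~ x, family = binomial)))
print(rbind(partitioned = beta, glm = beta_glm))
```

The amount of data exchanged per iteration is a 2 x 2 matrix and a 2 x 1 vector per partition regardless of row count, which is one way to read the slide's contrast between moving the data out and moving the algorithm in.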

[Diagram: two Hadoop deployment architectures. Microsoft R Server "Inside" Hadoop Architecture: R Users & Applications and Other Hadoop Users & Applications share a Linux Hadoop cluster, with MapReduce, Other YARN Tasks and Other Jobs running under YARN on HDFS. SAS HPA & LASR "Alongside" Architecture: SAS User or App. and Other Hadoop Users & Applications, with HDFS on a Linux Hadoop cluster]

What is an intelligent application?
  • Traditional applications: use program logic based on static rules and actions.
  • Intelligent applications: leverage large-scale enterprise data and external data sources to infer patterns of behavior, make personalized predictions and take automated actions dynamically.
  • Examples: forecasting, product recommendations, churn prediction and actions, preventive maintenance, fraud detection.
Applications that leverage data and derive insights and actions.

Process for building Intelligent Applications
  • Ingest: import data from multiple enterprise & external data sources.
  • Explore: develop deep data understanding to glean patterns.
  • Featurize: create insightful features to facilitate model building.
  • Model: build and evaluate predictive models. Select best models.
  • Consume: develop apps to consume the predictive models.

Ingest Data
  • Import data into SQL Server 2016 using standard tools like Bulk Copy (BCP), SSIS, PolyBase, Azure Data Factory
  • Import data directly into in-memory columnstore indexes for parallel loads

Explore Data
  • Summarize data by SQL queries
  • Clean data by SQL queries
  • Visualize data by R scripts embedded in stored procedures or client R code executed in-database
  • R scripts in stored procedures are executed by SQL Server R Services in database
  • R scripts in stored procedures are executed in parallel in SQL Server R Services

Create data features
  • Use functions in SQL to create new features
  • Use functions defined in R that are embedded in SQL stored procedures to create new features
  • SQL functions might be more suitable for creating some features, while R might be more suitable for others. Select the most efficient way to improve your productivity.

Build models
  • Build predictive models in R scripts, which are embedded in SQL stored procedures
  • R has a rich library of predictive analytics algorithms. The stored procedures are executed within the database server, and so are the R scripts
  • Call a simple predict() function in R to evaluate the model on validation data, and use another single function to extract performance metrics
  • Persist the model as a record in the database

Consume model from applications
  • Wrap the calls that score with the predictive models as stored procedures
  • Intelligent apps written in any language can consume the model by invoking stored procedures.
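The model-building steps described above (build a model in R, evaluate it with predict() on validation data, persist it as a record, then score from an application) can be sketched in base R. This is an illustration under assumptions, not the walkthrough's actual code: the data are synthetic rather than the real New York City taxi data, the column names `passenger_count`, `trip_distance`, `fare_amount` and `tipped` are hypothetical, and base `glm()` stands in for whatever modeling function runs inside SQL Server R Services (RevoScaleR offers `rxLogit` and `rxPredict` for the same purpose).

```r
# Synthetic stand-in for taxi trip data (assumption: columns are hypothetical).
set.seed(123)
n <- 2000
trips <- data.frame(
  passenger_count = sample(1:4, n, replace = TRUE),
  trip_distance   = round(rexp(n, rate = 1 / 3), 2)
)
trips$fare_amount <- round(2.5 + 2.0 * trips$trip_distance + rnorm(n), 2)
trips$tipped      <- rbinom(n, 1, plogis(-1 + 0.15 * trips$trip_distance))

# Split into training and validation sets.
train_idx <- sample(n, 0.7 * n)
train <- trips[train_idx, ]
valid <- trips[-train_idx, ]

# Build: classification model for the probability that a trip is tipped.
model <- glm(tipped ~ passenger_count + trip_distance + fare_amount,
             data = train, family = binomial)

# Evaluate: score the validation data and compute a simple metric.
valid$score <- predict(model, newdata = valid, type = "response")
accuracy <- mean((valid$score > 0.5) == valid$tipped)

# Persist: serialize the model to a raw vector, the kind of value that
# could be stored in a binary column of a database table.
model_blob <- serialize(model, connection = NULL)

# Consume: restore the model from the stored bytes and score a new trip.
restored <- unserialize(model_blob)
new_trip <- data.frame(passenger_count = 1, trip_distance = 4.2,
                       fare_amount = 11.0)
tip_prob <- predict(restored, newdata = new_trip, type = "response")
print(c(accuracy = accuracy, tip_probability = unname(tip_prob)))
```

Storing the model as bytes is what lets a scoring stored procedure load it later without retraining; the application only sends trip features and receives a probability back.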
