Alibaba Cloud Open Source Big Data Workshop (Hangzhou): Building a Unified Stream-Batch Lakehouse Architecture and the Next-Generation Data Lake Format

Document summary

OPENING
Speaker: ASF Member; Apache Celeborn/Flink/HBase/Paimon PMC Member; Head of EMR, Alibaba Cloud Intelligence

Data Trends
Data volume: AI is driving a further explosion of data, far exceeding the data growth of the previous era. [Chart categories: AI models, video, analytic data, pictures, others.]

The Evolution of Data Architecture
• The data warehouse architecture ("will be discarded"): applications and databases feed ETL into data warehouses that serve reports. Strengths: out of the box, easy to use.
• The data lake architecture: the data lake stores all structured, semi-structured, and unstructured data for applications, reports, machine learning, data science, and data exploration.
• Data Lake + Data Warehouse = Data Lakehouse: one platform for streaming and real-time analytics, machine learning, data science, reports, data exploration, and ETL over structured, semi-structured, and unstructured data.
Lakehouse layers: DevOps, computing engines, governance services, management services, data formats (Apache Paimon), data storage.

Alibaba Cloud OpenLake: The Lakehouse Solution on Alibaba Cloud
DataWorks (IDE, Copilot) and E-MapReduce applications on Apache Paimon (lake format) and OSS-HDFS (lake storage), with tiered storage, compaction, authentication, and authorization.

BUILD OPEN SOURCE COMPATIBLE LAKEHOUSE ON ALIBABA CLOUD
[Layered pipeline: RDBMS and logs → ODS → DWD → DWS]

Recap: the lakehouse solution on Alibaba Cloud (E-MapReduce; Apache Paimon; OSS-HDFS; tiered storage; compaction; authentication; authorization)

EMR Serverless Spark
Transforms data management with one-stop, fully managed services for seamless development, scheduling, and maintenance.
• One-stop data engine; visualized jobs and workflows
• Control plane: session management (resources for interactive queries), queue management (resources for ETL), data catalog view, artifacts management, version control, accounting, job monitoring and intelligent diagnosis
• Canvas editor; workflow instance monitor (single-execution view and global view); testing environment
• Performance: Alibaba Cloud Linux 3; native C++ operators with hardware-aware optimization; zstd-ptg compression acceleration; data shuffle reduced by up to 40%
• Shuffle service (an Apache top-level project donated by Alibaba Cloud): enterprise security with data encryption; enhanced I/O scheduling, flow control, and quota management; throughput for I/O-intensive scenarios; widely adopted in… (truncated); successfully supports jobs with 600 TB+ of shuffle data; 69% performance boost over YARN external shuffle, with gains that grow with shuffle data volume; supports Spark AQE
• Spark-submit compatible job submission; Git integration (Pla… truncated); OSS-HDFS; workspaces; workflows
[Feature comparison table ("Function-wise"): rows for workflow management, catalog and authentication, auditing, and assistant/Copilot, filled with YES marks; the column headers did not survive extraction.]

EMR Serverless StarRocks
Offers a high-performance, all-scenario, blazing-fast, unified data lakehouse analytics service. 100% compatible with open-source StarRocks; 3X faster th… (truncated).
• SIMD-optimized query engine
• Full-stack vectorized technology
• Disaggregation and virtual warehouses
Product architecture:
• Application scenarios: ad hoc queries, dashboards, operational analytics, user profiling, real-time analytics, self-service reporting…
• Product layer: SQL editor, data loading, security, SQL profiling, management, alerting, health analysis, upgrades, instance management, auto-scaling
• StarRocks instance layer: virtual warehouses, lakehouse analytics, shared-nothing architecture
• Storage layer: data lake table formats, StarRocks table format, data lake
Fast and unified
• A comprehensive vectorized execution engine and a modernized cost-based optimizer (CBO), with concurrency reaching tens of thousands of queries per second (QPS).
• Fully compatible with data lake formats, with more than 3X the performance of Trino.
• Supports materialized-view ELT scenarios, enabling one-step data tier processing.
Separation of storage and compute
• Optimized compute elasticity for on-demand usage, with the potential to reduce storage costs by up to 60%.
• Multiple compute clusters, ensuring resource isolation between business units without interference.
• Various caching strategies that customers can configure flexibly to their business needs.
StarRocks Manager (control plane)
• Out of the box, StarRocks Manager offers a wide range of enterprise-level features.
• Intelligent diagnostics and analysis, providing comprehensive analysis in the context of customer business operations.
Highlights: extreme elasticity; disaggregation support; slow SQL profiling and diagnosis; instance diagnosis; lake query acceleration over Hive/Paimon/Iceberg/Hudi (ADS and DWS layers).
• On-demand, second-level elasticity at low cost
• Comprehensive load analysis and diagnostics
• 3x-5x faster than Trino; significantly faster than ClickHouse and Apache Doris
• Support for external materialized views and lakehouse hier… (truncated)
• Sophisticated caching and tiered storage

Recap: the lakehouse solution on Alibaba Cloud (E-MapReduce; Apache Paimon; OSS-HDFS; authentication; authorization; tiered storage; compaction)
• Functionality: Databricks Delta functionality; meta retrieval; MetaStats for CBO
• Fully managed: serverless, elastic, highly available, high throughput; import/export from/to HMS
• Authentication, authorization, and auditing: audit logs for authorization, for metadata operations, and for data operations (WIP)
• Compaction manager; tiered storage manager
Thanks
Yu Li, liyu@

Li Jinsong, Apache Paimon PMC Chair
CONTENTS
1. OpenLake: one storage layer for the whole ecosystem
2. Apache Paimon with open-source compute engines
3. Apache Paimon with Alibaba Cloud's in-house compute engines
4. Apache Paimon in practice

From data lake to lakehouse: data exchange through lake-format SDK reads and writes; lakehouse metadata; the lake format.
Choosing a data architecture: batch data warehouse, real-time lakehouse, or real-time data warehouse.
OpenLake: the lakehouse solution on Alibaba Cloud (DataWorks IDE and Copilot; E-MapReduce; Apache Paimon as the lake format; OSS-HDFS as lake storage; tiered storage; compaction; authentication; authorization).

Paimon computing platform: Paimon + open-source big data
• Shared storage, with all compute engines on equal footing
• Unified streaming and batch, upgraded to real time
• Real-time and offline data with ultra-fast queries
[Diagram: applications, ingestion, streaming ingestion, real-time OLAP, OLAP, batch left join, aggregation, streaming partial update.]

Alibaba Cloud Flink + Paimon: streaming lakehouse
• Streaming writes with updates into the lake
• Widening data from multiple tables into one
• Streaming reads of the changelog
Spark + Paimon: ultra-fast offline data on Alibaba Cloud. [Benchmark chart, higher is better; values not recoverable.]
StarRocks + Paimon: deletion vectors mode.

DLF (Data Lake Formation): a bridge to MaxCompute (MC) and Hologres, connecting Alibaba Cloud's in-house compute engines
• MaxCompute: External Schema over Apache Paimon (lake format) on OSS-HDFS (lake storage). Coming soon: ALIORC format; batch write support.
• Further integrations over Apache Paimon and OSS-HDFS are marked as coming soon.

Practice at a new-energy vehicle company on Alibaba Cloud
Application; streaming ingestion; asynchronous compaction; append table; changelog=lookup.
Practice at a gaming company on Alibaba Cloud
Application; streaming ingestion; real-time OLAP; streaming and batch; asynchronous compaction; primary-key tables and append tables; changelog=input; deletion vectors.
Practice at a local-services company on Alibaba Cloud
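Application; streaming ingestion; high-performance OLAP; append table with Cluster: Z-order.
Thanks
Li Jinsong, Apache Paimon PMC Chair

Alibaba Cloud Real-Time Lakehouse and Flink Product Technology Overview
Alibaba Cloud Computing Platform
CONTENTS
1. Insights into big data real-time lakehouse trends
2. Building a real-time lakehouse with Alibaba Cloud Realtime Compute for Flink
3. Alibaba Cloud Realtime Compute for Flink: product capabilities
4. Typical deployment architectures and case studies

Big data enters the era of the real-time lakehouse!
[Timeline diagram: data warehouse; data lake (2020-2022); 3.0 from 2023 on; structured, semi-structured, and unstructured data.]

02 Building a real-time lakehouse (Streaming Lakehouse) with Alibaba Cloud Realtime Compute for Flink
Goals: minute-level freshness, low cost, end-to-end real time, second-level query response. Streaming: T+1s, with stream/batch at every layer.
• Paimon built on low-cost OSS storage
• Deep integration with Flink for end-to-end real-time processing
• Low-cost, end-to-end real time
• Unified streaming/batch storage and compute
• One platform with data management and scheduling
• Open support for multiple engines
• Real-time acceleration of the entire offline pipeline
• Lower cost for the real-time pipeline

Real-time lakehouse: end-to-end real-time acceleration
Data flows and is updated in real time end to end, with minute-level freshness; every stage is queryable, with second-level query response!
Stages: data ingestion → data storage → data computation → data query
• Streaming; Streaming ETL
• Open support for multiple compute engines and multiple OLAP engines
• Second-level response when querying as external tables; data can also be uploaded directly into the engine; in-memory optimization of query performance
• Upsert/Partial-Update; Time Travel; Batch Overwrite/Query

Real-time ingestion into the lake and warehouse: simplified operations
• CTAS: merge and sync sharded databases and tables
• CDAS: whole-database sync
Compatible with table changes (schema evolution):
• Metadata is discovered and managed automatically through a Catalog.
• Combined with CTAS syntax, both data and table schema changes are synchronized automatically.
• Data changes and schema changes are read and propagated downstream, with ordering guaranteed for both.
• When syncing to a Paimon table, PARTITIONED BY automatically handles sources with or without partition fields.

Real-time ingestion into the lake and warehouse: multiple processing operations
• SQL API: SELECT, WHERE, GROUP BY, Top-N, INSERT
• DataStream API: map, keyBy, aggregate
More sources are on the way.

Real-time lakehouse: low-cost storage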
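Distributed file system (HDFS/OSS/S3): low latency and low cost; unified streaming/batch storage that supports both stream and batch computation.
• Two-phase commit guarantees exactly-once data delivery.

03 Alibaba Cloud Realtime Compute for Flink: product capabilities
Rich enterprise-grade capabilities
• Data ingestion: YAML templates; unified metadata (catalog); SSL support for upstream and downstream systems
• Upgraded enterprise-grade security: multi-dimensional hardening across infrastructure and platform systems to protect data
How cloud big data services protect enterprise data and services: comprehensive, multi-layered security management that continuously protects data and services in the cloud, including against the security risks posed by AccessKeys.
• Custom connector management; catalog management; dynamic deployment updates; data lineage; deploymentTarget rework; UDF registration

Automatic job resource tuning
Resources are adjusted dynamically according to business processing complexity and data traffic.
• Resources set too high: low utilization and high cost.
• Resources set too low: low throughput, high latency, and frequent failovers.
• Slow job startup.
Tuning loop: collect metrics → analyze metrics → generate a tuning execution plan from the combined metrics → update the job through the job management platform, either by restarting it or by updating it dynamically → deployment cluster.

04 Typical architectures and case studies
[Diagram: simple SQL, Flink, binlog, dashboards.]
• Both Hologres and Paimon support streaming access, so each warehouse layer can be placed on either according to storage cost and business latency requirements.
• Writing data directly into Hologres provides second-level freshness + … (truncated)
• The OLAP engine is pluggable: StarRocks, Trino, and others are supported.
Typical customer case
Before: … offline data processing; maintaining two technology stacks made development and maintenance costly, and storage co… (truncated).
After:
• Development efficiency nearly doubled, storage costs dropped by tens of millions (KW) per year, and query efficiency improved 3x.
• Two pipelines were simplified into one, reducing system complexity; O&M work… (truncated).
• One set of SQL/tables and one schema greatly improve development efficiency.
• Kafka clusters were drastically reduced, saving KW-level costs per year.
[Diagram: Kafka; incremental database data; processing; secondary processing in StarRocks and in Paimon.]
Thanks
DingTalk: tute2014

Technical expert, Alibaba Intelligent Engine Business Unit
CONTENTS
1. Product background
2. Example solution: the search offline platform
3. Production job tuning and community collaboration
4. Future

Business scenarios and product definition
[Diagram: transactions, algorithm data, events, and logs arrive via binlog, message queues, databases (MySQL), ODPS, and Paimon; the offline system performs stream and batch processing and writes to message queues, ODPS, Paimon, Hologres, and file systems, feeding the search, advertising, recommendation, and sample engines.]
Challenges: many businesses with complex logic; performance tuning is hard and the O&M barrier is high.
For this scenario, we built a product that provides an end-to-end ETL data processing solution for the AI domain.
Product technical architecture
Product side: Taobao, Tmall, Local Services, Cainiao, Amap, AE, Fliggy, Lazada, OpenSearch…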
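Platforms: search and recommendation platform, offline inference, large models, vision platform, evaluation, feature and sample platform.
• UI and WebIDE (development, configuration, O&M, monitoring, alerting)
• Capabilities: embedding computation, data integration, sample processing, SQL, ad hoc OLAP, stream computing, batch computing, unified stream/batch, user plugins
• Scheduling and orchestration: Airflow scheduling; VVP for job submission, development, and O&M; Catalog (metadata, versions, lineage, datasets); Restune for elastic job resources; Celeborn as the unified shuffle service; Paimon as the lake format
• Compute and storage: VVR, Spark, HA3, Hologres, ODPS, UDx, FC, DC, DRC, connectors; Paimon, TDDL, Swift (message queue), lake table storage optimization service, distributed KV storage, Pangu (distributed file system)
• ASI: unified scheduling and a unified resource pool supporting the Kubernetes protocol

Search offline platform: definition
How do we bring data from sources of different dimensions together onto a single page?
[Diagram: a query reaches the online search engine (search online cluster); the item-dimension and seller-dimension Hologres wide tables are combined by DimJoin on a joinKey and sent to the search engine; sample rows show user_id/user_name (1, vivo) and a field "103": "vivo-x50-pro".]
Business logic is translated into VVR jobs with unified stream/batch processing.
[Business logic graph: a VVR batch job and a VVR stream job; item extension table and seller table with 1:1, 1:n, and n:1 relations; sync layer → join layer (Scan → DimJoin → DimJoin).]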
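The join layer enriches each scanned item record by DimJoin against dimension wide tables before the result is sent to the search engine. The Python below is a minimal sketch of that enrichment step under stated assumptions. Plain in-memory dicts stand in for the Hologres item-extension and seller-dimension wide tables. The field names item_id, seller_id, item_title, and seller_name are hypothetical, and the sample values only echo the slide's example row. The sketch does not reproduce VVR's actual DimJoin operator.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def dim_join(stream: Iterable[Row], dim_table: Dict[Any, Row], join_key: str) -> Iterator[Row]:
    """Left-join each record with the dimension row found under its join key."""
    for record in stream:
        dim_row = dim_table.get(record.get(join_key), {})
        # Dimension fields are added; fields already on the record win on conflict.
        yield {**dim_row, **record}


# Hypothetical stand-ins for the item-extension and seller-dimension wide tables.
item_ext = {103: {"item_title": "vivo-x50-pro"}}
sellers = {1: {"seller_name": "vivo"}}

items = [{"item_id": 103, "seller_id": 1}, {"item_id": 104, "seller_id": 9}]
# Two chained DimJoins, mirroring Scan -> DimJoin -> DimJoin in the join layer.
docs = list(dim_join(dim_join(items, item_ext, "item_id"), sellers, "seller_id"))
print(docs)
```

Records without a matching dimension row pass through unchanged (left-join semantics). An inner join that drops them would lose documents from the search index.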
Business development model
To make business development easier, the platform provides a drag-and-drop development graph on which business teams describe their logic.
Flow: drag-and-drop graph → start the full load (generate the scheduling graph) → Airflow task scheduling graph → allocate storage resources → register output signals.

Sync layer full-load optimization: data integration
Paimon community contributions: Support clone latest snap…; Support remove db or p…; Modify the default value of "target-file-size" (/apache/paimon/pull/3159, /apache/paimon/pull/3287, /apache/paimon/pull/3426, /apache/paimon/pull/3721, /apache/paimon/pull/3779).
I. Drawbacks of the original pipeline
1. Parallelism is capped, so throughput is limited, and adding parallelism risks bringing down the source database.
2. Core databases can only be pulled at night.
II. Expected benefits of the new pipeline

Writing data, attempt 1: chain all operators together (ODPS source operator → Hologres sink operator in each TaskManager, writing to Hologres workers over RPC).
Drawback: multiple applications share one Hologres cluster, so the Hologres workers' networks easily become busy and applications interfere with each other's throughput (Job1, Job2, … Jobn: "Network busy!!!").

Writing data, attempt 2: following Hologres's rule of hashing the primary key to assign shards (hologres_hash_function(key)), define a custom Partitioner in VVR and add a shuffle, reducing the number of network connections to Hologres workers.
Drawback: the VVR batch job consumes a lot of resources.

Writing data, attempt 3:
1. Hologres launches a Serverless Task that reads the ORC files on ODPS's Pangu storage directly, so data no longer goes through the WAL and MemoryTable. This guarantees resource isolation and no longer affects scans, point lookups, and other work running on Hologres workers.
2. Drawback: no other operators, such as UDTFs, can sit between ODPS and Hologres.
[Diagram: VVR write job (write, flush) vs. a Hologres SQL Serverless Task reading from Pangu.]

Sync layer full-load optimization: conclusion
When all operators are chained together, applications' throughput easily interferes; with the other approaches, it does not. In production, the data sync method is therefore chosen per business scenario.
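The second write attempt adds a shuffle keyed by the same hash Hologres uses to assign primary keys to shards, so each sink subtask talks to fewer workers. The Python below is a rough sketch of why this helps, not VVR's Partitioner. The slides do not give the real hologres_hash_function, so `zlib.crc32` stands in as a hypothetical substitute. The shard-to-worker mapping (shard % workers), the round-robin source, and the counts of 64 shards, 16 workers, and 8 subtasks are also assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Set


def shard_of(key: str, shard_count: int) -> int:
    # Hypothetical stand-in for hologres_hash_function(key); the real hash is not given.
    return zlib.crc32(key.encode("utf-8")) % shard_count


def worker_of(shard: int, worker_count: int) -> int:
    # Assumption: shards are spread over Hologres workers round-robin.
    return shard % worker_count


def connections(keys: Iterable[str], sink_parallelism: int, shard_count: int,
                worker_count: int, shard_aware: bool) -> Dict[int, Set[int]]:
    """Return, for each sink subtask, the set of workers it must connect to."""
    conns: Dict[int, Set[int]] = defaultdict(set)
    for i, key in enumerate(keys):
        shard = shard_of(key, shard_count)
        if shard_aware:
            # Custom partitioner: route the record by its target shard.
            subtask = shard % sink_parallelism
        else:
            # Everything chained: the record stays on the subtask that read it
            # (assuming the source hands records out round-robin).
            subtask = i % sink_parallelism
        conns[subtask].add(worker_of(shard, worker_count))
    return conns


keys = [f"user-{n}" for n in range(10_000)]
chained = connections(keys, sink_parallelism=8, shard_count=64, worker_count=16, shard_aware=False)
partitioned = connections(keys, sink_parallelism=8, shard_count=64, worker_count=16, shard_aware=True)
print("chained:", sum(len(w) for w in chained.values()))
print("shard-aware:", sum(len(w) for w in partitioned.values()))
```

Under these assumptions, a shard-aware subtask `s` only receives shards congruent to `s` mod 8. Those shards map to at most two workers (`s` or `s + 8` mod 16). A chained subtask, by contrast, ends up connected to essentially all 16 workers. The price is the extra shuffle, which matches the slide's note that the batch job then consumes more resources.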
Join layer, attempt 1: [Diagram: Hologres source operators → DimJoin against Hologres → sink operators in each TaskManager.]
Drawbacks:
1. Failover is expensive, so full-load output is easily delayed.
2. The co-located environment is harsh, causing a severe long tail.

Join layer, attempt 2: nearly a hundred operators, with Celeborn between the Hologres source operators (parallelism equal to the Holo table's shard count) and the DimJoin and sink operators.
Drawbacks:
1. JobManager OOM
2. Heartbeat timeouts between TaskManagers and the JobManager
3. The JobManager disconnects from ZooKeeper

Join layer, attempt 3: Hologres source operators with a parallelism cap.
Drawback: stability is high, but the duration is mediocre.

Speculative Scheduler
• Launch backup tasks to solve the long-tail problem.
• Resource reuse: machines with low load run more tasks.
• Batch scheduling by limiting resources: each time one task finishes, another starts.

Paimon community contributions: Support custom commit user prefix; Introduce partition mark done with end-input (/apache/paimon/pull/3474, /apache/paimon/pull/3507, /apache/paimon/pull/3584).

Delta and full processing
• VVR bounded-stream job: duration max(T1, T2).
• Batch job: duration T2; processing the delta and full data one after the other takes T1 + T2.
[Comparison table partly lost; one approach is noted as having relatively low processing efficiency.]
In production, the data processing method is therefore chosen per business scenario.

Slow nodes and dynamic rebalancing
With round-robin distribution from source to sink, slow nodes hold the job back. [Diagram: an upstream TaskManager with a source operator, RecordWriter, RecordSerializer, ResultSubPartition1 and ResultSubPartition2, and a Map<ResultSubPartitionId, Weight>, connected to downstream TaskManagers.]
Whenever a new message must be dispatched and a subpartition chosen, the DynamicRebalanceChannalSelector picks the downstream subtask with the lowest weight and sends the buffer to it.
[Diagram: App1 to App4 running as VVR batch jobs + Pangu vs. VVR bounded-stream jobs + Paimon.]

Future: Open-Lake and the lakehouse
[Diagram: Alibaba Cloud customers and Alibaba Group data businesses; Spark, MaxCompute, Realtime Compute, and E-MapReduce over Open-Lake and Apache Paimon.]
E-mail: hongli.wwj@

MaxCompute (MC) Lakehouse Practice and Integration with the Open-Source Ecosystem
Wang Ye (Mengdou), Senior Technical Expert
• Wang Ye (alias: Mengdou), senior technical expert
• Alibaba Cloud MaxCompute team; product and technical lead for lakehouse integration; core developer of the SQL engine and optimizer
Email: ye.wangy@alibaba-inc
CONTENTS
1. Building basic lakehouse capabilities, starting from federated computing
2. Centering on the data lake: full-stack optimization of external table performance
3. Efficient analysis of unstructured data to strengthen AI applications
4. Equal footing among engines: a new architecture integrating the open-source ecosystem
Product introduction
"Cloud-native big data computing service (MaxCompute) is a fast, fully managed TB/PB-scale data warehouse solution. MaxCompute provides users with complete data import solutions and a variety of classic distributed computing models, solving users' massive data computing problems more quickly, with…
