分布式流式數(shù)據(jù)處理框架:功能對比以及性能評估_第1頁
分布式流式數(shù)據(jù)處理框架:功能對比以及性能評估_第2頁
分布式流式數(shù)據(jù)處理框架:功能對比以及性能評估_第3頁
分布式流式數(shù)據(jù)處理框架:功能對比以及性能評估_第4頁
分布式流式數(shù)據(jù)處理框架:功能對比以及性能評估_第5頁
已閱讀5頁,還剩43頁未讀, 繼續(xù)免費閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認領(lǐng)

文檔簡介

FunctionalComparisonandPerformance

EvaluationOverview

Streaming

Core

MISC

Performance

BenchmarkChooseyour

weapon

!2*Other

names

and

brands

maybe

claimed

asthe

property

of

others.Execution

Model

+Fault

ToleranceMechanismContinuous

StreamingMicro-BatchApacheStorm*TwitterHeron*AapcheFlink*ApacheGearpump*Apache

SparkStreaming*Apache

StormTrident*Source

Operator

SinkSourceOperatorSink4*Other

names

and

brands

maybe

claimed

asthe

property

of

others.Continuous

StreamingAckperRecordContinuous

StreamingMicro-BatchCheckpoint

“perBatch”Checkpoint

perBatchApacheStorm*TwitterHeron*AapcheFlink*ApacheGearpump*Apache

SparkStreaming*Apache

StormTrident*StorageStorageStorageSourceOperatorSinkSourceOperatorSinkSourceoffsetOperator

Sinkoffsetstate

strackstate

str

job

statusididJobManager/HDFSAckerDriverHDFSThisisthecriticalpart,asitaffects

many

features5*Other

names

and

brands

maybe

claimed

asthe

property

of

others.Continuous

StreamingAckperRecordContinuous

StreamingMicro-BatchCheckpoint

“perBatch”Checkpoint

perBatchApacheStorm*TwitterHeron*AapcheFlink*ApacheGearpump*Apache

SparkStreaming*Apache

StormTrident*LowLatencyHighLatencyLowOverheadHighThroughputHighOverheadLowThroughput6*Other

names

and

brands

maybe

claimed

asthe

property

of

others.Delivery

GuaranteeApacheStorm*TwitterHeron*AapcheFlink*ApacheGearpump*Apache

SparkStreaming*Apache

StormTrident*Atleast

onceExactlyonce?

Ackersknow

about

ifarecord

is

processedsuccessfullyor

not.

Ifitfailed,replayit.?

State

is

persisted

indurablestorage?

Checkpointis

linked

withstate

storage

perBatch?

Thereis

nostateconsistency

guarantee.7*Other

names

and

brands

maybe

claimed

asthe

property

of

others.Native

StateOperatorApacheStorm*TwitterHeron*AapcheFlink*ApacheGearpump*Apache

SparkStreaming*Apache

StormTrident*Yes*YesYes?

Storm:?

FlinkJava

API:

ValueState

ListState?Spark1.5:

KeyValueStateupdateStateByKey

ReduceState?

Spark1.6:

mapWithState?

Heron:X

UserMaintain?

FlinkScala

API:

mapWithState?

Trident:

persistentAggregateState?

Gearpump

persistState8*Other

names

and

brands

maybe

claimed

asthe

property

of

others.APIApacheStorm*TwitterHeron*CompositionalApacheGearpump*?

Highly

customizable

operator

basedonbasic

building

blocks?

Manualtopology

definition

andoptimizationTopologyBuilderbuilder=newTopologyBuilder();builder.setSpout(“input",newRandomSentenceSpout(),1);builder.setBolt("split",newSplitSentence(),3).shuffleGrouping("spout");builder.setBolt("count",newWordCount(),

2).fieldsGrouping("split",newFields("word"));“foo,foo,

bar”“foo”,“foo”,“bar”{“foo”:

2,

“bar”:1}SpoutBoltBolt10*Other

names

and

brands

maybe

claimed

asthe

property

of

others.Apache

SparkStreaming*Apache

StormTrident*DeclarativeAapcheFlink*ApacheGearpump*?

Higher

order

function

asoperators

(filter,

mapWithState…)?

Logical

planoptimizationDataStream<String>text=env.readTextFile(params.get("input"));DataStream<Tuple2<String,Integer>>counts

=text.flatMap(newTokenizer()).keyBy(0).sum(1);“foo,foo,

bar”“foo”,“foo”,“bar”{“foo”:

1,

“foo”:1,

“bar”:1}{“foo”:

2,

“bar”:1}11*Other

names

and

brands

maybe

claimed

asthe

property

of

others.Statistical?

Data

scientistfriendly?

Dynamic

typeApache

SparkStreaming*ApacheStorm*TwitterHeron*?StructuredStreaming*?ApacheStorm*PythonRlines<-textFile(sc,

“input”)words<-flatMap(lines,function(line)

{lines=ssc.textFileStream(params.get("input"))words=lines.flatMap(lambdaline:line.split(“,"))pairs=words.map(lambdaword:(word,

1))counts=pairs.reduceByKey(lambdax,y:

x

+y)counts.saveAsTextFiles(params.get("output"))strsplit(line,“

”)[[1]]})wordCount<-lapply(words,function(word)

{list(word,1L)}counts<-reduceByKey(wordCount,“+”,

2L)12*Other

names

and

brands

maybe

claimed

asthe

property

of

others.SQLApache

SparkStreaming*AapcheStructuredStreamingApache

StormTrident*PureStyleFusionStyleFlink*CREATE

EXTERNALTABLEInputDStream.transform((rdd:

RDD[Order],time:

Time)

=>{import

sqlContext.implicits._rdd.toDF.registAsTempTableval

SQL

="SELECT

ID,UNIT_PRICE

*QUANTITYAS

TOTAL

FROM

ORDERS

WHERE

UNIT_PRICE

*QUANTITY>

50"ORDERS

(ID

INTPRIMARY

KEY,UNIT_PRICE

INT,QUANTITYINT)LOCATION

'kafka://localhost:2181/brokers?topic=orders'TBLPROPERTIES

'{...}}‘INSERT

INTO

LARGE_ORDERS

SELECT

ID,UNIT_PRICE

*QUANTITYAS

TOTAL

FROM

ORDERS

WHERE

UNIT_PRICE

*QUANTITY>50val

largeOrderDF

=

sqlContext.sql(SQL)largeOrderDF.toRDD})bin/storm

sql

XXXX.sql13*Other

names

and

brands

maybe

claimed

asthe

property

of

others.SummaryCompositionalDeclarativePython/RSQLApache

SparkStreaming*X√√√ApacheStorm*NOTsupportaggregation,windowing

andjoining√X√X√X√√√X√XApache

StormTrident*ApacheGearpump*XXAapcheFlink*Supportselect,

from,where,unionXTwitterHeron*√?X14*Other

names

and

brands

maybe

claimed

asthe

property

of

others.RuntimeModelTwitterHeron*?

Single

Taskon

SingleProcessJVMProcessJVMProcessConnectwithlocalSMConnectwithlocalSMTaskTaskThreadThreadThreadThreadAapcheFlink*?

MultiTasksof

MultiApplications

onSingle

ProcessJVMProcessJVMProcessTaskTaskTaskTaskTaskThreadThreadThreadTaskThreadThreadTasktask

from

application

Atask

from

application

B16*Other

names

and

brands

maybe

claimed

asthe

property

of

others.?

MultiTasksof

Singleapplication

on

Single

ProcessApache

SparkStreaming*o

Single

taskonsinglethreadJVMProcessJVMProcessTaskTaskTaskTaskTaskThreadThreadThreadThreadThreadApacheStorm*Apache

StormTrident*ApacheGearpump*o

Multi

tasksonsinglethreadJVMProcessJVMProcessTaskTaskTaskTaskTaskTaskTaskTaskThreadThreadThreadThread17*Other

names

and

brands

maybe

claimed

asthe

property

of

others.●Window

Support

●Out-of-order

Processing

●Memory

Management●ResourceManagement

●WebUI

●CommunityMaturityWindow

Supportsmaller

than

gapttSliding

WindowCount

Window?Session

Window??session

gapSession

WindowSlidingWindowCountWindowApache

SparkStreaming*Apache√√X√√X√XX?XXX√Storm*Apache

StormTrident*√Apache√?√Gearpump*Apache

Flink*ApacheHeron*XX19*Other

names

and

brands

maybe

claimed

asthe

property

of

others.Out-of-orderProcessingProcessing

TimeEvent

TimeWatermarkApache

Spark√√√√√√√?√X?√Streaming*ApacheStorm*Apache

StormTrident*X√X√ApacheGearpump*AapcheFlink*√√TwitterHeron*XX20*Other

names

and

brands

maybe

claimed

asthe

property

of

others.Memory

ManagementJVM

ManageSelf

Manage

on-heapSelf

Manage

off-heapApache

Spark√√√√√√?√√?√Streaming*AapcheFlink*ApacheStorm*XXXXXXApacheGearpump*TwitterHeron*21*Other

names

and

brands

maybe

claimed

asthe

property

of

others.ResourceManagementStandaloneYARNMesosApache

Spark√√√√√√√√?√?√√√?√?XStreaming*ApacheStorm*Apache

StormTrident*ApacheGearpump*AapcheFlink*√XTwitterHeron*√√22*Other

names

and

brands

maybe

claimed

asthe

property

of

others.WebUISubmitJobsCancelJobsInspectJobsShowStatisticsShowInput

RateCheckExceptionsInspectConfigAlertApacheSparkXX√√X√√√√X√√√√√√√√√√√√?√?X√√√√√√√√√√XXXXXStreaming*ApacheStorm*ApacheGearpump*ApacheFlink*TwitterHeron*√?23*Other

names

and

brands

maybe

claimed

asthe

property

of

others.Past

1Months

Summary

onGitHubCommunity

Maturity10008006004002000CommitsCommittor780ApacheInitiationTimeContributorsTopProjectApacheSpark21718413020132011201420102014201492621921102213420205Streaming*SparkStorm

GearpumpFlinkHeronApacheStorm*Source

website:

/apache/spark/pulse/monthlyPast

3Months

Summary

onJIRAApacheGearpump*Incubator2015Created514Resloved25002000150010005002161ApacheFlink*2084423716177TwitterHeron*2014N/A0SparkStorm

GearpumpFlinkHeronSource

website:

/jira/secure/Dashboard.jspa24*Other

names

andbrandsmay

beclaimed

astheproperty

ofothers.Inteldoes

notcontrolor

auditthird-party

benchmark

dataorthe

websites

referenced

inthisdocument.

Youshouldvisitthereferenced

websiteandconfirmwhether

referenced

dataareaccurate.HiBench

6.0Test

Philosophical?

“LazyBenchmarking”?

Simple

test

caseinfer

practicalusecaseCluster

SetupApacheKafka*

ClusterNameVersion?CPU:

2

xIntel(R)

Xeon(R)

CPU

E5-2699

v3@

2.30GHzMem:

128

GBDisk:8

xHDD

(1TB)Network:

10

Gbpsx3x7Java...1???ScalaApache

Hadoop*Apache

Zookeeper*Apache

Kafka*Apache

Spark*Apache

Storm*Apache

Flink*Apache

Gearpump*Test

Cluster?CPU:

2

xIntel(R)

Xeon(R)

CPU

E5-2697

v2@

2.70GHzCore:

20

/

24Mem:

80

/128

GBDisk:8

xHDD

(1TB

)Network:

10

Gbps??????Apache

Heron*

require

specific

Operation

System

(Ubuntu/CentOS/MacOS)Structured

Streaming

doesn’tsupport

Kafka

sourceyet

(Spark

2.0)27*Other

names

and

brands

maybe

claimed

asthe

property

of

others.ArchitectureTestCluster(Standalone)KafkaBrokerDataGeneratorKafkaBrokerClientMasterTopicAInTimeTopicAKafkaBrokerSlaveSlaveSlave20Core80GMem20Core80GMem20Core80GMemSlaveSlaveSlaveSlaveOutTime20Core80GMem20Core80GMem20Core80GMem20Core80GMemFileSystemResultMetricsReaderOutTime–InTimeFramework

ConfigurationFrameworkRelated

ConfigurationApache

SparkStreaming*7Executor140Parallelism7TaskManager140ParallelismAapcheFlink*ApacheStorm*28Worker140KafkaSpoutApacheGearpump*28Executors140KafkaSource29*Other

names

and

brands

maybe

claimed

asthe

property

of

others.RawInputData?

Kafka

Topic

Partition:140?

SizePerMessage(configurable):

200bytes?

RawInput

MessageExample:“0,6,nbizrgdziebsaecsecujfjcqtvnpcnxxwiopmddorcxnlijdizgoi,1991-06-10,0.115967035,Mozilla/5.0

(iPhone;

U;CPUlike

Mac

OS

X)AppleWebKit/420.1

(KHTML

like

Gecko)

Version/3.0

Mobile/4A93Safari/419.3,YEM,YEM-AR,snowdrops,1”?

Strong

Type:

classUserVisit

(ip,

sessionId,browser)5

minutes?

Keepfeedingdataat

specific

ratefor

5minutes30Data

InputRateThroughputMessage/SecondKafkaProducerNum40KB/s400KB/s4MB/s0.2K2K1120K200K400K2M140MB/s80MB/s400MB/s600MB/s800MB/s111015203M4MTest

Case:IdentityTheapplication

readsinput

datafrom

Kafka

andthen

writesresult

toKafka

immediately,

thereisno

complexbusinesslogic

involved.ResultP99

Latency

(s)8765432100100200300InputRate(MB/s)ApacheFlink*400500600700800ApacheSpark*ApacheStorm*withoutAckApacheStorm*withAck*Other

namesand

brands

maybe

claimed

astheproperty

ofothers.Formore

completeinformationaboutperformanceandbenchmark

results,visit

/benchmarks.Results

havebeenestimated

orsimulatedusing

internal

Intelanalysisorarchitecturesimulationormodeling,andprovided

to

youforinformationalpurposes.

Any

differencesinyoursystemhardware,

software

orconfigurationmayaffect

youractual

performance.TestCase:RepartitionBasically,

thistestcasecan

standfor

theefficiency

of

datashuffle.NetworkShuffleResultP99

Latency

(s)Throughput

(MB/s)8006004002000400300200100000200400600800200400600800InputRate(MB/s)InputRate(MB/s)ApacheSpark*ApacheFlink*ApacheSpark*ApacheFlink*ApacheStorm*withoutAckApacheGearpump*ApacheStorm*withAckApacheStorm*withoutAckApacheGearpump*ApacheStorm*withAck*Other

namesand

brands

maybe

claimed

astheproperty

ofothers.Formore

completeinformationaboutperformanceandbenchmark

results,visit

/benchmarks.Results

havebeenestimated

orsimulatedusing

internal

Intelanalysisorarchitecturesimulationormodeling,andprovided

to

youforinformationalpurposes.

Any

differencesinyoursystemhardware,

software

orconfigurationmayaffect

youractual

performance.Observation?

SparkStreaming

needto

scheduletaskwithadditionalcontext.Under

tinybatchintervalcase,

the

overhead

couldbedramaticworse

comparedtoother

frameworks.?

According

toourtest,minimumBatchInterval

ofSparkisabout80ms

(140tasksperbatch),otherwise

taskscheduledelaywill

keepincreasing?

Repartition

is

heavyforeveryframework,

but

usually

it’sunavoidable.?

Latencyof

Gearpumpis

stillquite

loweven

under800MB/s

inputthroughput.Test

Case:

StatefulWordCountNative

stateoperator

issupported

byall

frameworks

weevaluatedStateful

operator

performance

+Checkpoint/Acker

costResultP99

Latency

(s)Throughput

(MB/s)100808007006005004003002001000604020002004006008000200400600800InputRate(MB/s)InputRate(MB/s)ApacheSpark*ApacheFlink*withoutCPApacheGearpump*ApacheFlink*ApacheStorm*ApacheSpark*ApacheStorm*ApacheFlink*Gearpump**Other

namesand

brands

maybe

claimed

astheproperty

ofothers.Formore

completeinformationaboutperformanceandbenchmark

results,visit

/benchmarks.Results

havebeenestimated

orsimul

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
  • 6. 下載文件中如有侵權(quán)或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

評論

0/150

提交評論