




版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)
文檔簡(jiǎn)介
1、Apache Samza: 領(lǐng)英大規(guī)模數(shù)據(jù)流處理(Secret Kung Fu of Massive Scale Stream Processing with Apache SamzaLinkedIn)介紹Samza LinkedInSamza大規(guī)模流處理的4大秘籍 總結(jié)展望ApacheSamzaApacheSamza is adistributedstream andbatch processingframework.DBchange captureKafkaHadoopKinesisKafkaHadoopCloud ServiceElastic SearchsamzaApacheSamza
2、 社區(qū)Apache頂級(jí)項(xiàng)目(2014年12月起)7次發(fā)布(0.70.13)14Committers62Contributors用 戶 :LinkedIn, Uber, MetaMarkets, Netflix, Intuit, TripAdvisor, MobileAware, Optimizely Samza LinkedIn200 應(yīng)用Production : A Trillion Events +/天單個(gè)節(jié)點(diǎn):1.1M Messages / 秒廣泛應(yīng)用在各個(gè)領(lǐng)域,數(shù)量在過去兩年內(nèi)指數(shù)級(jí)增長(zhǎng)SecurityNotificationsNews classificationPerformanc
3、e monitoring實(shí)時(shí)推送通知系統(tǒng)Notification ProcessorUser Chat EventUser Action EventConnectio n Activity EventRestful Servicesinvitations, mailbox, connection graph,network feed, and commentsMember profile databas eAggregati on EngineChannel SelectionState store6input1input2input3(D Local Data Access Remote D
4、atabase Lookup Remote Service Calloutput介紹Samza LinkedInSamza大規(guī)模流處理的4大秘籍總結(jié)展望Samza 數(shù)據(jù)處理秘籍之面向數(shù)據(jù)流的編程模型High-level API范例分析對(duì)千個(gè)PageViewEvent的Kafka數(shù)據(jù)流(Partitioned by page key),每五分鐘統(tǒng)計(jì)次每個(gè)用戶的event 數(shù)量,然后發(fā)送給另個(gè)Kafka topic.tt+5傳統(tǒng)的事件處理編程模型如果使用基本的Event Processing編程,對(duì)每個(gè)event都需要做如下的工作:將原PageViewEvent對(duì)用戶Id進(jìn)行repartition
5、在repartition后,每個(gè)Event都根據(jù)key = (timestamp, memberId) 寫入個(gè)key-value store.當(dāng)5分鐘window timer到來時(shí),對(duì)這個(gè)kv store進(jìn)行過去五分鐘的range query,對(duì)這五分鐘內(nèi)出現(xiàn)的所有用戶和Pageview進(jìn)行統(tǒng)計(jì)統(tǒng)計(jì)結(jié)果發(fā)送到另個(gè)Kafka topic此編程模型效率低,程序冗長(zhǎng),容易出錯(cuò),可維護(hù)性差。Re#paronInsert into KV StoreWindow &CountsendToPageViewEventPageViewEven tByMemberIdPageViewEventPer Member
6、Stream數(shù)據(jù)流編程模型public class RepartitionAndCounterExample implements StreamApplication Override public void init(StreamGraph graph, Config config) Supplier initialValue = () - 0; MessageStream pageViewEvents =graph.getInputStream(pageViewEventStream, (k, m) - (PageViewEvent) m);OutputStream pageViewEve
7、ntPerMemberStream = graph.getOutputStream(pageViewEventPerMemberStream, m - m.memberId, m - m);pageViewEvents.partitionBy(m - m.memberId).window(Windows.keyedTumblingWindow (m - m.memberId, Duration.ofMinutes(5), initialValue, (m, c) - c + 1).map(MyStreamOutput:new).sendTo(pageViewEventPerMemberStre
8、am);運(yùn)行可視化工具實(shí)例可視化鏈接Samza Operatorsfilterselect a subset of messages from the streammapmap one input message to an output messageflatMapmap one input message to 0 or more output messagesmergeunion all inputs into a single output streampartitionByre-partition the input messages based on a specific fiel
9、dsendTosend the result to an output streamsinksend the result to an external system (e.g. external DB)windowwindow aggregation on the input streamjoinjoin messages from two input streamsstateless functionsI/O functionsstateful functionsSamza 數(shù)據(jù)處理秘籍之二可擴(kuò)展的數(shù)據(jù)存取scalable data access范例分析在上面的PageViewEvent統(tǒng)
10、計(jì)實(shí)例中,用戶需要1.保存統(tǒng)計(jì)的中間結(jié)果以便故障恢復(fù)2.讀取遠(yuǎn)程的用戶數(shù)據(jù)信息本地?cái)?shù)據(jù)存取Samza提供基千內(nèi)存或RocksDb的Key-Value Store 用千高 速本地?cái)?shù)據(jù)存取Samza ProcessorState StoreKafka Change LogSamza ProcessorAdjunct Data StoreChange CaptureDatabase本地狀態(tài)本地?cái)?shù)據(jù)1.1MTPSOn a single machine100 xFaster1.2TBState60 xFasterthan bootstrapInput StreamOutput StreamInput S
11、treamOutput Stream遠(yuǎn)程數(shù)據(jù)存取Samza支持Native異步數(shù)據(jù)處理Samza提供multi-threaded同步數(shù)據(jù)處理Samza ProcessorEvent LoopSingle threadNon-blockingTask max concurrencyRestful ServicesJava NIO, Netty100 xParallelismSamza 數(shù)據(jù)處理秘籍之三統(tǒng)的流處理和批處理Unified Stream & Batch Processing范例分析Kafka提供實(shí)時(shí)的PageViewEvent,同樣的數(shù)據(jù)也被存儲(chǔ)到了Hadoop HDFS。用戶需要隨時(shí)
12、選擇不 同的數(shù)據(jù)源進(jìn)行處理,那需要兩套完全不同的處 理流程嗎?統(tǒng)的流處理和批處理streams.pageViewEventStream.system=kaEa streams.pageViewEventS=PageViewEvent systems.kaEa.samza.factory=org.apache.samza.system.kaEa.KaEaSystemFactory systems.kaEa.consumer.zookeeper.connect=localhost:2181/ systems.kaEducer.bootstrap.servers=localhost:9092Samz
13、a應(yīng)用不需要任何程序流程的改變,只需要 在Configuration里修改數(shù)據(jù)源。Kafka 數(shù)據(jù)源HDFS 數(shù)據(jù)源streams.pageViewEventStream.system=hdfs streams.pageViewEventS=hdfs:/mydbsnapshot/PageViewEvent/ systems.hdfs.samza.factory=org.apache.samza.system.hdfs.HdfsSystemFactory性能比較統(tǒng)計(jì)各國用戶人數(shù):通過統(tǒng)計(jì),得到用戶人數(shù)最多的N個(gè)國家Test data: member profile (242 GB, 487 fi
14、les, around 450 million records)Samza 數(shù)據(jù)處理秘籍之四靈活的部署方式Flexible deployment models范例分析在實(shí)際中,用戶運(yùn)行數(shù)據(jù)處理的機(jī)群是多種多樣 的,怎么才 夠在不同的機(jī)群結(jié)構(gòu)里運(yùn)行同樣的 Samza應(yīng)用?Samza-as-a-library 部署Stream ApplicationStream Processormain() runnLocal Application start RunnerZookeeperStreamProcessorContainerSamzaJobCoordinatorLeaderStreamProce
15、ssorSamzaJobContainerCoordinatorStreamProcessorSamzaJobContainerCoordinatorSamza集成在用戶程序當(dāng)中, 用戶完全掌握自己的流處理, 如應(yīng) 用的生態(tài)周期和資源的分配和管理Samza inaCluster 部署NMNMNMNMNMNMRMRMYARNResource ManagersNodes in the YARNclusterYARN processes (RM/NM)Samza ContainersStream ApplicationJob Runner run-app.sh runRemote ApplicationRunner startsubmitSa
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。
最新文檔
- 寫土方合同范例
- 保潔工用工合同范例
- 買房購房合同范例
- 2025年二上語文期末試題及答案
- 大氣旅游景點(diǎn)
- 會(huì)員購車服務(wù)合同范例
- 2025年雙端面磨床合作協(xié)議書
- pet材料購銷合同范本
- 代言解約合同范本
- 保障性苗圃合同范例
- 2023年安徽審計(jì)職業(yè)學(xué)院?jiǎn)握新殬I(yè)適應(yīng)性測(cè)試題庫及答案解析
- LS/T 3311-2017花生醬
- 蘇教版二年級(jí)科學(xué)下冊(cè)第10課《認(rèn)識(shí)工具》教案(定稿)
- GB/T 40262-2021金屬鍍膜織物金屬層結(jié)合力的測(cè)定膠帶法
- GB/T 3279-2009彈簧鋼熱軋鋼板
- GB/T 16823.3-2010緊固件扭矩-夾緊力試驗(yàn)
- 應(yīng)用文寫作-第四章公務(wù)文書(請(qǐng)示報(bào)告)課件
- Premiere-視頻剪輯操作-課件
- PDCA降低I類切口感染發(fā)生率
- 麻醉藥理學(xué)阿片類鎮(zhèn)痛藥PPT
- 新湘版小學(xué)科學(xué)四年級(jí)下冊(cè)教案(全冊(cè))
評(píng)論
0/150
提交評(píng)論