Building a dynamic and responsive Pinterest

Agenda
- Pinterest Products (5 mins)
- The Evolution of Pinterest Architecture (10 mins)
- Real-time Data Replication for RocksDB (10 mins)
- Automated Cluster Management and Recovery for Stateful Services (10 mins)
- Unified ML Serving Platform (10 mins)

Pinterest Mission
Our mission at Pinterest is to help people discover and do what they love.
Boards, Topics, Following Feed, Home Feed, Related Pins

Some Pinterest Numbers
- 1,500 employees
- 250M MAU; 50% of MAU is from the USA
- 175 billion pins
- 3 billion boards
- 81% of Pinterest users are female
- 50% of Pinterest users earn $50K yearly
- 87% of Pinterest users have purchased a product because of Pinterest

Pinterest in 2015
The majority of content on Pinterest was pre-generated for users prior to login. It was stored statically in HBase and served directly when a user entered the service.

Example architecture (Following Feed 2015)

Weak Points in the 2015 Architecture
- Hard to tweak or experiment with new ideas/models on different components in the system
- Features used to rank content could be weeks old (could not leverage the latest pin/user/board data, not to mention real-time user actions)
- Unnecessarily large HBase storage consumption (users who never return, a large number of concurrent experiments)
We did want to go fully online!

Requirements to Go Fully Online
- Relationship data (following graph, owner-to-board, board-to-pin, and topic-to-pin mappings) needs to be stored in a way suitable for real-time updates and low-latency queries (multiple request trips, big fanout)
- Filtering and lightweight scoring need to happen close to storage at the retrieval stage
- A low-latency ML serving platform
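To make the first requirement concrete, here is a minimal sketch of how relationship data such as owner-to-board edges could be laid out in an ordered key-value store like RocksDB. The key scheme and the path are illustrative assumptions, not Pinterest's actual schema; the point is that one Put makes a new edge visible immediately, and a prefix scan answers a fanout query in one short pass.

```cpp
// Hypothetical key layout for relationship data in RocksDB. Prefix-ordered
// keys make "all boards of an owner" a single short range scan, and one Put
// makes a new edge visible immediately (real-time update, low-latency query).
#include <iostream>
#include <memory>
#include <rocksdb/db.h>

int main() {
  rocksdb::DB* raw = nullptr;
  rocksdb::Options opts;
  opts.create_if_missing = true;
  rocksdb::DB::Open(opts, "/tmp/relgraph", &raw);  // illustrative path
  std::unique_ptr<rocksdb::DB> db(raw);

  // Edge "owner 42 owns board 7" stored as key "o2b:<owner>:<board>".
  db->Put(rocksdb::WriteOptions(), "o2b:42:7", "");
  db->Put(rocksdb::WriteOptions(), "o2b:42:9", "");

  // Retrieval: scan the key prefix to list all boards owned by user 42.
  std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(rocksdb::ReadOptions()));
  for (it->Seek("o2b:42:"); it->Valid() && it->key().starts_with("o2b:42:");
       it->Next()) {
    std::cout << it->key().ToString() << "\n";
  }
  return 0;
}
```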

Technical Decisions to Solve the Problems
- Adopted C++, FBThrift and Folly
  - Lower long-tail latency (big fanout requests)
  - Finer control of system performance (some services are CPU intensive)
  - Shared libraries (a single repo across the company)

Technical Decisions to Solve the Problems
- Built distributed stateful services from scratch
  - Customized data models and indexes
  - Complicated filtering/lightweight scoring close to storage
  - Full control of every component in the system (operate it at scale with confidence, easy to adapt to new feature requests)

Technical Decisions to Solve the Problems
- Adopted RocksDB as the storage engine
  - Widely used
  - Optimized for server load

Rocksplicator
- Open sourced in 2016; we keep improving it and adding new features
- github.com/pinterest/rocksplicator

Major Problems Solved by Rocksplicator
- Real-time data replication for RocksDB
- Automated cluster management and recovery for stateful services
- Resilient request router
- Many other small libraries and tools for productionizing C++ services

Common Architecture of Rocksplicator Powered Systems
[Diagram: one process hosts multiple RocksDB instances. The Application API serves Read/Write traffic through the Application Logic, which calls GetDB() to reach the local RocksDB instances. The Admin API serves an admin tool/system for cluster management through the Admin Logic, which handles Create/Open DB and Add/Remove DB for replication. The RocksDB Replicator performs data replication, applying local updates and shipping/receiving remote updates.]

Example architecture (Following Feed 2018)
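As a rough illustration of the architecture above, the sketch below shows the two-sided shape of such a service: an application path that resolves a locally hosted DB for reads/writes, and an admin path that opens DBs and registers them for replication. `DBManager`, `Get()`, and `AdminAddShard()` are hypothetical stand-ins for the Application API / Admin API boxes in the diagram, not Rocksplicator's real interfaces.

```cpp
// Sketch of a Rocksplicator-style service: application traffic and admin
// traffic are separate APIs over the same set of local RocksDB instances.
// DBManager and the admin call are hypothetical stand-ins, not the real API.
#include <map>
#include <memory>
#include <string>
#include <rocksdb/db.h>

class DBManager {
 public:
  // GetDB(): application logic resolves a locally hosted RocksDB instance.
  rocksdb::DB* getDB(const std::string& shard) { return dbs_.at(shard).get(); }

  // Create/Open DB: invoked through the Admin API by the cluster manager.
  void openDB(const std::string& shard, const std::string& path) {
    rocksdb::Options opts;
    opts.create_if_missing = true;
    rocksdb::DB* db = nullptr;
    rocksdb::DB::Open(opts, path, &db);
    dbs_[shard] = std::unique_ptr<rocksdb::DB>(db);
  }

 private:
  std::map<std::string, std::unique_ptr<rocksdb::DB>> dbs_;  // one process, many DBs
};

// Application API: serve reads/writes against the local shard.
bool Get(DBManager& mgr, const std::string& shard, const std::string& key,
         std::string* value) {
  return mgr.getDB(shard)->Get(rocksdb::ReadOptions(), key, value).ok();
}

// Admin API: open the DB locally, then register it with the replicator
// (master or slave role is assigned per DB instance).
void AdminAddShard(DBManager& mgr, const std::string& shard) {
  mgr.openDB(shard, "/data/" + shard);
  // replicator.addDB(shard, role, upstream_addr);  // hypothetical replicator call
}
```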

RocksDB
- An embedded storage engine library
- Lacks replication support

Replication Design Decisions
- Master-slave replication
- Replicate multiple RocksDB instances in one process
- Master/slave role is assigned at the RocksDB instance level
- Low replication latency

Replication Implementation
- Use the RocksDB WAL sequence # as the global replication sequence #
- FBThrift for RPC
- A combination of pull & push based replication

Slave Side Workflow
[Diagram: a Thrift server with worker threads hosts DB1 (master) and DB2 (slave, upstream: ip_Port); the slave tracks the latest SEQ # it has applied]
1. For DB2, a worker thread sends "Get updates since SEQ#" to the upstream host
2. The upstream responds with the updates since SEQ# for DB2
3. The worker thread applies the updates locally, advancing the local SEQ #
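A minimal sketch of that slave-side loop follows. `ReplicatorClient` is a hypothetical FBThrift stub for the upstream RPC; the RocksDB calls (`GetLatestSequenceNumber()`, applying a serialized `WriteBatch` via `Write()`) are the real library API, and they show why the WAL sequence number can double as the replication cursor: applying a batch advances it automatically, so no separate replication metadata needs to be persisted.

```cpp
// Slave-side pull loop (sketch). ReplicatorClient is a hypothetical Thrift
// stub for the upstream master; the RocksDB calls are the real library API.
#include <cstdint>
#include <string>
#include <vector>
#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>

struct ReplicatorClient {  // hypothetical FBThrift-generated client
  std::vector<std::string> getUpdatesSince(const std::string& db_name,
                                           uint64_t seq);  // long-poll RPC
};

void PullLoop(rocksdb::DB* slave_db, ReplicatorClient& upstream,
              const std::string& db_name) {
  for (;;) {
    // The latest applied WAL sequence # is the replication cursor.
    uint64_t seq = slave_db->GetLatestSequenceNumber();
    // "Get updates since SEQ# for DB2" -- may block upstream until data exists.
    for (const std::string& rep : upstream.getUpdatesSince(db_name, seq + 1)) {
      rocksdb::WriteBatch batch(rep);  // rebuild the batch from its serialized form
      slave_db->Write(rocksdb::WriteOptions(), &batch);  // apply; advances the WAL seq#
    }
  }
}
```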

Master Side Workflow
[Diagram: the same Thrift server layout, now acting as the upstream for DB1 (master)]
1. A slave's "Get updates since SEQ# for DB1" request arrives; a worker thread asks DB1: "Has updates since SEQ#?"
2. If yes ("Yes, this is the data"): the worker thread reads the data and sends the response immediately (the pull half)
3. If no ("No, wait for my notification"): the request is parked; when new writes land on DB1, the waiting thread is notified ("These are the new updates") and sends the response (the push half)
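Here is a sketch of that master side in C++. The WAL-tailing calls (`GetUpdatesSince()`, `TransactionLogIterator`) are RocksDB's real API; the mutex/condition-variable notification is an assumed stand-in for whatever signaling Rocksplicator actually uses. Note this pattern relies on WAL files being retained long enough (e.g., via `Options::WAL_ttl_seconds`).

```cpp
// Master side (sketch): answer immediately if the WAL has newer data,
// otherwise park the request until the next write notifies us (the push half).
#include <condition_variable>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <vector>
#include <rocksdb/db.h>

std::mutex mu;
std::condition_variable new_write_cv;  // assumed signaling, see WriteAndNotify()

std::vector<std::string> GetUpdatesSince(rocksdb::DB* db, uint64_t seq) {
  {
    // "No, wait for my notification": block until a write moves the WAL forward.
    std::unique_lock<std::mutex> lk(mu);
    new_write_cv.wait(lk, [&] { return db->GetLatestSequenceNumber() >= seq; });
  }
  // "Yes, this is the data": tail the WAL from seq onward.
  std::vector<std::string> out;
  std::unique_ptr<rocksdb::TransactionLogIterator> it;
  if (db->GetUpdatesSince(seq, &it).ok()) {
    for (; it->Valid(); it->Next()) {
      rocksdb::BatchResult res = it->GetBatch();
      out.push_back(res.writeBatchPtr->Data());  // serialized WriteBatch
    }
  }
  return out;
}

// Write path: after applying a write, wake any parked replication requests.
void WriteAndNotify(rocksdb::DB* db, rocksdb::WriteBatch* batch) {
  db->Write(rocksdb::WriteOptions(), batch);
  new_write_cv.notify_all();
}
```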

Replication Performance in Production
- 50 MB/s per host (AWS i3.2x instance)

Automated Cluster Management and Recovery for Stateful Services
Challenges:
- How to differentiate between a rebalance and a service restart
- How to reliably set up the replication graph
- Multiple replicas could do state transitions concurrently

More details can be found in our blog post: /EyKMoTL

ML at Pinterest

Unified ML Platform
Scorpion is the Pinterest ML Platform, a fully integrated solution for scoring pins.

Common Scoring and Logging Flow
