




# 1 Data analysis tools: Apache Druid: Druid cluster deployment and management

## 1.1 Introduction to Apache Druid

Apache Druid is an open-source data store and query system designed for real-time analytics on large data sets. It can handle petabyte-scale data and provides low-latency queries and aggregations, which makes it a good fit for real-time monitoring, log analytics, and business intelligence. Druid can load data from many sources, such as CSV files, databases, and Hadoop, and it can ingest data in real time without waiting for batch jobs.

### 1.1.1 Key characteristics

- Real-time ingestion: Druid processes data streams as they arrive, so data can be queried without waiting for a batch job to finish.
- High-performance queries: thanks to pre-aggregation and columnar storage, Druid answers queries quickly, achieving sub-second response times even on large data sets.
- Scalability: a Druid cluster scales horizontally; adding nodes lets it handle more data and more queries.
- High availability: the cluster is designed to tolerate node failures while preserving data integrity and query continuity.

## 1.2 Druid architecture and components

Druid's architecture consists of several components. Each one is responsible for a different function, and together they provide efficient data processing and querying.

### 1.2.1 Components

- Broker: receives query requests, optimizes the query plan, and fans queries out to Historical and Realtime nodes.
- Historical: stores historical data and serves queries against it.
- Realtime: ingests real-time data streams and serves queries on fresh data.
- Coordinator: manages loading and unloading of segments, making sure data is distributed correctly across Historical and Realtime nodes.
- MiddleManager: responsible for downloading and storing segments and for pre-aggregating data.
- Overlord: manages real-time ingestion tasks and assigns them to Realtime nodes.
- Indexer: used for batch loading of data; it can be part of real-time ingestion or run on its own.
- Segment: the smallest unit of storage, containing pre-aggregated data plus metadata.

### 1.2.2 Architecture diagram

```mermaid
graph TD
A[Broker]-->|Query|B[Historical]
A-->|Query|C[Realtime]
D[Coordinator]-->|Manage|B
D-->|Manage|C
E[Overlord]-->|Task|C
F[Indexer]-->|Load|B
G[MiddleManager]-->|Store|B
G-->|Store|C
```

## 1.3 How a Druid cluster works

A Druid cluster stores and queries data through a distributed architecture. Data is split into segments, each containing pre-aggregated data, which improves query performance. Nodes in the cluster take on different roles and cooperate to process data and queries.

### 1.3.1 Ingestion flow

1. Ingestion: data enters the cluster through Realtime nodes or the Indexer.
2. Pre-aggregation: the MiddleManager pre-aggregates the data and produces segments.
3. Distribution: the Coordinator manages segment distribution so that data is spread evenly across the cluster.
4. Storage: Historical and Realtime nodes store the segments; Historical nodes hold historical data and Realtime nodes hold fresh data.

### 1.3.2 Query flow

1. Query reception: the Broker receives the query request and optimizes the query plan.
2. Query distribution: the Broker fans the query out to Historical and Realtime nodes.
3. Data scan: Historical and Realtime nodes process the data required by the query.
4. Result merge: the Broker collects the partial results from each node, merges them, and returns the final result to the client.
### 1.3.3 Example: deploying a Druid cluster

The following simple example shows how to bring up a Druid cluster with Docker.

```bash
# Pull the Druid Docker image
docker pull druidio/druid:latest
# Start the Coordinator node
docker run -d --name druid-coordinator -p 8081:8081 druidio/druid:latest coordinator

# Start the Historical node
docker run -d --name druid-historical -p 8082:8082 druidio/druid:latest historical

# Start the Realtime node
docker run -d --name druid-realtime -p 8083:8083 druidio/druid:latest realtime

# Start the Broker node
docker run -d --name druid-broker -p 8080:8080 druidio/druid:latest broker

# Start the MiddleManager node
docker run -d --name druid-middlemanager -p 8091:8091 druidio/druid:latest middlemanager

# Start the Overlord node
docker run -d --name druid-overlord -p 8090:8090 druidio/druid:latest overlord

# Start ZooKeeper (a Druid cluster needs ZooKeeper for coordination)
docker run -d --name zookeeper -p 2181:2181 zookeeper:latest
```

### 1.3.4 Example: data ingestion

Suppose we have a CSV file containing user-activity data; we can use Druid's Indexer to ingest it.

```json
{
"type":"index",
"spec":{
"dataSchema":{
"dataSource":"user_activity",
"parser":{
"type":"string",
"parseSpec":{
"format":"csv",
"timestampSpec":{
"column":"timestamp",
"format":"yyyy-MM-dd'T'HH:mm:ss.SSSZ"
},
"dimensionsSpec":{
"dimensions":["user_id","activity_type"],
"dimensionExclusions":[]
},
"columns":["timestamp","user_id","activity_type","duration"],
"skipHeaderRecord":true
}
},
"metricsSpec":[
{
"type":"count",
"name":"count"
},
{
"type":"doubleSum",
"name":"total_duration",
"fieldName":"duration"
}
],
"granularitySpec":{
"type":"uniform",
"segmentGranularity":"HOUR",
"queryGranularity":"MINUTE",
"rollup":true
}
},
"ioConfig":{
"type":"index",
"firehose":{
"type":"local",
"baseDir":"/data",
"filter":"user_activity.csv"
},
"appendToExisting":false
},
"tuningConfig":{
"type":"index",
"maxRowsInMemory":100000,
"maxRowsPerSegment":5000000,
"maxRowsInRollup":1000000
}
}
}
```

Save the JSON configuration above as user_activity_index.json, then start the ingestion task with:

```bash
curl -X POST -H 'Content-Type: application/json' --data-binary @user_activity_index.json http://druid-overlord:8090/druid/indexer/v1/task
```

### 1.3.5 Example: querying data

Queries are sent through the Broker node. The example below shows how to query the total duration of user activity.

```json
{
"queryType":"timeseries",
"dataSource":"user_activity",
"granularity":"MINUTE",
"intervals":["2023-01-01T00:00:00.000Z/2023-01-02T00:00:00.000Z"],
"aggregations":[
{
"type":"doubleSum",
"name":"total_duration",
"fieldName":"duration"
},
{
"type":"longSum",
"name":"count",
"fieldName":"count"
}
],
"postAggregations":[
{
"type":"arithmetic",
"name":"avg_duration",
"fn":"/",
"fields":[
{
"type":"fieldAccess",
"name":"total_duration"
},
{
"type":"fieldAccess",
"name":"count"
}
]
}
],
"context":{
"timeout":"10s"
}
}
```

Save the JSON above as user_activity_query.json, then send the query with:

```bash
curl -X POST -H 'Content-Type: application/json' --data-binary @user_activity_query.json http://druid-broker:8080/druid/v2
```
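The same query can also be issued from a script instead of curl. The following is a minimal sketch, not part of the Druid distribution, that POSTs user_activity_query.json to the Broker's /druid/v2 endpoint used by the curl command above and prints each result row; the host, port, and file name simply mirror the assumptions of this example.

```python
import json
import urllib.request

BROKER_URL = "http://druid-broker:8080/druid/v2"  # same endpoint as the curl example


def run_query(query_file):
    """POST a native Druid query (a JSON file) to the Broker and return the parsed result."""
    with open(query_file, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        BROKER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Each timeseries row carries a timestamp plus a "result" object holding the
    # aggregations and post-aggregations defined in the query.
    for row in run_query("user_activity_query.json"):
        print(row["timestamp"], row["result"])
```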
These examples show how Apache Druid's architecture and components make it possible to ingest large volumes of data in real time and query them efficiently. Real deployments involve considerably more configuration, but the core principles and workflow are the same as in the examples above.

# 2 Deploying an Apache Druid cluster

## 2.1 Environment preparation and requirements

Before deploying an Apache Druid cluster, make sure the following requirements are met:

- Operating system: Druid runs on Linux and macOS; Linux is recommended for best performance.
- JDK version: JDK 1.8 or later must be installed.
- Memory: each node needs at least 8 GB of RAM; configure more for high-performance queries.
- Disk space: at least 50 GB for data storage and log files.
- Network: all nodes need good network connectivity to keep data synchronization and query response times acceptable.

## 2.2 Downloading and installing Druid

### 2.2.1 Download Druid

Visit the official Apache Druid website and download the latest release. Taking druid-0.18.0 as an example:

```bash
# Take the full download link from the Apache Druid downloads page;
# the mirror host is omitted here.
wget /druid/0.18.0/apache-druid-0.18.0.tar.gz
```

### 2.2.2 Extract and install

Extract the downloaded tarball and move the resulting directory to a suitable location, for example under /opt.

```bash
tar -xzf apache-druid-0.18.0.tar.gz -C /opt/
cd /opt/
mv apache-druid-0.18.0 druid
```

### 2.2.3 Start Druid

A Druid cluster is made up of several roles, including Overlord, Coordinator, Historical, MiddleManager, Broker, and Realtime. Each role normally runs on its own machine, or all roles can be started on a single machine to simulate a cluster.

```bash
cd /opt/druid/

# Start the Coordinator
./bin/start-coordinator.sh
# Start the Overlord
./bin/start-overlord.sh
# Start the Historical
./bin/start-historical.sh
# Start the MiddleManager
./bin/start-middlemanager.sh
# Start the Broker
./bin/start-broker.sh
# Start the Realtime node
./bin/start-realtime.sh
```
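Before moving on to configuration, it can help to confirm that each process actually came up. The sketch below is only an illustration: it polls the /status endpoint that Druid services expose over HTTP, and the hosts and ports are assumptions based on the ports used in the Docker example earlier in this tutorial; adjust them to your own deployment.

```python
import json
import urllib.request

# Hosts and ports are assumptions taken from the Docker example earlier in this
# tutorial; adjust them to match your own deployment.
SERVICES = {
    "coordinator": "http://localhost:8081/status",
    "overlord": "http://localhost:8090/status",
    "broker": "http://localhost:8080/status",
    "historical": "http://localhost:8082/status",
    "middlemanager": "http://localhost:8091/status",
}


def check(name, url):
    """Return True if the service answers its /status endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            info = json.load(resp)
        print(f"{name}: up (version {info.get('version', 'unknown')})")
        return True
    except Exception as exc:  # connection refused, timeout, bad response, ...
        print(f"{name}: not reachable ({exc})")
        return False


if __name__ == "__main__":
    results = [check(name, url) for name, url in SERVICES.items()]
    print("cluster looks healthy" if all(results) else "some services are down")
```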
## 2.3 Configuring the Druid cluster

Configuring a Druid cluster mainly involves the following areas:

- Cluster configuration: the cluster topology, i.e. the machine list and ports for each role.
- Data source configuration: the properties of each data source, such as data format and storage policy.
- Query configuration: query performance parameters, such as cache size and query timeout.

### 2.3.1 Cluster configuration

Edit the common properties file /opt/druid/conf/druid/_common/common.runtime.properties and add the cluster settings.

```properties
# Set this to the address of your ZooKeeper service.
druid.zk.service.host=<zookeeper_host>
druid.zk.service.port=2181
druid.zk.service.path=/druid
```

### 2.3.2 Data source configuration

In the /opt/druid/conf/druid/coordinator directory, create a configuration file for the data source, for example exampleDataSource.properties.

```properties
druid.indexer.task.timeout=PT1H
druid.indexer.task.maxVirtSize=100GB
druid.indexer.task.maxRunning=10
druid.indexer.task.maxPending=10
```

### 2.3.3 Query configuration

In the /opt/druid/conf/druid/broker directory, edit the broker properties file to set the query parameters.

```properties
druid.broker.maxCacheSizeBytes=10GB
druid.broker.cache.size=10GB
druid.broker.cache.ttl=PT1H
```

### 2.3.4 Restart the services

Restart all Druid services so that the configuration takes effect.

```bash
./bin/restart.sh
```

### 2.3.5 Monitoring and management

Druid provides a web interface for monitoring and management; visiting http://<Coordinator_IP>:8080/druid/indexer/v1/task lets you view and manage task status.

## 2.4 Example: importing data

Suppose we have a CSV file example_data.csv with the following content:

```
timestamp,metric,host
2022-01-01T00:00:00.000Z,10.5,server1
2022-01-01T00:01:00.000Z,11.3,server2
2022-01-01T00:02:00.000Z,12.1,server1
```

We can use a Druid index task to import the data. Create a JSON task configuration file named exampleTask.json:

```json
{
"type":"index",
"spec":{
"dataSchema":{
"dataSource":"exampleDataSource",
"parser":{
"type":"string",
"parseSpec":{
"format":"csv",
"timestampSpec":{
"column":"timestamp",
"format":"iso"
},
"dimensionsSpec":{
"dimensions":["host"],
"dimensionExclusions":[]
},
"columns":["timestamp","metric","host"],
"skipHeaderRecord":true
}
},
"metricsSpec":[
{
"type":"doubleSum",
"name":"metric",
"fieldName":"metric"
}
],
"granularitySpec":{
"type":"uniform",
"segmentGranularity":"HOUR",
"queryGranularity":"MINUTE",
"rollup":true
}
},
"ioConfig":{
"type":"index",
"firehose":{
"type":"local",
"baseDir":"/path/to/data",
"filter":"example_data.csv"
},
"appendToExisting":false
},
"tuningConfig":{
"type":"index",
"maxRowsInMemory":100000,
"indexSpec":{
"bitmap":{
"type":"roaring"
}
}
}
}
}
```

Submit the task with curl:

```bash
curl -X POST -H 'Content-Type: application/json' --data-binary @exampleTask.json http://<Coordinator_IP>:8081/druid/indexer/v1/task
```

Task execution status can then be viewed at http://<Coordinator_IP>:8080/druid/indexer/v1/task.
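The submission and the status check can also be scripted against the same HTTP endpoints. The sketch below is only an illustration: it submits exampleTask.json and then polls the task status endpoint until the task leaves the RUNNING state. The base URL mirrors the <Coordinator_IP>:8081 address used by the curl command above (replace localhost with your real host), and the exact layout of the status response can vary between Druid versions.

```python
import json
import time
import urllib.request

# The curl example above posts to http://<Coordinator_IP>:8081;
# point this at the host that runs your indexing service.
INDEXER_BASE = "http://localhost:8081"


def submit_task(spec_file):
    """POST an ingestion spec to /druid/indexer/v1/task and return the task id."""
    with open(spec_file, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        f"{INDEXER_BASE}/druid/indexer/v1/task",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["task"]


def wait_for_task(task_id, poll_seconds=10):
    """Poll the task status endpoint until the task is no longer RUNNING."""
    while True:
        url = f"{INDEXER_BASE}/druid/indexer/v1/task/{task_id}/status"
        with urllib.request.urlopen(url) as resp:
            # Recent Druid versions nest the state under "status"; older ones may differ.
            state = json.load(resp)["status"]["status"]
        print(f"task {task_id}: {state}")
        if state != "RUNNING":
            return state
        time.sleep(poll_seconds)


if __name__ == "__main__":
    task_id = submit_task("exampleTask.json")
    print("final state:", wait_for_task(task_id))
```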
## 2.5 Conclusion

With the steps above you can deploy and configure an Apache Druid cluster for efficient data analysis and querying. Make sure every setting is correct, and tune the parameters to your actual workload to get the best performance.

# 3 Druid cluster components in detail

## 3.1 Broker node configuration

### 3.1.1 How it works

In an Apache Druid cluster, the Broker node is mainly responsible for handling client query requests. It does not store data itself; it fetches data from Historical nodes, performs the aggregation, and returns the result to the client. Tuning the Broker configuration has a direct impact on query performance and efficiency.

### 3.1.2 Configuration

The Broker node is configured mainly in the druid-broker.conf file. An example configuration:

```properties
druid.broker.http.port=8082
druid.broker.queryCache.percent=0.1
druid.broker.maxConcurrentQueries=10
druid.broker.maxPendingConcurrentQueries=20
druid.broker.maxCacheSizeBytes=1073741824
druid.broker.cache.sizeBytes=1073741824
druid.broker.cache.type=onheap
druid.broker.cache.ttl=PT1H
druid.broker.cache.query.enabled=true
druid.broker.cache.query.maxSizeBytes=1073741824
druid.broker.cache.query.ttl=PT1H
druid.broker.cache.query.type=onheap
```

- druid.broker.http.port: the Broker's HTTP port, used to receive query requests.
- druid.broker.queryCache.percent: the fraction of memory used for caching query results.
- druid.broker.maxConcurrentQueries: the maximum number of queries the Broker handles concurrently.
- druid.broker.maxPendingConcurrentQueries: the maximum number of queries the Broker keeps pending.
- druid.broker.maxCacheSizeBytes: the maximum size of the Broker cache.
- druid.broker.cache.*: cache settings, including type, size, and TTL.

## 3.2 Historical node configuration

### 3.2.1 How it works

Historical nodes are the data-storage nodes of an Apache Druid cluster. They store and maintain data segments and are the source from which the Broker reads query data. Their configuration directly affects storage efficiency and query performance.

### 3.2.2 Configuration

The Historical node is configured mainly in the druid-historical.conf file. An example configuration:

```properties
druid.historical.http.port=8083
druid.historical.segmentCache.size=10737418240
druid.historical.segmentCache.ttl=PT24H
druid.historical.segmentCache.type=onheap
druid.historical.segmentCompress=true
druid.historical.segmentCompressType=gzip
druid.historical.segmentCompressThresholdBytes=10485760
druid.historical.segmentCompressBlockSizeBytes=1048576
druid.historical.segmentCompressLevel=6
```

- druid.historical.http.port: the Historical node's HTTP port.
- druid.historical.segmentCache.*: segment cache settings, including size, TTL, type, and compression.

## 3.3 MiddleManager node configuration

### 3.3.1 How it works

The MiddleManager node takes instructions from the Overlord node and loads, unloads, and stores data segments. It plays an important role in the ingestion and storage pipeline.

### 3.3.2 Configuration

The MiddleManager node is configured mainly in the druid-middlemanager.conf file. An example configuration:

```properties
druid.middleManager.http.port=8091
druid.storage.type=local
druid.storage.basePath=/var/druid/data
druid.storage.extension=druid
druid.storage.maxSize=107374182400
druid.storage.minSize=10737418240
druid.storage.maxSegmentsToLoad=1000
druid.storage.maxSegmentsToUnload=100
druid.storage.maxSegmentsToLoadPerHour=100
druid.storage.maxSegmentsToUnloadPerHour=10
druid.storage.maxSegmentsToLoadPerInterval=10
druid.storage.maxSegmentsToUnloadPerInterval=1
druid.storage.maxSegmentsToLoadPerIntervalPerDataSource=1
druid.storage.maxSegmentsToUnloadPerIntervalPerDataSource=1
```

- druid.middleManager.http.port: the MiddleManager node's HTTP port.
- druid.storage.*: storage settings, including type, path, maximum and minimum size, and the maximum number of segments to load or unload.

## 3.4 Coordinator node configuration

### 3.4.1 How it works

The Coordinator node is the management node of an Apache Druid cluster. It coordinates how segments are assigned across Historical nodes, keeping data evenly distributed and highly available.

### 3.4.2 Configuration

The Coordinator node is configured mainly in the druid-coordinator.conf file. An example configuration:

```properties
druid.coordinator.http.port=8081
druid.coordinator.loadQueue.capacity=100
druid.coordinator.loadQueue.capacityPerDataSource=10
druid.coordinator.loadQueue.capacityPerInterval=1
druid.coordinator.loadQueue.capacityPerIntervalPerDataSource=1
druid.coordinator.loadQueue.capacityPerTier=10
druid.coordinator.loadQueue.capacityPerTierPerDataSource=1
druid.coordinator.loadQueue.capacityPerTierPerInterval=1
druid.coordinator.loadQueue.capacityPerTierPerIntervalPerDataSource=1
```
# Data analysis tools: Apache Druid: data ingestion and queries

## Data ingestion flow

In Apache Druid, data ingestion is the process of loading data from various sources into the Druid cluster. Depending on the nature of the data and the requirements, ingestion can be real-time or offline. The ingestion flow consists of the following steps:
1. **Data preparation**: the data must be formatted so that Druid can understand it, usually as JSON or CSV (a small generator sketch follows this list).
2. **Ingestion**: the data is loaded into the cluster through Druid's ingestion tools, such as an `indexer` task or a `real-time` task.
3. **Segmenting**: the data is split into multiple segments; each segment can be queried independently, which improves query performance.
4. **Replication**: to improve availability and fault tolerance, segments are replicated to several nodes in the cluster.
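To make step 1 concrete, the short sketch below (an illustration only, not part of Druid) writes a file of newline-delimited JSON events whose fields, timestamp, dim1, dim2, and value1, match the schema used by the real-time ingestion example in the next section.

```python
import json
import random
from datetime import datetime, timedelta, timezone

# Write newline-delimited JSON events matching the example schema below:
# a timestamp, two dimensions (dim1, dim2), and one numeric value (value1).
start = datetime(2023, 1, 1, tzinfo=timezone.utc)

with open("example_events.json", "w", encoding="utf-8") as out:
    for i in range(100):
        event = {
            "timestamp": (start + timedelta(minutes=i)).isoformat(),
            "dim1": random.choice(["web", "mobile"]),
            "dim2": f"user_{random.randint(1, 10)}",
            "value1": round(random.uniform(0.0, 100.0), 2),
        }
        out.write(json.dumps(event) + "\n")
```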
## Real-time ingestion configuration

Real-time ingestion lets Druid process streaming data, such as messages coming from Kafka. Below is an example configuration for a real-time ingestion task:
```json
{
  "type": "realtime",
  "spec": {
    "dataSchema": {
      "dataSource": "exampleDataSource",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "timestamp",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": ["dim1", "dim2"],
            "dimensionExclusions": []
          }
        }
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        },
        {
          "type": "doubleSum",
          "name": "metric1",
          "fieldName": "value1"
        }
      ],
      "granularitySpec": {
        "segmentGranularity": "HOUR",
        "queryGranularity": "MINUTE",
        "rollup": true
      }
    },
    "ioConfig": {
      "firehose": {
        "type": "kafka",
        "kafkaBrokers": "localhost:9092",
        "topic": "exampleTopic"
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "druid_realtime_default_tuning",
      "maxRowsInMemory": 100000,
      "maxRowsPerSegment": 5000000,
      "maxRowsInPendingSegment": 1000000
    }
  }
}
```

### 3.4.3 Explanation

- dataSource: the name of the data source.
- parser: how the data is parsed, including the timestamp and dimension fields.
- metricsSpec: the metrics to compute, such as counts and numeric sums.
- granularitySpec: the granularity of the data, such as hour or minute.
- firehose: the source of the ingested data, in this case Kafka.
- tuningConfig: performance tuning parameters for real-time ingestion.

## 3.5 Offline ingestion configuration

Offline ingestion is suited to batch processing, such as loading historical data from HDFS or S3. Below is an example configuration for an offline ingestion task:

```json
{
"type":"index",
"spec":{
"dataSchema":{
"dataSource":"exampleDataSource",
"parser":{
"type":"string",
"parseSpec":{
"format":"json",
"timestampSpec":{
"column":"timestamp",
"format":"auto"
},
"dimensionsSpec":{
"dimensions":["dim1","dim2"],
"dimensionExclusions":[]
},
"metricsSpec":[
{
"type":"count",
"name":"count"
},
{
"type":"doubleSum",