This article is Part 4 of a series on Apache Druid Query Performance.
Once data modeling and query patterns are optimized, the final layer of tuning involves configuring the Druid cluster’s hardware and software resources. This is about ensuring your cluster has the right amount of memory, CPU, and concurrency settings to handle your specific workload efficiently.
Source Discussion: Based on detailed tuning guides and community discussions.
Answer: Correctly configuring threads and buffers is a balancing act to maximize CPU utilization without causing contention.
On Historical Nodes (The Workhorses):
- druid.processing.numThreads: This is the most critical performance parameter on a data node. It defines the maximum number of segments that can be processed concurrently. The standard recommendation is (Number of CPU Cores) - 1, which aims to fully saturate the CPU while leaving one core for the OS and background tasks.
- druid.processing.buffer.sizeBytes: These are off-heap buffers for intermediate query results. A starting value of 512MiB to 1GiB is common. If a query’s intermediate results on a single segment exceed this, it will spill to disk, causing a massive performance drop.
- druid.processing.numMergeBuffers: A dedicated pool of buffers for GroupBy queries. A reasonable starting point is a quarter of druid.processing.numThreads. If the mergeBuffer/pendingRequests metric is consistently greater than zero, this value should be increased.

On Broker Nodes (The Coordinators):
Brokers primarily need druid.server.http.numThreads, which should be sized based on expected client concurrency. They also require druid.processing.numMergeBuffers for the final merge step of GroupBy queries.

Answer: Memory sizing is crucial for stability and performance. Druid uses both JVM heap and off-heap direct memory.
Historical Nodes:
- Heap (-Xmx): The heap is used for partial query results and lookups. A good starting point is (0.5GiB * number of CPU cores). If you use lookups, you must add (2 * total size of all loaded lookups) to this value.
- Direct memory (-XX:MaxDirectMemorySize): This must be large enough to hold all processing and merge buffers. Calculate it with the formula: (druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes.

Broker Nodes:
- Heap (-Xmx): The Broker heap scales with the number of segments in the cluster, as it holds segment timeline information. For small to medium clusters, 4-8GiB is a reasonable start; very large clusters might need 30-60GiB.
- Direct memory (-XX:MaxDirectMemorySize): Brokers only need direct memory for merge buffers. The formula is (druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes.

Source Discussion: Addresses the common “noisy neighbor” problem of resource contention.
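Pulling the thread, buffer, and memory formulas above together, here is an illustrative starting configuration for a hypothetical 16-core Historical node. The values simply apply the formulas above and are starting points to tune against metrics, not prescriptions:

```properties
# runtime.properties (hypothetical 16-core Historical)
# numThreads = cores - 1
druid.processing.numThreads=15
# 512MiB starting value (512 * 1024 * 1024 bytes)
druid.processing.buffer.sizeBytes=536870912
# roughly numThreads / 4
druid.processing.numMergeBuffers=4

# jvm.config (one JVM flag per line)
# heap: 0.5GiB * 16 cores = 8GiB
-Xmx8g
-Xms8g
# direct memory: (15 + 4 + 1) * 512MiB = 10GiB
-XX:MaxDirectMemorySize=10g
```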
Answer: To manage mixed workloads, you must implement workload management strategies. Druid provides two primary mechanisms: Query Laning for logical isolation and Service Tiering for physical isolation.
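As a sketch, query laning is configured on the Broker and service tiering on the Historicals. The property names below come from Druid's query scheduler and Historical configuration; the specific values are illustrative assumptions:

```properties
# Broker runtime.properties: query laning (logical isolation)
# cap total concurrent queries below druid.server.http.numThreads
druid.query.scheduler.numThreads=40
# 'hilo' reserves capacity for interactive queries; low-priority
# queries may use at most 20% of the scheduler's slots
druid.query.scheduler.laning.strategy=hilo
druid.query.scheduler.laning.maxLowPercent=20

# Historical runtime.properties: service tiering (physical isolation)
# assign this Historical to a "hot" tier; Coordinator load rules
# then route the most recent data to this tier
druid.server.tier=hot
```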
Answer: Druid supports two main types of query caching (per-segment and whole-query result caching) to improve performance for frequently accessed data. Caching is most effective for increasing concurrency, not for speeding up a single, slow query.
Configuration and Best Practices:
- Caching can be toggled per query via the useCache and populateCache context parameters.
- Caching is most beneficial for TopN and Timeseries queries. For GroupBy queries, if the bottleneck is the final merge step on the Broker, caching provides less benefit.
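For example, a common setup enables segment-level caching on the Historicals rather than on the Broker. The property names follow Druid's caching configuration; the cache size is an illustrative assumption:

```properties
# Historical runtime.properties: segment-level cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
# local on-heap cache; 'caffeine' is Druid's default cache type
druid.cache.type=caffeine
# illustrative 1GiB cache size
druid.cache.sizeInBytes=1073741824

# Broker runtime.properties: leave segment caching to Historicals
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false
```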