
This article is Part 3 of a series on Apache Druid Query Performance.
- Apache Druid Query Performance Bottlenecks: A Q&A Guide
- The Foundations of Apache Druid Performance Tuning: Data & Segments
- Apache Druid Advanced Data Modeling for Peak Performance
- Writing Performant Apache Druid Queries (You are here)
- Apache Druid Cluster Tuning & Resource Management
- Apache Druid Query Performance Bottlenecks: Series Summary
Best Practice: Whenever possible, structure your analytical questions to fit the Timeseries or TopN patterns to leverage their specialized, high-performance execution paths.
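As an illustration, a query that groups on a single dimension, orders by a metric, and applies a `LIMIT` can typically be planned as a `TopN` rather than a general `GroupBy`. This is a sketch against the `wikipedia` datasource used later in this article; the `"channel"` column and the one-day interval are illustrative assumptions:

```sql
-- One dimension in GROUP BY, ORDER BY a metric, plus a LIMIT:
-- this shape lets Druid use the TopN engine instead of a full GroupBy.
SELECT "channel", COUNT(*) AS "Edits"
FROM "wikipedia"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
```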
Question: What are the most impactful filtering strategies in Druid SQL?
Answer: The single most important filtering strategy in Druid is to always filter on the `__time` column over the smallest possible interval. The second most impactful strategy is to use equality filters (`=` or `IN`) on low-to-medium-cardinality dimensions.
- Time Filter is King: Druid is a time-series database first, and data is physically partitioned by time. A `WHERE __time BETWEEN ...` clause allows the Broker to perform "segment pruning," immediately ignoring any segments outside the query's time range. A query without a time filter is a worst-case scenario, forcing a full scan of the entire datasource.
- Use Indexes with Equality Filters: Druid builds bitmap indexes for dimension columns. An equality filter like `WHERE country = 'US'` can use these indexes to instantly identify matching rows without scanning the column's values.
- Avoid Performance Killers:
  - Functions on Columns: Applying a function in the `WHERE` clause (e.g., `WHERE UPPER(product) = 'BIKE'`) prevents index usage and forces a full scan and computation on every row.
  - Inequality and Leading Wildcards: Filters like `!=` or `LIKE '%value'` are expensive because they cannot use indexes efficiently.
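The contrast can be sketched with two queries against the `wikipedia` datasource referenced elsewhere in this article (the specific date bounds are illustrative assumptions):

```sql
-- Index-friendly: a tight time bound enables segment pruning, and the
-- equality filter on "countryName" can use the column's bitmap index.
SELECT "page", COUNT(*) AS "Edits"
FROM "wikipedia"
WHERE __time >= TIMESTAMP '2016-06-27' AND __time < TIMESTAMP '2016-06-28'
  AND "countryName" = 'United States'
GROUP BY 1;

-- Anti-pattern: the function call and leading wildcard defeat the index,
-- and the missing time filter forces a scan of every segment.
SELECT "page", COUNT(*) AS "Edits"
FROM "wikipedia"
WHERE UPPER("page") LIKE '%DRUID%'
GROUP BY 1;
```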
Question: How can I see how Druid will execute my query?
Answer: `EXPLAIN PLAN` is a powerful Druid SQL command that shows you how Druid will execute your query. It translates your SQL into the native JSON query format and provides a query plan, allowing you to see exactly what operations will be performed.
To use it, simply prepend `EXPLAIN PLAN FOR` to your SQL query:

```sql
EXPLAIN PLAN FOR
SELECT "page", COUNT(*) AS "Edits"
FROM "wikipedia"
WHERE "countryName" IS NOT NULL
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
```
The output will be a JSON object containing three key pieces of information:
- `PLAN`: A textual representation of the execution steps. This will show you whether Druid is using a `GroupBy`, `TopN`, or `Timeseries` engine, and what operations like filtering and sorting are being applied.
- `RESOURCES`: A list of the datasources the query will touch.
- `QUERY`: The full native JSON query that Druid generated. This is invaluable for deep debugging.
By inspecting the plan, you can verify whether your query is being optimized as expected. For example, you can check if a `GROUP BY ... ORDER BY ... LIMIT` query is correctly being translated into a more performant `TopN` query.
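A quick sketch of that verification step, reusing the example query above (the exact output text varies by Druid version, so treat the comment as a guide rather than literal output):

```sql
-- Run the EXPLAIN, then inspect the generated native query: a single
-- GROUP BY dimension with ORDER BY <metric> DESC LIMIT N should yield
-- "queryType": "topN" rather than "queryType": "groupBy".
EXPLAIN PLAN FOR
SELECT "page", COUNT(*) AS "Edits"
FROM "wikipedia"
WHERE "countryName" IS NOT NULL
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
```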
Question: How do subqueries and UNION ALL affect query performance?
Answer: Subqueries and `UNION ALL` are supported in Druid SQL, but they have performance implications that are important to understand.
- Subqueries: Druid often executes subqueries by materializing their full result set on the Broker before using them in the outer query (e.g., as the right-hand side of a `JOIN`). If the subquery returns a large number of rows, this can create a memory bottleneck on the Broker. Druid has configuration limits (`maxSubqueryRows` and `maxSubqueryBytes`) to prevent out-of-memory errors.
  - Best Practice: Minimize the use of subqueries. If possible, pre-calculate the subquery's result and ingest it as a separate lookup table. When a subquery is necessary, ensure it is as restrictive as possible, returning the minimum number of rows and columns needed.
- `UNION ALL`: A `UNION ALL` query is executed by running a separate query for each table in the union; the Broker then merges the results. This is generally more efficient than a subquery approach but can still be expensive. If you find yourself frequently writing `UNION ALL` queries over datasources with identical schemas, consider whether they should be ingested into a single datasource instead. If you must use `UNION ALL`, it can sometimes be faster to run the individual queries from your client application and perform the final merge there.
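A minimal sketch of the pattern, assuming two hypothetical per-year datasources with identical schemas (`wikipedia_2015` and `wikipedia_2016` are placeholder names):

```sql
-- Each branch of a top-level UNION ALL runs as its own native query;
-- the Broker only concatenates the result sets. Keep every branch as
-- selective as possible, with its own time filter.
SELECT "page", COUNT(*) AS "Edits"
FROM "wikipedia_2015"
WHERE __time >= TIMESTAMP '2015-01-01' AND __time < TIMESTAMP '2016-01-01'
GROUP BY 1
UNION ALL
SELECT "page", COUNT(*) AS "Edits"
FROM "wikipedia_2016"
WHERE __time >= TIMESTAMP '2016-01-01' AND __time < TIMESTAMP '2017-01-01'
GROUP BY 1;
```

If the per-branch results are small, issuing the two branch queries separately from your client and merging there offloads that final merge from the Broker.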
Optimizing this layer of your Druid cluster can be complex. For expert guidance and a professional review of your Druid architecture, consider reaching out to the Apache Druid professionals at Iunera.