This article is part of a series on deploying a production‑ready Apache Druid cluster on Kubernetes. In Part 3, we cover the complete installation and configuration of a production‑grade cluster, building on the foundation and preparation covered previously. See Part 1 and Part 2.

Introduction
After laying the infrastructure groundwork and preparing your deployment environment, it’s time for the main event: deploying your production-ready Apache Druid cluster on Kubernetes. This isn’t just about applying a few YAML files—we’re talking about configuring a sophisticated, distributed analytics engine that can handle enterprise-scale workloads.
Modern time series applications demand more than basic Druid setups. They require carefully tuned configurations, proper resource allocation, and enterprise-grade security implementations. The traditional approach of manually managing MiddleManager components is giving way to Kubernetes-native architectures that leverage the platform’s inherent capabilities.
This guide walks you through the complete installation process, from understanding the DruidCluster custom resource to implementing MM-less deployments and Zookeeper-less service discovery. We’ll cover everything from component-specific configurations to production deployment patterns, ensuring your Druid installation is ready for real-world enterprise demands.
Table of contents
- Introduction
- Understanding the DruidCluster Custom Resource
- Common Configuration Settings
- Coordinator Component Configuration
- Overlord Component Configuration
- Broker Component Configuration
- Historical Component
- Kubernetes-Native Task Execution
- Production Deployment Best Practices
- Integration with GitOps Workflow
- Deployment Patterns Comparison
- Frequently Asked Questions
- Conclusion
Understanding the DruidCluster Custom Resource
The heart of your Druid installation lies within the DruidCluster custom resource, which acts as the declarative blueprint for your entire cluster. This isn’t your typical Kubernetes deployment—it’s a sophisticated orchestration mechanism that manages multiple interconnected services with complex dependencies.
Let’s examine the foundational configuration that brings together all the infrastructure components we prepared in earlier parts:
apiVersion: "druid.apache.org/v1alpha1" kind: "Druid" metadata: name: iuneradruid namespace: druid spec: commonConfigMountPath: /opt/druid/conf/druid/cluster/_common rollingDeploy: false startScript: /druid.sh image: iunera/druid:29.0.1 imagePullPolicy: Always podAnnotations: reason: this-cluster-is-k8s-discovery podLabels: app.kubernetes.io/networkpolicy-group: druid securityContext: fsGroup: 1000 runAsUser: 1000 runAsGroup: 1000
This configuration establishes several critical principles. The rollingDeploy: false setting ensures controlled deployments, which is crucial for production environments where you need predictable update behavior. The custom image specification allows for enterprise-specific extensions and security patches.
The security context configuration follows the principle of least privilege by running processes as non-root user 1000. This approach significantly reduces the attack surface while maintaining compatibility with persistent volume requirements.
Common Configuration Settings
Druid Extensions Load List
The Druid cluster (from iuneradruid-cluster.yaml line 65) enables a curated set of extensions that power security, ingestion, storage backends, query engines, and Kubernetes-native operation. These are declared centrally so that every service loads the same capabilities at startup:
```properties
# common.runtime.properties
druid.extensions.loadList=["iu-code-ingestion-druid-extension", "druid-pac4j", "druid-basic-security", "simple-client-sslcontext", "druid-multi-stage-query", "druid-distinctcount", "druid-s3-extensions", "druid-stats", "druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage", "druid-kafka-indexing-service", "druid-time-min-max", "druid-kubernetes-overlord-extensions", "druid-kubernetes-extensions"]
```
What each extension does and why it matters for sizing:
- iu-code-ingestion-druid-extension: Custom pre-ingestion parsers for Apache Druid, packaged into the image as custom ingestion logic. Ensure it is present in the container; resource use depends on what it implements.
- druid-pac4j: Enables web SSO/OIDC authentication for human users.
- druid-basic-security: Provides internal basic auth/authz (metadata-backed) and the escalator; required for service-to-service auth.
- simple-client-sslcontext: Supplies an SSLContext for clients using the provided keystores/truststores; required for TLS-enabled outbound HTTP.
- druid-multi-stage-query: Activates the Multi-Stage Query (MSQ) engine used for parallel SQL-based ingestion/queries. Can increase Broker and task memory/CPU during large jobs; plan heap and direct memory accordingly.
- druid-distinctcount: Adds approximate distinct-count aggregators. Can use off-heap memory; size processing/direct buffers to match workload.
- druid-s3-extensions: Enables S3-compatible deep storage and indexer logs.
- druid-stats: Statistical aggregators (e.g., variance/stddev) used in ingestion/queries.
- druid-histogram: Histogram/approximation aggregators. May increase memory during heavy aggregations.
- druid-datasketches: Apache DataSketches (HLL, quantiles, theta) for approximate analytics. Influences processing memory; tune direct memory on Brokers/Tasks.
- druid-lookups-cached-global: Global cached lookups for fast dimension enrichment. Consumes heap on Brokers/Historicals; account for cache size.
- postgresql-metadata-storage: PostgreSQL connector for the Druid metadata store. Required when using PostgreSQL as the metadata store.
- druid-kafka-indexing-service: Kafka ingestion (supervisors/tasks). Affects the Overlord and task pods; configure network policies so they can reach your Kafka brokers.
- druid-time-min-max: Min/Max time aggregators and utilities used in queries/optimizations.
- druid-kubernetes-overlord-extensions: Kubernetes runner for MM-less tasks (K8s Jobs). Essential for Kubernetes-native task execution and proper autoscaling.
- druid-kubernetes-extensions: Kubernetes-native service discovery (no Zookeeper). Required for the Zookeeper-less setup.
Operational notes:
- All extensions must be available under the configured directory (druid.extensions.directory=/opt/druid/extensions) in the image.
- Security/TLS-related extensions rely on the keystores mounted at /druid/jks and the TLS settings in common.runtime.properties.
- Sketch/histogram extensions primarily impact processing memory; ensure -XX:MaxDirectMemorySize and druid.processing.buffer.sizeBytes are sized for your query patterns (see the JVM options sketch below).
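Heap and direct memory are typically set through the operator's JVM option fields rather than runtime.properties. A minimal sketch, assuming the druid-operator's jvm.options / extra.jvm.options fields; the sizes are placeholders, not tuned recommendations:

```yaml
# Sketch: common JVM flags plus a per-node override (sizes are illustrative only)
spec:
  jvm.options: |-
    -server
    -XX:+UseG1GC
    -Duser.timezone=UTC
    -Dfile.encoding=UTF-8
  nodes:
    brokers:
      extra.jvm.options: |-
        -Xms4g
        -Xmx4g
        # Must cover numMergeBuffers * buffer.sizeBytes plus decompression overhead
        -XX:MaxDirectMemorySize=12g
```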
TLS Configuration
Enterprise deployments demand encryption in transit between all components. The configuration integrates the TLS keystores we prepared in Part 2:
```yaml
volumeMounts:
  - name: keystores
    mountPath: /druid/jks
    readOnly: true
volumes:
  - name: keystores
    secret:
      secretName: iuneradruid-jks-keystores-secret
env:
  - name: druid_server_https_keyStorePassword
    valueFrom:
      secretKeyRef:
        name: iuneradruid-jks-keystores-secret
        key: keystorepassword
```
The TLS configuration enables secure communication between all Druid components:
```yaml
common.runtime.properties: |
  # global TLS settings
  druid.enableTlsPort=true
  # Disable non-TLS Ports
  druid.enablePlaintextPort=false

  # client side TLS settings
  druid.client.https.protocol=TLSv1.2
  druid.client.https.trustStoreType=jks
  druid.client.https.trustStorePath=/druid/jks/truststore.jks
  druid.client.https.trustStorePassword=changeit
  druid.client.https.validateHostnames=false

  # server side TLS settings
  druid.server.https.keyStoreType=jks
  druid.server.https.keyStorePath=/druid/jks/keystore.jks
  druid.server.https.certAlias=druid
  druid.server.https.trustStoreType=jks
  druid.server.https.trustStorePath=/druid/jks/truststore.jks
  druid.server.https.trustStorePassword=changeit
  druid.server.https.validateHostnames=false
  druid.server.https.protocol=TLSv1.2
  druid.server.https.includeCipherSuites=["TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384", "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256"]
```
This configuration disables plain-text communication entirely, ensuring all inter-component communication is encrypted using TLS 1.2 with enterprise-grade cipher suites.
- Because druid.enablePlaintextPort=false, all component listeners run over HTTPS on their configured ports.
- Component TLS ports:
  - Broker: 8282
  - Coordinator: 8281
  - Historical: 8283
  - Overlord: 8290
  - Router: 9088
- Health probes and Services must use scheme: HTTPS for these endpoints (a quick verification example follows).
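A quick way to confirm that a component answers on its TLS port is to call its health endpoint with the cluster CA. A sketch, assuming a port-forward to the Broker and a ca.crt extracted from the keystore secret; the Service name depends on how your operator names workloads:

```bash
# Forward the Broker TLS port locally (Service name is an assumption)
kubectl -n druid port-forward svc/druid-iuneradruid-brokers 8282:8282 &

# Call the health endpoint over HTTPS; expect "true"
curl --cacert ca.crt https://localhost:8282/status/health

# Quick check without CA validation (not for production scripts)
curl -k https://localhost:8282/status/health
```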
Authentication / Authorization
Secure, multi-tenant clusters use a dual authenticator chain with RBAC-backed authorization and an internal escalator account for service-to-service calls (from iuneradruid-cluster.yaml). This way the cluster keeps working even if an external authentication backend is down.
```yaml
common.runtime.properties: |
  # Authentication chain: internal basic auth first, then OIDC (pac4j)
  druid.auth.authenticatorChain=["db","pac4j"]

  # pac4j (OIDC) authenticator for human users
  druid.auth.authenticator.pac4j.name=pac4j
  druid.auth.authenticator.pac4j.type=pac4j
  druid.auth.authenticator.pac4j.authorizerName=pac4j
  druid.auth.pac4j.oidc.scope=openid profile email
  # druid.auth.pac4j.oidc.clientID=ENV_VAR_FROM_SECRET
  # druid.auth.pac4j.oidc.clientSecret=ENV_VAR_FROM_SECRET
  # druid.auth.pac4j.oidc.discoveryURI=ENV_VAR_FROM_SECRET
  # druid.auth.pac4j.cookiePassphrase=ENV_VAR_FROM_SECRET

  # Basic authenticator backed by metadata store for internal/service users
  druid.auth.authenticator.db.type=basic
  druid.auth.authenticator.db.skipOnFailure=true
  druid.auth.authenticator.db.credentialsValidator.type=metadata
  druid.auth.authenticator.db.authorizerName=db
  # druid.auth.authenticator.db.initialAdminPassword=ENV_VAR_FROM_SECRET
  # druid.auth.authenticator.db.initialInternalClientPassword=ENV_VAR_FROM_SECRET

  # Authorizers
  druid.auth.authorizers=["db", "pac4j"]
  druid.auth.authorizer.db.type=basic
  druid.auth.authorizer.db.roleProvider.type=metadata
  druid.auth.authorizer.pac4j.type=allowAll

  # Escalator (cluster-internal authorization)
  druid.escalator.type=basic
  druid.escalator.authorizerName=db
  # druid.escalator.internalClientUsername=ENV_VAR_FROM_SECRET
  # druid.escalator.internalClientPassword=ENV_VAR_FROM_SECRET
```
The corresponding Secrets are wired through environment variables:
```yaml
env:
  # Druid Basic-Auth
  - name: druid_auth_authenticator_db_initialAdminPassword
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: initialAdminPassword
  - name: druid_escalator_internalClientUsername
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: internalUser
  - name: druid_auth_authenticator_db_initialInternalClientPassword
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: internalClientPassword
  - name: druid_escalator_internalClientPassword
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: internalClientPassword
  # OIDC (pac4j) secrets
  - name: druid_auth_pac4j_oidc_clientID
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: pac4j-clientID
  - name: druid_auth_pac4j_oidc_clientSecret
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: pac4j-clientSecret
  - name: druid_auth_pac4j_oidc_discoveryURI
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: pac4j-discoveryURI
  - name: druid_auth_pac4j_cookiePassphrase
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: pac4j-cookiePassphrase
```
Note: Using pac4j for authentication is optional. In the current phase of the project, pac4j handles authentication only; authorization through pac4j is not yet possible.
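For reference, the iuneradruid-auth-secrets Secret referenced above could be created like this. This is only a sketch; all values are placeholders and should come from your secret-management workflow (e.g., SealedSecrets or SOPS) rather than from the command line:

```bash
kubectl -n druid create secret generic iuneradruid-auth-secrets \
  --from-literal=initialAdminPassword='<admin-password>' \
  --from-literal=internalUser='druid_system' \
  --from-literal=internalClientPassword='<internal-password>' \
  --from-literal=pac4j-clientID='<oidc-client-id>' \
  --from-literal=pac4j-clientSecret='<oidc-client-secret>' \
  --from-literal=pac4j-discoveryURI='https://<idp>/.well-known/openid-configuration' \
  --from-literal=pac4j-cookiePassphrase='<random-passphrase>'
```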
S3 Deep Storage and Indexing Logs
Druid stores immutable segments and indexing task logs in S3. The following cluster-wide settings belong in common.runtime.properties and apply to all components:
```yaml
common.runtime.properties: |
  # Deep Storage (S3)
  druid.storage.type=s3
  druid.storage.bucket=<bucket ID>
  druid.storage.baseKey=deepstorage

  # Indexer task logs (S3)
  druid.indexer.logs.type=s3
  druid.indexer.logs.s3Bucket=<bucket ID>
  druid.indexer.logs.s3Prefix=indexlogs
```
- Buckets and prefixes should exist and be lifecycle-managed according to your retention and cost policies.
- For S3 connectivity, provide credentials via your cloud-native mechanism (IRSA/role binding) or environment variables/secrets.
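If you use static credentials instead of IRSA, a minimal sketch is to inject them from a Secret as environment variables picked up by the AWS SDK that druid-s3-extensions uses. The Secret name and keys are assumptions:

```yaml
env:
  - name: AWS_ACCESS_KEY_ID        # read by the AWS SDK used by druid-s3-extensions
    valueFrom:
      secretKeyRef:
        name: iuneradruid-s3-credentials
        key: accessKey
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: iuneradruid-s3-credentials
        key: secretKey
  - name: AWS_REGION
    value: eu-central-1
```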
Zookeeper-less Installation (Kubernetes-Native Service Discovery)
Traditional Druid deployments depend on Zookeeper for service discovery and coordination. While Zookeeper is battle-tested, it introduces additional complexity and operational overhead in Kubernetes environments. Modern Druid supports Kubernetes-native service discovery, which removes the Zookeeper dependency and uses the Kubernetes API instead.
```yaml
common.runtime.properties: |
  druid.zk.service.enabled=false
  druid.serverview.type=http
  druid.discovery.type=k8s
  druid.discovery.k8s.clusterIdentifier=iuneradruid
  druid.discovery.k8s.leaseDuration=PT60S
  druid.discovery.k8s.renewDeadline=PT17S
  druid.discovery.k8s.retryPeriod=PT5S
```
This configuration enables Druid components to discover each other through the Kubernetes API, using pod annotations for component state (e.g., isLeader) and pod labels for cluster component identification.
Prerequisites
- Druid Kubernetes extension loaded in all services:
```yaml
common.runtime.properties: |
  druid.extensions.loadList=["...", "druid-kubernetes-extensions"]
```
- Dedicated ServiceAccount for Druid pods and task pods. Ensure the operator runs Druid pods with this ServiceAccount (typically serviceAccountName under the pod template/spec), and update your task template to use the same account:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: default-task-template
  namespace: druid
data:
  default-task-template.yaml: |+
    apiVersion: v1
    kind: PodTemplate
    template:
      spec:
        serviceAccount: druid
        restartPolicy: Never
```
RBAC Requirements for Service Discovery
Kubernetes service discovery requires appropriate RBAC permissions. The following configuration grants necessary access:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: druid
  namespace: druid
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: druid-k8s-discovery
  namespace: druid
rules:
  - apiGroups: [""]
    resources: ["pods", "endpoints", "services"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: druid-k8s-discovery
  namespace: druid
subjects:
  - kind: ServiceAccount
    name: druid
    namespace: druid
roleRef:
  kind: Role
  name: druid-k8s-discovery
  apiGroup: rbac.authorization.k8s.io
```
This configuration follows a least-privilege model scoped to the namespace. Broaden permissions only if your custom templates or operational needs require additional resources.
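Before rolling out the cluster, you can verify the bindings with kubectl auth can-i, impersonating the ServiceAccount; a short sketch:

```bash
# Expect "yes" for the discovery-related resources in the druid namespace
kubectl auth can-i list pods    --as=system:serviceaccount:druid:druid -n druid
kubectl auth can-i watch leases --as=system:serviceaccount:druid:druid -n druid

# Expect "no" outside the namespace scope (least privilege)
kubectl auth can-i list pods    --as=system:serviceaccount:druid:druid -n kube-system
```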
Service Discovery Performance Optimizations
- leaseDuration (PT60S): how long a component holds its discovery lease
- renewDeadline (PT17S): maximum time to renew the lease before it expires
- retryPeriod (PT5S): interval between renewal attempts
These timings balance quick failure detection with reduced API server load. Shorter intervals improve recovery times but increase Kubernetes API traffic. The values above were validated against an AWS EKS control plane.
Comparison: Zookeeper vs Kubernetes Service Discovery
Feature | Zookeeper | Kubernetes Native |
---|---|---|
Operational Complexity | Requires separate Zookeeper cluster | Uses existing Kubernetes infrastructure |
High Availability | Requires 3+ Zookeeper nodes | Leverages Kubernetes control plane HA |
Network Dependencies | Additional network connections | Uses cluster networking |
Monitoring | Separate Zookeeper monitoring | Integrated Kubernetes monitoring |
Security | Separate security configuration | Inherits Kubernetes RBAC |
Backup/Recovery | Zookeeper-specific procedures | Kubernetes-native backup strategies |
Scalability | Zookeeper cluster limitations | Scales with Kubernetes control plane |
Coordinator Component Configuration
The Coordinator manages segment lifecycle, tier balancing, and retention enforcement. The configuration below mirrors iuneradruid-cluster.yaml and is adapted for TLS-only clusters:
```yaml
coordinator:
  kind: StatefulSet
  druid.port: 8281
  maxSurge: 2
  maxUnavailable: 0
  nodeType: coordinator
  livenessProbe:
    initialDelaySeconds: 10
    periodSeconds: 5
    failureThreshold: 3
    httpGet:
      path: /status/health
      port: 8281
      scheme: HTTPS
  readinessProbe:
    initialDelaySeconds: 10
    periodSeconds: 5
    failureThreshold: 3
    httpGet:
      path: /status/health
      port: 8281
      scheme: HTTPS
  runtime.properties: |
    druid.service=druid/coordinator
    druid.log4j2.sourceCategory=druid/coordinator
    druid.coordinator.balancer.strategy=cachingCost
    druid.serverview.type=http
    druid.indexer.storage.type=metadata
    druid.coordinator.startDelay=PT10S
    druid.coordinator.period=PT5S
    # Run the overlord service in the coordinator process
    druid.coordinator.asOverlord.enabled=false
```
- TLS health probes: use scheme HTTPS on port 8281 because plaintext ports are disabled cluster-wide.
- Balancer strategy: cachingCost is a production-friendly default; alternatives include cost or segmentSize.
- Start delay and period: PT10S and PT5S allow quick reaction to rule changes; increase the intervals on very large clusters to reduce churn.
- Separate Overlord: druid.coordinator.asOverlord.enabled=false runs the Overlord as its own process, which matches the Kubernetes-native, MM-less task execution model.
- Rollout safety: maxSurge: 2 and maxUnavailable: 0 enable zero-downtime rolling updates for the Coordinator.
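Because the Coordinator also enforces retention, it helps to have the rule API at hand. A hedged example of posting load/drop rules for a datasource; the datasource name, periods, and credentials are placeholders, and the -k flag assumes the self-signed TLS setup from this guide:

```bash
# Keep the last 30 days loaded with 2 replicas in the default tier, drop everything older
curl -k -u admin:<admin-password> \
  -X POST https://localhost:8281/druid/coordinator/v1/rules/my-datasource \
  -H 'Content-Type: application/json' \
  -d '[
        {"type": "loadByPeriod", "period": "P30D", "tieredReplicants": {"_default_tier": 2}},
        {"type": "dropForever"}
      ]'
```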
Overlord Component Configuration
We run Druid in MM-less mode: no MiddleManagers are deployed. Instead, the Overlord uses the Kubernetes task runner to launch each indexing task as its own Kubernetes Pod/Job, which improves isolation, right-sizes resources per task, and supports native autoscaling (provided your cluster itself autoscales). For the full configuration, templates, and operational guidance, see Kubernetes-Native Task Execution.
overlord: nodeType: "overlord" druid.port: 8290 livenessProbe: httpGet: path: /status/health port: 8290 scheme: HTTPS readinessProbe: httpGet: path: /status/health port: 8290 scheme: HTTPS runtime.properties: | druid.service=druid/overlord druid.indexer.queue.startDelay=PT5S druid.indexer.fork.property.druid.processing.intermediaryData.storage.type=deepstore druid.indexer.storage.type=metadata druid.indexer.runner.namespace=druid druid.indexer.task.encapsulatedTask=true druid.indexer.runner.type=k8s druid.indexer.queue.maxSize=30 druid.processing.intermediaryData.storage.type=deepstore druid.indexer.runner.k8s.adapter.type=customTemplateAdapter druid.indexer.runner.k8s.podTemplate.base=/druid/tasktemplate/default/default-task-template.yaml druid.indexer.runner.primaryContainerName=main druid.indexer.runner.debugJobs=false druid.indexer.runner.maxTaskDuration=PT4H druid.indexer.runner.taskCleanupDelay=PT2H druid.indexer.runner.taskCleanupInterval=PT10M druid.indexer.runner.K8sjobLaunchTimeout=PT1H druid.indexer.runner.labels={"app.kubernetes.io/networkpolicy-group":"druid"} druid.indexer.runner.graceTerminationPeriodSeconds=PT30S # showing task for 4 Weeks in WebUI druid.indexer.storage.recentlyFinishedThreshold=P4W volumeMounts: - name: default-task-template mountPath: /druid/tasktemplate/default/ volumes: - name: default-task-template configMap: name: default-task-template
- TLS health probes: use scheme HTTPS on port 8290; plaintext ports are disabled cluster-wide.
- Kubernetes-native runner: runner.type=k8s with encapsulatedTask=true and the customTemplateAdapter using the pod template at /druid/tasktemplate/default/default-task-template.yaml.
- Queue and cleanup tuning: queue.startDelay=PT5S and queue.maxSize=30 bound concurrency; the cleanup delay/interval keep the namespace tidy; K8sjobLaunchTimeout guards against stuck submissions.
- Intermediary deepstore: intermediaryData.storage.type=deepstore aligns with the mounted deep storage.
- Volumes: the default-task-template ConfigMap mount is required for pod templating, and the runner labels make task pods easy to find (see the example below).
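Because the runner applies the labels configured in druid.indexer.runner.labels to every task pod, the spawned workloads are easy to observe with plain kubectl; a short sketch (pod names are placeholders):

```bash
# Task pods/jobs created by the Kubernetes task runner
kubectl -n druid get pods -l app.kubernetes.io/networkpolicy-group=druid
kubectl -n druid get jobs

# Logs and events of a single indexing task (primary container is "main")
kubectl -n druid logs <task-pod-name> -c main
kubectl -n druid describe pod <task-pod-name>
```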
Broker Component Configuration
The Broker handles query routing and result merging, and it is both CPU and memory intensive. The configuration below mirrors iuneradruid-cluster.yaml and is adapted for TLS-only clusters:
```yaml
broker:
  kind: StatefulSet
  replicas: 1
  maxSurge: 1
  maxUnavailable: 0
  druid.port: 8282
  nodeType: broker
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 10
    failureThreshold: 30
    httpGet:
      path: /druid/broker/v1/readiness
      port: 8282
      scheme: HTTPS
  runtime.properties: |
    druid.service=druid/broker

    # HTTP server settings
    druid.server.http.numThreads=60

    # HTTP client settings
    druid.broker.http.numConnections=50
    druid.broker.http.maxQueuedBytes=10MiB

    # Processing threads and buffers
    druid.processing.buffer.sizeBytes=1536MiB
    druid.processing.numMergeBuffers=6
    druid.processing.numThreads=1
    druid.query.groupBy.maxOnDiskStorage=2
    druid.query.groupBy.maxMergingDictionarySize=300000000

    # Query cache
    druid.broker.cache.useCache=true
    druid.broker.cache.populateCache=true
```
- TLS readiness: use scheme HTTPS on port 8282 because plaintext ports are disabled cluster-wide (a quick SQL smoke test over TLS follows this list).
- Processing buffers/threads: 1536MiB buffers, 6 merge buffers, and 1 processing thread align with the reference sizing; adjust for high concurrency or large groupBy queries.
- HTTP tuning: server numThreads=60; client connections=50 and maxQueuedBytes=10MiB balance throughput and backpressure.
- Caching: Broker cache useCache=true and populateCache=true; validate hit rates and memory impact for your workloads.
- Rollout safety: maxSurge: 1 and maxUnavailable: 0 enable safe rolling updates for Brokers.
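A simple end-to-end check of the Broker over TLS is a SQL query against /druid/v2/sql. A sketch assuming a port-forward, basic-auth credentials, and the operator's default Service naming (all placeholders):

```bash
kubectl -n druid port-forward svc/druid-iuneradruid-brokers 8282:8282 &

# Expect a small JSON result set if the Broker is healthy and can reach Historicals
curl -k -u admin:<admin-password> \
  -X POST https://localhost:8282/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query": "SELECT COUNT(*) AS segments FROM sys.segments"}'
```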
Historical Component
Historical nodes are the data-serving workhorses and must be sized for high segment-cache throughput; their probes must also tolerate long startup times while segments are loaded. The configuration below mirrors iuneradruid-cluster.yaml and is adapted for TLS-only clusters:
```yaml
historical:
  druid.port: 8283
  kind: StatefulSet
  nodeType: historical
  nodeConfigMountPath: /opt/druid/conf/druid/cluster/data/historical
  livenessProbe:
    initialDelaySeconds: 1800
    periodSeconds: 5
    failureThreshold: 3
    httpGet:
      path: /status/health
      port: 8283
      scheme: HTTPS
  readinessProbe:
    httpGet:
      path: /druid/historical/v1/readiness
      port: 8283
      scheme: HTTPS
    periodSeconds: 10
    failureThreshold: 18
  runtime.properties: |
    druid.service=druid/historical
    druid.log4j2.sourceCategory=druid/historical
    druid.server.http.numThreads=60

    # Processing threads and buffers
    druid.processing.buffer.sizeBytes=500MiB
    druid.query.groupBy.maxOnDiskStorage=10000000000
    druid.processing.numMergeBuffers=4
    druid.processing.numThreads=15
    druid.processing.tmpDir=var/druid/processing

    # Segment storage
    druid.segmentCache.locations=[{"path":"/var/druid/segment-cache","maxSize":"200g"}]

    # Query cache
    druid.historical.cache.useCache=true
    druid.historical.cache.populateCache=true
    druid.cache.type=caffeine
    druid.cache.sizeInBytes=256MiB
```
- TLS health/readiness probes: use scheme HTTPS on port 8283; plaintext ports are disabled cluster-wide.
- Long liveness delay: initialDelaySeconds: 1800 accounts for potentially long segment loading on startup and avoids premature restarts.
- Readiness endpoint: /druid/historical/v1/readiness ensures the node only serves queries after segments are loaded and announced (a quick check is shown after this list).
- Processing and spilling: 500MiB processing buffers with groupBy on-disk spill prevent OOMs on large groupBy queries; 4 merge buffers and 15 processing threads align with the reference sizing.
- Segment cache: segmentCache.locations at /var/druid/segment-cache with maxSize 200g; size the underlying storage accordingly.
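To confirm that Historicals have finished loading, both the node-local readiness endpoint and the Coordinator's load-status endpoint are useful. A sketch; the hostnames assume port-forwards to the TLS ports above, and credentials are placeholders:

```bash
# Per-node readiness: returns 200 once all assigned segments are loaded
curl -k https://localhost:8283/druid/historical/v1/readiness -o /dev/null -w '%{http_code}\n'

# Cluster-wide view from the Coordinator: percentage loaded per datasource
curl -k -u admin:<admin-password> https://localhost:8281/druid/coordinator/v1/loadstatus
```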
Kubernetes-Native Task Execution
The Evolution Beyond MiddleManager
Traditional Druid deployments relied on MiddleManager processes to spawn and manage indexing tasks. This approach worked well in simpler environments but created several challenges in Kubernetes: resource waste from always-running MiddleManager pods, complex resource allocation, and limited isolation between tasks.
The MM-less architecture leverages Kubernetes’ native job scheduling capabilities, treating each indexing task as a Kubernetes pod. This approach provides better resource utilization, stronger isolation, and simplified scaling.
Task Runner Configuration
The Overlord component orchestrates task execution through Kubernetes APIs instead of managing MiddleManager processes:
overlord: nodeType: "overlord" replicas: 1 runtime.properties: | druid.indexer.runner.type=k8s druid.indexer.runner.namespace=druid druid.indexer.task.encapsulatedTask=true druid.indexer.runner.k8s.adapter.type=customTemplateAdapter druid.indexer.runner.k8s.podTemplate.base=/druid/tasktemplate/default/default-task-template.yaml druid.indexer.runner.primaryContainerName=main
The druid.indexer.runner.type=k8s configuration switches from traditional MiddleManager-based task execution to Kubernetes-native pods. Each indexing task becomes an independent pod with its own resource allocation and lifecycle.
Task Template Configuration
The task template defines how indexing pods are created. This ConfigMap-based approach provides flexibility and version control:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: default-task-template
  namespace: druid
data:
  default-task-template.yaml: |+
    apiVersion: v1
    kind: PodTemplate
    template:
      spec:
        restartPolicy: Never
        serviceAccount: default
        securityContext:
          fsGroup: 1000
          runAsGroup: 1000
          runAsUser: 1000
        containers:
          - name: main
            image: iunera/druid:29.0.1
            command: [sh, -c]
            args: ["/peon.sh /druid/data 1 --loadBroadcastSegments true"]
```
Advanced Task Management Configuration
The MM-less setup includes sophisticated task lifecycle management:
```properties
druid.indexer.runner.maxTaskDuration=PT4H
druid.indexer.runner.taskCleanupDelay=PT2H
druid.indexer.runner.taskCleanupInterval=PT10M
druid.indexer.runner.K8sjobLaunchTimeout=PT1H
druid.indexer.runner.graceTerminationPeriodSeconds=PT30S
```
These settings ensure that long-running tasks don’t consume resources indefinitely while providing sufficient time for complex indexing operations to complete. The cleanup intervals prevent accumulation of completed task pods.
Benefits Comparison: MiddleManager vs MM-less
Aspect | MiddleManager | MM-less (Kubernetes-native) |
---|---|---|
Resource Efficiency | Always-running MM pods consume resources | Dynamic pod creation only when needed |
Isolation | Tasks share MM process space | Each task gets isolated pod |
Scaling | Manual MM replica scaling required | Automatic scaling based on task queue |
Monitoring | Complex multi-level monitoring | Direct pod-level monitoring |
Resource Allocation | Fixed MM resources split between tasks | Dedicated resources per task |
Failure Recovery | MM failure affects multiple tasks | Task failures are isolated |
Kubernetes Integration | Limited integration | Native Kubernetes scheduling |
Production Deployment Best Practices
Enterprise Druid deployments require production-grade strategies that work consistently across dev → staging → prod while remaining auditable and reversible. Building on the GitOps workflow from Part 2, we recommend a small set of guardrails and patterns that you can apply regardless of cluster size:
- Environment strategy and promotion: Use Kustomize overlays per environment (dev, staging, prod) with clear promotion via Git PRs/tags. Keep environment deltas as patches, not forks.
- Version pinning and immutability: Pin chart/app versions and container images by digest; use semantic versioning and Git tags for releases. Record change windows in Git history.
- Progressive delivery: Start with blue‑green and optionally add canary for high‑risk changes; ensure fast, deterministic rollback paths.
- High‑availability guardrails: Define PodDisruptionBudgets (PDBs), readiness/liveness probes, maxSurge/maxUnavailable, anti‑affinity/topologySpreadConstraints.
- Autoscaling behavior: Tune HPA thresholds and scale-down policies; coordinate with Cluster Autoscaler capacity and set resource requests/limits to avoid CPU throttling.
- Observability and SLOs: Standardize health endpoints, dashboards, and alerts for query latency, task success rate, segment loading, and JVM/OS resources.
- Security by default: Least‑privilege ServiceAccounts and RBAC, NetworkPolicies, TLS everywhere with keystore rotation, image provenance/scanning, and secrets from sealed/encrypted sources.
- Data protection: Scheduled metadata backups, deep storage replication, periodic restore drills, and backup verification jobs.
- Cost and governance: Quotas and budgets per environment/tenant, cache sizing policies, and right‑sizing.
- GitOps workflow integration: PR‑based changes, policy-as-code validation, drift detection, and manual approvals for production.
Recommended GitOps layout with Kustomize overlays:
```
druid/
  base/
    kustomization.yaml
    druidcluster.yaml
  overlays/
    dev/
      kustomization.yaml
      patches.yaml
    staging/
      kustomization.yaml
      patches.yaml
    prod/
      kustomization.yaml
      patches.yaml
```
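A minimal sketch of what the prod overlay could look like, assuming the base contains the DruidCluster manifest shown earlier; the patch file contents are up to you:

```yaml
# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: druid
resources:
  - ../../base
patches:
  - path: patches.yaml
    target:
      group: druid.apache.org
      version: v1alpha1
      kind: Druid
      name: iuneradruid
```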
Blue-Green Deployment Implementation
Blue-green deployments provide zero-downtime updates for critical Druid clusters:
```yaml
# Blue deployment
metadata:
  name: iuneradruid-blue
  labels:
    deployment: blue
    app.kubernetes.io/version: "29.0.1"
---
# Green deployment
metadata:
  name: iuneradruid-green
  labels:
    deployment: green
    app.kubernetes.io/version: "29.0.2"
```
Service selectors can be updated to switch traffic between blue and green deployments, enabling instant rollbacks if issues arise.
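A sketch of such a fronting Service; the selector labels are assumptions and must match whatever labels your Router (or Broker) pods actually carry in each color:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: druid-router-active
  namespace: druid
spec:
  selector:
    deployment: blue              # flip to "green" to cut traffic over
    app.kubernetes.io/name: druid-router   # placeholder label
  ports:
    - name: https
      port: 9088        # Router TLS port from the common configuration
      targetPort: 9088
```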
Capacity Planning and Resource Allocation
Production Druid clusters require careful capacity planning based on expected workloads and growth projections.
Component Resource Recommendations
Component | Small Cluster | Medium Cluster | Large Cluster |
---|---|---|---|
Broker | 8Gi memory, 2 CPU | 16Gi memory, 4 CPU | 32Gi memory, 8 CPU |
Historical | 16Gi memory, 2 CPU | 32Gi memory, 4 CPU | 64Gi memory, 8 CPU |
Coordinator | 4Gi memory, 1 CPU | 8Gi memory, 2 CPU | 16Gi memory, 4 CPU |
Overlord | 2Gi memory, 1 CPU | 4Gi memory, 2 CPU | 8Gi memory, 4 CPU |
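In the DruidCluster resource these figures translate into per-node resource requests and limits. A sketch for a medium-sized Broker, assuming the druid-operator's per-node resources field; the values mirror the table above and are starting points, not tuned numbers:

```yaml
nodes:
  brokers:
    kind: StatefulSet
    druid.port: 8282
    resources:
      requests:
        cpu: "4"
        memory: 16Gi
      limits:
        cpu: "4"
        memory: 16Gi
```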
Health Checks and Monitoring Implementation
Production deployments require comprehensive health monitoring beyond basic liveness probes:
```yaml
readinessProbe:
  initialDelaySeconds: 60
  periodSeconds: 10
  failureThreshold: 30
  httpGet:
    path: /druid/broker/v1/readiness
    port: 8282
    scheme: HTTPS
livenessProbe:
  initialDelaySeconds: 1800
  periodSeconds: 5
  failureThreshold: 3
  httpGet:
    path: /status/health
    port: 8283
    scheme: HTTPS
```
The extended initialDelaySeconds for Historical nodes accounts for segment loading time during startup.
Backup and Disaster Recovery Planning
Metadata Backup Strategy
```bash
# Automated PostgreSQL backup
# (redirect runs inside the job container via sh -c; a /backup volume must be available there)
kubectl create cronjob postgres-backup \
  --image=postgres:15 \
  --schedule="0 2 * * *" \
  --restart=OnFailure \
  -- sh -c 'pg_dump -h postgres-postgresql-hl -U postgres postgres > /backup/druid-metadata-$(date +%Y%m%d).sql'
```
Deep Storage Backup Verification
```yaml
# S3 backup verification job
apiVersion: batch/v1
kind: CronJob
metadata:
  name: druid-backup-verification
spec:
  schedule: "0 4 * * 0"  # Weekly
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup-verify
              image: minio/mc:latest
              command: ["/bin/sh", "-c"]
              args: ["mc mirror --dry-run s3/druid-segments s3/druid-segments-backup"]
```
Integration with GitOps Workflow
Advanced FluxCD Configuration
Building on the GitOps foundation from Part 2, production Druid deployments benefit from sophisticated FluxCD configurations that handle complex dependency management and multi-cluster deployments.
The complete GitOps workflow integrates several FluxCD resources:
```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: druid-cluster-config
  namespace: flux-system
spec:
  interval: 5m0s
  ref:
    branch: main
  secretRef:
    name: flux-system
  url: ssh://git@github.com/iunera/druid-cluster-config
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: druid-cluster
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./kubernetes/druid/druidcluster
  prune: true
  sourceRef:
    kind: GitRepository
    name: druid-cluster-config
  dependsOn:
    - name: druid-operator
    - name: druid-secrets
```
Configuration Drift Detection
GitOps provides automatic detection of configuration drift—when running configurations diverge from the Git repository state:
```yaml
spec:
  interval: 5m0s  # Check for drift every 5 minutes
  prune: true     # Remove resources not in Git
  force: true     # Override manual changes
```
This configuration ensures your production Druid cluster always matches the declared state in your Git repository.
Multi-Cluster Deployment Patterns
Enterprise environments often require deploying Druid across multiple Kubernetes clusters for high availability or geographic distribution:
```yaml
# Cluster-specific kustomization
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: druid-us-east-1
  namespace: flux-system
spec:
  path: ./clusters/us-east-1/druid
  postBuild:
    substitute:
      cluster_region: "us-east-1"
      s3_endpoint: "s3.us-east-1.amazonaws.com"
```
Rollback Strategies and Version Control
GitOps enables sophisticated rollback strategies through Git operations:
```bash
# Immediate rollback to previous commit
git revert HEAD --no-edit
git push origin main

# Rollback to specific version
git reset --hard <commit-hash>
git push --force-with-lease origin main
```
FluxCD automatically detects these changes and applies the rollback to your Kubernetes cluster.
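After pushing the revert, you can trigger and observe reconciliation immediately instead of waiting for the next interval; a sketch using the Flux CLI with the resource names from the example above:

```bash
# Pull the reverted commit and re-apply the Kustomization right away
flux reconcile source git druid-cluster-config -n flux-system
flux reconcile kustomization druid-cluster -n flux-system --with-source

# Watch the result
flux get kustomizations -n flux-system --watch
```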
Advanced Integration with Conversational AI
For enterprises leveraging advanced analytics capabilities, consider integrating with Apache Druid MCP server for conversational AI capabilities. This integration enables natural language querying of your time series data, making Druid accessible to non-technical stakeholders.
Deployment Patterns Comparison
Pattern | Traditional | Blue-Green | Canary | Rolling |
---|---|---|---|---|
Downtime | High | None | Minimal | None |
Risk | High | Low | Very Low | Medium |
Resource Usage | Low | High | Medium | Low |
Complexity | Low | Medium | High | Low |
Rollback Speed | Slow | Instant | Fast | Medium |
Testing Capability | Limited | Full | Partial | Limited |
Frequently Asked Questions
What are the advantages of MM-less Druid deployment?
You eliminate always-on MiddleManagers, run each indexing task as its own Kubernetes Pod/Job, right‑size resources per task, improve isolation and observability, and unlock native autoscaling. Troubleshooting is also simpler because each task has its own pod logs and lifecycle.
Do I still need Zookeeper?
No. When you enable Kubernetes-native discovery (druid.discovery.type=k8s) and disable ZK (druid.zk.service.enabled=false), Druid uses the Kubernetes API for discovery and announcements. Ensure druid-kubernetes-extensions is loaded and RBAC for Pods/Endpoints/Services/Leases is in place (see "Zookeeper-less Installation").
How should I size resources for production?
Start from the guidance in this article: Brokers 16–32Gi/4–8 CPU, Historicals 32–64Gi with 200–500Gi ephemeral for segment cache, Coordinators 8–16Gi/2–4 CPU, Overlord 2–4Gi/1–2 CPU in MM‑less setups. Adjust based on query concurrency, segment size, and MSQ usage; validate with load tests.
How do I configure TLS with Java keystores?
Mount keystores at /druid/jks, pass passwords via Secrets/env, set client/server TLS properties in common.runtime.properties, and disable plaintext ports (druid.enablePlaintextPort=false). Use TLSv1.2+ and approved cipher suites. Update probes and Services to use HTTPS schemes/ports.
What’s the recommended Backup/DR approach?
Back up the metadata store (e.g., PostgreSQL CronJobs), verify deep storage replication, and run periodic restore drills. Automate verification jobs (e.g., S3 mirror dry‑runs) and keep encryption keys separate. Use GitOps history for deterministic rollbacks.
How can I troubleshoot task execution in MM-less mode?
Check Overlord logs for scheduling and Kubernetes runner messages, verify ServiceAccount RBAC, and inspect individual task pods with kubectl logs and kubectl describe. Confirm the task template ConfigMap is mounted and the customTemplateAdapter path is correct.
Which production metrics matter most?
Query latency/throughput, task success rate, segment loading/dropping, JVM heap and direct memory, cache hit rate, disk usage for segment cache, and inter-component network I/O. Alert on query failures, slow segment loading, and resource saturation.
How do I rotate certificates without downtime?
Update the TLS Secret, trigger rolling restarts, and rely on readiness/liveness probes using HTTPS to validate. Consider cert‑manager for automated renewals. Use GitOps so rotations are auditable and revertible.
How do I migrate from an existing Zookeeper/MM cluster to Kubernetes‑native?
Stage a parallel cluster with K8s discovery and MM‑less runner, mirror ingestion, validate queries/latency, then flip traffic using blue‑green or router/service selector switches. Keep deep storage shared (or replicate) to minimize re‑ingestion when feasible.
Is pac4j required?
No. This guide uses pac4j (OIDC) for human SSO and basic auth for service users, but you can start with basic auth/escalator only and add pac4j later. Authorization is handled via the basic authorizer in this setup.
Conclusion
You now have a production‑grade, TLS‑only, MM‑less Apache Druid cluster running natively on Kubernetes and managed via GitOps. The patterns in this guide—Kubernetes service discovery, pod‑templated task execution, hardened RBAC, and environment‑based configuration—give you a repeatable, auditable path from dev to prod.
Go‑live checklist
- TLS everywhere with secret rotation procedure validated
- RBAC and service accounts least‑privilege, including task pods
- Health/readiness probes using HTTPS and realistic initial delays
- S3 deep storage and index logs configured with lifecycle/retention
- Backups and a restore drill completed for the metadata store
- Blue‑green or rollback plan documented and tested
- Dashboards and alerts for query latency, task success, segment loading, JVM/direct memory, and cache hit rate
What’s next
In the next part of the series we implement authentication and authorization, combining pac4j (OIDC) for human users with Druid’s basic authorizer for service‑to‑service access, and we harden roles, users, and secret management to meet enterprise requirements.