Apache Druid on Kubernetes: Production-ready with TLS, MM‑less, Zookeeper‑less, GitOps

This article is part of a series on deploying a production‑ready Apache Druid cluster on Kubernetes. In Part 3, we cover the complete installation and configuration of a production‑grade cluster, building on the foundation and preparation covered previously. See Part 1 and Part 2.

Introduction

After laying the infrastructure groundwork and preparing your deployment environment, it’s time for the main event: deploying your production-ready Apache Druid cluster on Kubernetes. This isn’t just about applying a few YAML files—we’re talking about configuring a sophisticated, distributed analytics engine that can handle enterprise-scale workloads.

Modern time series applications demand more than basic Druid setups. They require carefully tuned configurations, proper resource allocation, and enterprise-grade security implementations. The traditional approach of manually managing MiddleManager components is giving way to Kubernetes-native architectures that leverage the platform’s inherent capabilities.

This guide walks you through the complete installation process, from understanding the DruidCluster custom resource to implementing MM-less deployments and Zookeeper-less service discovery. We’ll cover everything from component-specific configurations to production deployment patterns, ensuring your Druid installation is ready for real-world enterprise demands.

Understanding the DruidCluster Custom Resource

The heart of your Druid installation lies within the DruidCluster custom resource, which acts as the declarative blueprint for your entire cluster. This isn’t your typical Kubernetes deployment—it’s a sophisticated orchestration mechanism that manages multiple interconnected services with complex dependencies.

Let’s examine the foundational configuration that brings together all the infrastructure components we prepared in earlier parts:

apiVersion: "druid.apache.org/v1alpha1"
kind: "Druid"
metadata:
  name: iuneradruid
  namespace: druid
spec:
  commonConfigMountPath: /opt/druid/conf/druid/cluster/_common
  rollingDeploy: false
  startScript: /druid.sh
  image: iunera/druid:29.0.1
  imagePullPolicy: Always
  podAnnotations:
    reason: this-cluster-is-k8s-discovery
  podLabels:
    app.kubernetes.io/networkpolicy-group: druid
  securityContext:
    fsGroup: 1000
    runAsUser: 1000
    runAsGroup: 1000

This configuration establishes several critical principles. The rollingDeploy: false setting ensures controlled deployments—crucial for production environments where you need predictable update behavior. The custom image specification allows for enterprise-specific extensions and security patches.

The security context configuration follows the principle of least privilege by running processes as non-root user 1000. This approach significantly reduces the attack surface while maintaining compatibility with persistent volume requirements.

Common Configuration Settings

Druid Extensions Load List

The Druid cluster (from iuneradruid-cluster.yaml line 65) enables a curated set of extensions that power security, ingestion, storage backends, query engines, and Kubernetes-native operation. These are declared centrally so that every service loads the same capabilities at startup:

# common.runtime.properties
druid.extensions.loadList=["iu-code-ingestion-druid-extension", "druid-pac4j","druid-basic-security", "simple-client-sslcontext", "druid-multi-stage-query", "druid-distinctcount", "druid-s3-extensions", "druid-stats", "druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage", "druid-kafka-indexing-service", "druid-time-min-max", "druid-kubernetes-overlord-extensions", "druid-kubernetes-extensions"]

What each extension does and why it matters for sizing:

  • iu-code-ingestion-druid-extension: Adds custom pre-ingestion parsers for Apache Druid; the ingestion logic is packaged into the image. Ensure it is present in the container; resource use depends on what it implements.
  • druid-pac4j: Enables web SSO/OIDC authentication for human users.
  • druid-basic-security: Provides internal basic auth/authz (metadata-backed) and the escalator; required for service-to-service auth.
  • simple-client-sslcontext: Supplies an SSLContext for clients using the provided keystores/truststores; required for TLS-enabled outbound HTTP.
  • druid-multi-stage-query: Activates the Multi-Stage Query (MSQ) engine used for parallel SQL-based ingestion/queries. Can increase Broker and task memory/CPU during large jobs; plan heap and direct memory accordingly.
  • druid-distinctcount: Adds approximate distinct-count aggregators. Can use off-heap memory; size processing/direct buffers to match workload.
  • druid-s3-extensions: Enables S3-compatible deep storage and indexer logs.
  • druid-stats: Statistical aggregators (e.g., variance/stddev) used in ingestion/queries.
  • druid-histogram: Histogram/approximation aggregators. May increase memory during heavy aggregations.
  • druid-datasketches: Apache DataSketches (HLL, quantiles, theta) for approximate analytics. Influences processing memory; tune direct memory on Brokers/Tasks.
  • druid-lookups-cached-global: Global cached lookups for fast dimension enrichment. Consumes heap on Brokers/Historicals; account for cache size.
  • postgresql-metadata-storage: PostgreSQL connector for the Druid metadata store; required when using Postgres as the metadata store.
  • druid-kafka-indexing-service: Kafka ingestion (supervisors/tasks). Affects Overlord and task pods; configure network policies to permit access to the Kafka brokers.
  • druid-time-min-max: Min/Max time aggregators and utilities used in queries/optimizations.
  • druid-kubernetes-overlord-extensions: Kubernetes runner for MM-less tasks (K8s Jobs). Essential for Kubernetes-native task execution and for proper autoscaling.
  • druid-kubernetes-extensions: Kubernetes-native service discovery (no Zookeeper). Required for the Zookeeper-less setup.

Operational notes:

  • All extensions must be available under the configured directory (druid.extensions.directory=/opt/druid/extensions) in the image.
  • Security/TLS-related extensions rely on the keystores mounted at /druid/jks and the TLS settings in common.runtime.properties.
  • Sketch/histogram extensions primarily impact processing memory; ensure -XX:MaxDirectMemorySize and druid.processing.buffer.sizeBytes are sized for your query patterns.
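
For example, heap and direct memory can be pinned per node through the operator's extra.jvm.options field. The values below are a minimal sketch, not a sizing recommendation; align MaxDirectMemorySize with druid.processing.buffer.sizeBytes × (numThreads + numMergeBuffers + 1) for your Broker and Historical settings.

broker:
  # Illustrative only: heap plus direct memory sized for 1536MiB buffers,
  # 1 processing thread and 6 merge buffers (~12GiB of direct memory needed)
  extra.jvm.options: |-
    -Xms8G
    -Xmx8G
    -XX:MaxDirectMemorySize=13G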

TLS Configuration

Enterprise deployments demand encryption in transit for every component-to-component connection. The configuration integrates the TLS keystores we prepared in Part 2:

volumeMounts:
  - name: keystores
    mountPath: /druid/jks
    readOnly: true

volumes:
  - name: keystores
    secret:
      secretName: iuneradruid-jks-keystores-secret

env:
  - name: druid_server_https_keyStorePassword
    valueFrom:
      secretKeyRef:
        name: iuneradruid-jks-keystores-secret
        key: keystorepassword

The TLS configuration enables secure communication between all Druid components:

common.runtime.properties: |
  # global TLS settings
  druid.enableTlsPort=true
  # Disable non-TLS Ports
  druid.enablePlaintextPort=false

  # client side TLS settings
  druid.client.https.protocol=TLSv1.2
  druid.client.https.trustStoreType=jks
  druid.client.https.trustStorePath=/druid/jks/truststore.jks
  druid.client.https.trustStorePassword=changeit
  druid.client.https.validateHostnames=false

  # server side TLS settings
  druid.server.https.keyStoreType=jks
  druid.server.https.keyStorePath=/druid/jks/keystore.jks
  druid.server.https.certAlias=druid
  druid.server.https.trustStoreType=jks
  druid.server.https.trustStorePath=/druid/jks/truststore.jks
  druid.server.https.trustStorePassword=changeit
  druid.server.https.validateHostnames=false
  druid.server.https.protocol=TLSv1.2
  druid.server.https.includeCipherSuites=["TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384", "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256"]

This configuration disables plain-text communication entirely, ensuring all inter-component communication is encrypted using TLS 1.2 with enterprise-grade cipher suites.

  • Because druid.enablePlaintextPort=false, all component listeners run over HTTPS on their configured ports.
  • Component TLS ports:
    • Broker: 8282
    • Coordinator: 8281
    • Historical: 8283
    • Overlord: 8290
    • Router: 9088
  • Health probes and Services must use scheme: HTTPS for these endpoints.
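
A quick smoke test confirms that a component really answers over TLS. The Service name below is an assumption (the operator derives Service names from the cluster and node names), so adjust it to the Services present in your namespace:

# Illustrative in-cluster TLS check against the Broker's HTTPS port
kubectl -n druid run tls-check --rm -i --restart=Never --image=curlimages/curl --command -- \
  curl -skv https://druid-iuneradruid-broker:8282/status/health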

Authentication / Authorization

Secure, multi-tenant clusters use a dual authenticator chain (from iuneradruid-cluster.yaml): RBAC-backed authorization plus an internal escalator account for service-to-service calls. Keeping the metadata-backed basic authenticator first in the chain ensures the cluster keeps working even if the external authentication backend is down.

common.runtime.properties: |
  # Authentication chain: internal basic auth first, then OIDC (pac4j)
  druid.auth.authenticatorChain=["db","pac4j"]

  # pac4j (OIDC) authenticator for human users
  druid.auth.authenticator.pac4j.name=pac4j
  druid.auth.authenticator.pac4j.type=pac4j
  druid.auth.authenticator.pac4j.authorizerName=pac4j
  druid.auth.pac4j.oidc.scope=openid profile email
  # druid.auth.pac4j.oidc.clientID=ENV_VAR_FROM_SECRET
  # druid.auth.pac4j.oidc.clientSecret=ENV_VAR_FROM_SECRET
  # druid.auth.pac4j.oidc.discoveryURI=ENV_VAR_FROM_SECRET
  # druid.auth.pac4j.cookiePassphrase=ENV_VAR_FROM_SECRET

  # Basic authenticator backed by metadata store for internal/service users
  druid.auth.authenticator.db.type=basic
  druid.auth.authenticator.db.skipOnFailure=true
  druid.auth.authenticator.db.credentialsValidator.type=metadata
  druid.auth.authenticator.db.authorizerName=db
  # druid.auth.authenticator.db.initialAdminPassword=ENV_VAR_FROM_SECRET
  # druid.auth.authenticator.db.initialInternalClientPassword=ENV_VAR_FROM_SECRET

  # Authorizers
  druid.auth.authorizers=["db", "pac4j"]
  druid.auth.authorizer.db.type=basic
  druid.auth.authorizer.db.roleProvider.type=metadata
  druid.auth.authorizer.pac4j.type=allowAll

  # Escalator (cluster-internal authorization)
  druid.escalator.type=basic
  druid.escalator.authorizerName=db
  # druid.escalator.internalClientUsername=ENV_VAR_FROM_SECRET
  # druid.escalator.internalClientPassword=ENV_VAR_FROM_SECRET

The corresponding Secrets are wired through environment variables:

env:
  # Druid Basic-Auth
  - name: druid_auth_authenticator_db_initialAdminPassword
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: initialAdminPassword
  - name: druid_escalator_internalClientUsername
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: internalUser
  - name: druid_auth_authenticator_db_initialInternalClientPassword
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: internalClientPassword
  - name: druid_escalator_internalClientPassword
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: internalClientPassword

  # OIDC (pac4j) secrets
  - name: druid_auth_pac4j_oidc_clientID
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: pac4j-clientID
  - name: druid_auth_pac4j_oidc_clientSecret
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: pac4j-clientSecret
  - name: druid_auth_pac4j_oidc_discoveryURI
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: pac4j-discoveryURI
  - name: druid_auth_pac4j_cookiePassphrase
    valueFrom:
      secretKeyRef:
        name: iuneradruid-auth-secrets
        key: pac4j-cookiePassphrase

Note: Using pac4j is optional. In the current phase of the project, pac4j handles authentication only; authorization through pac4j is not possible, which is why its authorizer is set to allowAll.

S3 Deep Storage and Indexing Logs

Druid stores immutable segments and indexing task logs in S3. The following cluster-wide settings belong in common.runtime.properties and apply to all components:

common.runtime.properties: |
  # Deep Storage (S3)
  druid.storage.type=s3
  druid.storage.bucket=<bucket ID>
  druid.storage.baseKey=deepstorage

  # Indexer task logs (S3)
  druid.indexer.logs.type=s3
  druid.indexer.logs.s3Bucket=<bucket ID>
  druid.indexer.logs.s3Prefix=indexlogs
  • Buckets and prefixes should exist and be lifecycle-managed according to your retention and cost policies.
  • For S3 connectivity, provide credentials via your cloud-native mechanism (IRSA/role binding) or environment variables/secrets.
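
If you rely on static credentials rather than IRSA, they can be injected exactly like the other secrets; the Secret name and keys below are assumptions for illustration and map to druid.s3.accessKey and druid.s3.secretKey:

env:
  - name: druid_s3_accessKey
    valueFrom:
      secretKeyRef:
        name: iuneradruid-s3-credentials   # assumed Secret name
        key: accessKey
  - name: druid_s3_secretKey
    valueFrom:
      secretKeyRef:
        name: iuneradruid-s3-credentials
        key: secretKey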

Zookeeper-less Installation (Kubernetes-Native Service Discovery)

Traditional Druid deployments depend on Zookeeper for service discovery and coordination. While Zookeeper is battle-tested, it introduces additional complexity and operational overhead in Kubernetes environments. Modern Druid instead supports Kubernetes-native service discovery, which drops the Zookeeper dependency and uses the Kubernetes API directly.

common.runtime.properties: |
  druid.zk.service.enabled=false
  druid.serverview.type=http
  druid.discovery.type=k8s
  druid.discovery.k8s.clusterIdentifier=iuneradruid
  druid.discovery.k8s.leaseDuration=PT60S
  druid.discovery.k8s.renewDeadline=PT17S
  druid.discovery.k8s.retryPeriod=PT5S

This configuration enables Druid components to discover each other through Kubernetes APIs, using pod annotations for component state (e.g. isLeader) and pod labels for cluster component identification.

Prerequisites

  • Druid Kubernetes extension loaded in all services:
common.runtime.properties: |
  druid.extensions.loadList=["...", "druid-kubernetes-extensions"]

  • Dedicated ServiceAccount for Druid pods and task pods: ensure the operator runs Druid pods with this ServiceAccount (typically serviceAccountName under the pod template/spec), and update your task template to use the same account:

apiVersion: v1
kind: ConfigMap
metadata:
  name: default-task-template
  namespace: druid
data:
  default-task-template.yaml: |+
    apiVersion: v1
    kind: PodTemplate
    template:
      spec:
        serviceAccount: druid
        restartPolicy: Never

RBAC Requirements for Service Discovery

Kubernetes service discovery requires appropriate RBAC permissions. The following configuration grants necessary access:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: druid
  namespace: druid
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: druid-k8s-discovery
  namespace: druid
rules:
- apiGroups: [""]
  resources: ["pods","endpoints","services"]
  verbs: ["get","list","watch"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["get","list","watch","create","update","patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: druid-k8s-discovery
  namespace: druid
subjects:
- kind: ServiceAccount
  name: druid
  namespace: druid
roleRef:
  kind: Role
  name: druid-k8s-discovery
  apiGroup: rbac.authorization.k8s.io

This configuration follows a least-privilege model scoped to the namespace. Broaden permissions only if your custom templates or operational needs require additional resources.
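
Before rolling out the cluster, you can verify the binding with kubectl's built-in authorization check:

# Confirm the druid ServiceAccount holds the permissions the discovery extension needs
kubectl auth can-i watch pods -n druid --as=system:serviceaccount:druid:druid
kubectl auth can-i create leases.coordination.k8s.io -n druid --as=system:serviceaccount:druid:druid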

Service Discovery Performance Optimizations

  • leaseDuration (PT60S): How long a component holds its discovery lease
  • renewDeadline (PT17S): Maximum time to renew lease before expiration
  • retryPeriod (PT5S): Interval between renewal attempts

These timings balance quick failure detection with reduced API server load. Shorter intervals improve recovery times but increase Kubernetes API traffic. The values above were validated against an AWS EKS control plane.

Comparison: Zookeeper vs Kubernetes Service Discovery

| Feature | Zookeeper | Kubernetes Native |
|---|---|---|
| Operational Complexity | Requires separate Zookeeper cluster | Uses existing Kubernetes infrastructure |
| High Availability | Requires 3+ Zookeeper nodes | Leverages Kubernetes control plane HA |
| Network Dependencies | Additional network connections | Uses cluster networking |
| Monitoring | Separate Zookeeper monitoring | Integrated Kubernetes monitoring |
| Security | Separate security configuration | Inherits Kubernetes RBAC |
| Backup/Recovery | Zookeeper-specific procedures | Kubernetes-native backup strategies |
| Scalability | Zookeeper cluster limitations | Scales with Kubernetes control plane |

Coordinator Component Configuration

The Coordinator manages segment lifecycle, tier balancing, and retention enforcement. The configuration below mirrors iuneradruid-cluster.yaml and is adapted for TLS-only clusters:

coordinator:
  kind: StatefulSet
  druid.port: 8281
  maxSurge: 2
  maxUnavailable: 0
  nodeType: coordinator
  livenessProbe:
    initialDelaySeconds: 10
    periodSeconds: 5
    failureThreshold: 3
    httpGet:
      path: /status/health
      port: 8281
      scheme: HTTPS
  readinessProbe:
    initialDelaySeconds: 10
    periodSeconds: 5
    failureThreshold: 3
    httpGet:
      path: /status/health
      port: 8281
      scheme: HTTPS

  runtime.properties: |

    druid.service=druid/coordinator
    druid.log4j2.sourceCategory=druid/coordinator

    druid.coordinator.balancer.strategy=cachingCost
    druid.serverview.type=http

    druid.indexer.storage.type=metadata
    druid.coordinator.startDelay=PT10S
    druid.coordinator.period=PT5S

    # Run the Overlord as a separate service (not embedded in the Coordinator process)
    druid.coordinator.asOverlord.enabled=false
  • TLS health probes: Use scheme HTTPS on port 8281 because plaintext ports are disabled cluster‑wide.
  • Balancer strategy: cachingCost is a production‑friendly default; alternatives include cost or segmentSize.
  • Start delay and period: PT10S and PT5S allow quick reaction to rule changes; increase intervals on very large clusters to reduce churn.
  • Separate Overlord: druid.coordinator.asOverlord.enabled=false runs Overlord separately, which matches the Kubernetes‑native, MM‑less task execution model.
  • Rollout safety: maxSurge: 2 and maxUnavailable: 0 enable zero‑downtime rolling updates for the Coordinator.
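
Once the pods are running, Coordinator leadership can be confirmed over the TLS port. The Service name and admin credentials below are placeholders; substitute the Service created by the operator and the password from iuneradruid-auth-secrets:

# Illustrative leader check; returns the URL of the current Coordinator leader
kubectl -n druid run coord-check --rm -i --restart=Never --image=curlimages/curl --command -- \
  curl -sk -u admin:<initialAdminPassword> \
  https://druid-iuneradruid-coordinator:8281/druid/coordinator/v1/leader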

Overlord Component Configuration

We run Druid in MM-less mode: no MiddleManagers are deployed. Instead, the Overlord uses the Kubernetes task runner to launch each indexing task as its own Kubernetes Pod/Job, which improves isolation, right-sizes resources per task, and supports native autoscaling (provided your cluster itself autoscales). For the full configuration, templates, and operational guidance, see Kubernetes-Native Task Execution.

overlord:
  nodeType: "overlord"
  druid.port: 8290
  livenessProbe:
    httpGet:
      path: /status/health
      port: 8290
      scheme: HTTPS
  readinessProbe:
    httpGet:
      path: /status/health
      port: 8290
      scheme: HTTPS

  runtime.properties: |
    druid.service=druid/overlord
    druid.indexer.queue.startDelay=PT5S
    druid.indexer.fork.property.druid.processing.intermediaryData.storage.type=deepstore
    druid.indexer.storage.type=metadata
    druid.indexer.runner.namespace=druid
    druid.indexer.task.encapsulatedTask=true
    druid.indexer.runner.type=k8s
    druid.indexer.queue.maxSize=30
    druid.processing.intermediaryData.storage.type=deepstore
    druid.indexer.runner.k8s.adapter.type=customTemplateAdapter
    druid.indexer.runner.k8s.podTemplate.base=/druid/tasktemplate/default/default-task-template.yaml
    druid.indexer.runner.primaryContainerName=main
    druid.indexer.runner.debugJobs=false
    druid.indexer.runner.maxTaskDuration=PT4H
    druid.indexer.runner.taskCleanupDelay=PT2H
    druid.indexer.runner.taskCleanupInterval=PT10M
    druid.indexer.runner.K8sjobLaunchTimeout=PT1H
    druid.indexer.runner.labels={"app.kubernetes.io/networkpolicy-group":"druid"}
    druid.indexer.runner.graceTerminationPeriodSeconds=PT30S
    # show tasks from the last 4 weeks in the web UI
    druid.indexer.storage.recentlyFinishedThreshold=P4W

  volumeMounts:
    - name: default-task-template
      mountPath: /druid/tasktemplate/default/

  volumes:
    - name: default-task-template
      configMap:
        name: default-task-template
  • TLS health probes: Use scheme HTTPS on port 8290; plaintext ports are disabled cluster‑wide.
  • Kubernetes‑native runner: runner.type=k8s with encapsulatedTask=true and customTemplateAdapter using the pod template at /druid/tasktemplate/default/default-task-template.yaml.
  • Queue and cleanup tuning: queue.startDelay=PT5S, queue.maxSize=30 to bound concurrency; cleanup delay/intervals keep the namespace clean; K8sjobLaunchTimeout guards stuck submissions.
  • Intermediary deepstore: intermediaryData.storage.type=deepstore aligns with the mounted deep storage.
  • Volumes: default-task-template ConfigMap mount is required for pod templating.
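
Because every task becomes a Kubernetes Job, standard kubectl tooling is enough to watch and debug ingestion:

# Tasks launched by the Kubernetes runner appear as Jobs/Pods in the druid namespace
kubectl -n druid get jobs
kubectl -n druid get pods -l app.kubernetes.io/networkpolicy-group=druid
# Inspect a single task pod (name is a placeholder); "main" matches primaryContainerName
kubectl -n druid logs <task-pod-name> -c main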

Broker Component Configuration

The Broker handles query routing and result merging, and it is both CPU and memory intensive. The configuration below mirrors iuneradruid-cluster.yaml and is adapted for TLS-only clusters:

broker:
  kind: StatefulSet
  replicas: 1
  maxSurge: 1
  maxUnavailable: 0
  druid.port: 8282
  nodeType: broker
  readinessProbe:
    initialDelaySeconds: 60
    periodSeconds: 10
    failureThreshold: 30
    httpGet:
      path: /druid/broker/v1/readiness
      port: 8282
      scheme: HTTPS
  runtime.properties: |
    druid.service=druid/broker
    # HTTP server settings
    druid.server.http.numThreads=60
    # HTTP client settings
    druid.broker.http.numConnections=50
    druid.broker.http.maxQueuedBytes=10MiB

    # Processing threads and buffers
    druid.processing.buffer.sizeBytes=1536MiB
    druid.processing.numMergeBuffers=6
    druid.processing.numThreads=1
    druid.query.groupBy.maxOnDiskStorage=2
    druid.query.groupBy.maxMergingDictionarySize=300000000
    # Query cache
    druid.broker.cache.useCache=true
    druid.broker.cache.populateCache=true
  • TLS readiness: Use scheme HTTPS on port 8282 because plaintext ports are disabled cluster‑wide.
  • Processing buffers/threads: 1536MiB buffers, 6 merge buffers, and 1 processing thread align with the reference sizing; adjust for high concurrency or large groupBy queries.
  • HTTP tuning: server numThreads=60; client connections=50 and maxQueuedBytes=10MiB balance throughput and backpressure.
  • Caching: Broker cache useCache=true and populateCache=true; validate hit rates and memory impact for your workloads.
  • Rollout safety: maxSurge: 1 and maxUnavailable: 0 enable safe rolling updates for Brokers.
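
An authenticated SQL probe over HTTPS validates the Broker end to end. The Service name and credentials are placeholders based on the secrets defined earlier:

# Illustrative SQL smoke test against the Broker's TLS port
kubectl -n druid run sql-check --rm -i --restart=Never --image=curlimages/curl --command -- \
  curl -sk -u admin:<initialAdminPassword> \
  -H 'Content-Type: application/json' \
  -d '{"query":"SELECT 1"}' \
  https://druid-iuneradruid-broker:8282/druid/v2/sql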

Historical Component

Historical nodes are the data-serving workhorses: size them for high segment cache throughput and give their probes enough headroom for the long startup phase while segments are loaded. The configuration below mirrors iuneradruid-cluster.yaml and is adapted for TLS-only clusters:

historical:
  druid.port: 8283
  kind: StatefulSet
  nodeType: historical
  nodeConfigMountPath: /opt/druid/conf/druid/cluster/data/historical

  livenessProbe:
    initialDelaySeconds: 1800
    periodSeconds: 5
    failureThreshold: 3
    httpGet:
      path: /status/health
      port: 8283
      scheme: HTTPS
  readinessProbe:
    httpGet:
      path: /druid/historical/v1/readiness
      port: 8283
      scheme: HTTPS
    periodSeconds: 10
    failureThreshold: 18

  runtime.properties: |
    druid.service=druid/historical
    druid.log4j2.sourceCategory=druid/historical
    druid.server.http.numThreads=60
    # Processing threads and buffers
    druid.processing.buffer.sizeBytes=500MiB
    druid.query.groupBy.maxOnDiskStorage=10000000000
    druid.processing.numMergeBuffers=4
    druid.processing.numThreads=15
    druid.processing.tmpDir=var/druid/processing
    # Segment storage
    druid.segmentCache.locations=[{"path":"/var/druid/segment-cache","maxSize":"200g"}]
    # Query cache
    druid.historical.cache.useCache=true
    druid.historical.cache.populateCache=true
    druid.cache.type=caffeine
    druid.cache.sizeInBytes=256MiB
  • TLS health/readiness probes: Use scheme HTTPS on port 8283; plaintext ports are disabled cluster‑wide.
  • Long liveness delay: initialDelaySeconds: 1800 accounts for potentially long segment loading on startup to avoid premature restarts.
  • Readiness endpoint: /druid/historical/v1/readiness ensures the node only serves queries after segments are loaded and announced.
  • Processing and spilling: 500MiB processing buffers with groupBy on‑disk spill prevent OOMs on large groupBy queries; 4 merge buffers and 15 processing threads align with reference sizing.
  • Segment cache: segmentCache.locations at /var/druid/segment-cache with maxSize 200g; size underlying storage accordingly.
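
After the readiness probe passes, the Coordinator's load status endpoint shows how completely the published data is served; Service name and credentials are placeholders as above:

# Per-datasource percentage of segments loaded (100.0 means fully served)
kubectl -n druid run load-check --rm -i --restart=Never --image=curlimages/curl --command -- \
  curl -sk -u admin:<initialAdminPassword> \
  https://druid-iuneradruid-coordinator:8281/druid/coordinator/v1/loadstatus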

Kubernetes-Native Task Execution

The Evolution Beyond MiddleManager

Traditional Druid deployments relied on MiddleManager processes to spawn and manage indexing tasks. This approach worked well in simpler environments but created several challenges in Kubernetes: resource waste from always-running MiddleManager pods, complex resource allocation, and limited isolation between tasks.

The MM-less architecture leverages Kubernetes’ native job scheduling capabilities, treating each indexing task as a Kubernetes pod. This approach provides better resource utilization, stronger isolation, and simplified scaling.

Task Runner Configuration

The Overlord component orchestrates task execution through Kubernetes APIs instead of managing MiddleManager processes:

overlord:
  nodeType: "overlord"
  replicas: 1
  runtime.properties: |
    druid.indexer.runner.type=k8s
    druid.indexer.runner.namespace=druid
    druid.indexer.task.encapsulatedTask=true
    
    druid.indexer.runner.k8s.adapter.type=customTemplateAdapter
    druid.indexer.runner.k8s.podTemplate.base=/druid/tasktemplate/default/default-task-template.yaml
    druid.indexer.runner.primaryContainerName=main

The druid.indexer.runner.type=k8s configuration switches from traditional MiddleManager-based task execution to Kubernetes-native pods. Each indexing task becomes an independent pod with its own resource allocation and lifecycle.

Task Template Configuration

The task template defines how indexing pods are created. This ConfigMap-based approach provides flexibility and version control:

apiVersion: v1
kind: ConfigMap
metadata:
  name: default-task-template
  namespace: druid
data:
  default-task-template.yaml: |+
    apiVersion: v1
    kind: PodTemplate
    template:
      spec:
        restartPolicy: Never
        serviceAccount: druid
        securityContext:
          fsGroup: 1000
          runAsGroup: 1000
          runAsUser: 1000
        containers:
        - name: main
          image: iunera/druid:29.0.1
          command: [sh, -c]
          args: ["/peon.sh /druid/data 1 --loadBroadcastSegments true"]

Advanced Task Management Configuration

The MM-less setup includes sophisticated task lifecycle management:

druid.indexer.runner.maxTaskDuration=PT4H
druid.indexer.runner.taskCleanupDelay=PT2H
druid.indexer.runner.taskCleanupInterval=PT10M
druid.indexer.runner.K8sjobLaunchTimeout=PT1H
druid.indexer.runner.graceTerminationPeriodSeconds=PT30S

These settings ensure that long-running tasks don’t consume resources indefinitely while providing sufficient time for complex indexing operations to complete. The cleanup intervals prevent accumulation of completed task pods.

Benefits Comparison: MiddleManager vs MM-less

| Aspect | MiddleManager | MM-less (Kubernetes-native) |
|---|---|---|
| Resource Efficiency | Always-running MM pods consume resources | Dynamic pod creation only when needed |
| Isolation | Tasks share MM process space | Each task gets isolated pod |
| Scaling | Manual MM replica scaling required | Automatic scaling based on task queue |
| Monitoring | Complex multi-level monitoring | Direct pod-level monitoring |
| Resource Allocation | Fixed MM resources split between tasks | Dedicated resources per task |
| Failure Recovery | MM failure affects multiple tasks | Task failures are isolated |
| Kubernetes Integration | Limited integration | Native Kubernetes scheduling |

Production Deployment Best Practices

Enterprise Druid deployments require production-grade strategies that work consistently across dev → staging → prod while remaining auditable and reversible. Building on the GitOps workflow from Part 2, we recommend a small set of guardrails and patterns that you can apply regardless of cluster size:

  • Environment strategy and promotion: Use Kustomize overlays per environment (dev, staging, prod) with clear promotion via Git PRs/tags. Keep environment deltas as patches, not forks.
  • Version pinning and immutability: Pin chart/app versions and container images by digest; use semantic versioning and Git tags for releases. Record change windows in Git history.
  • Progressive delivery: Start with blue‑green and optionally add canary for high‑risk changes; ensure fast, deterministic rollback paths.
  • High‑availability guardrails: Define PodDisruptionBudgets (PDBs), readiness/liveness probes, maxSurge/maxUnavailable, anti‑affinity/topologySpreadConstraints.
  • Autoscaling behavior: Tune HPA thresholds and scale-down policies; coordinate with Cluster Autoscaler capacity and set resource requests/limits to avoid CPU throttling.
  • Observability and SLOs: Standardize health endpoints, dashboards, and alerts for query latency, task success rate, segment loading, and JVM/OS resources.
  • Security by default: Least‑privilege ServiceAccounts and RBAC, NetworkPolicies, TLS everywhere with keystore rotation, image provenance/scanning, and secrets from sealed/encrypted sources.
  • Data protection: Scheduled metadata backups, deep storage replication, periodic restore drills, and backup verification jobs.
  • Cost and governance: Quotas and budgets per environment/tenant, cache sizing policies, and right‑sizing.
  • GitOps workflow integration: PR‑based changes, policy-as-code validation, drift detection, and manual approvals for production.

Recommended GitOps layout with Kustomize overlays:

druid/
  base/
    kustomization.yaml
    druidcluster.yaml
  overlays/
    dev/
      kustomization.yaml
      patches.yaml
    staging/
      kustomization.yaml
      patches.yaml
    prod/
      kustomization.yaml
      patches.yaml
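
A minimal prod overlay, assuming the file names above, could look like the sketch below; the patch targets the Druid resource from base and overrides only what differs per environment:

# overlays/prod/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: druid
resources:
  - ../../base
patches:
  - path: patches.yaml
    target:
      group: druid.apache.org
      version: v1alpha1
      kind: Druid
      name: iuneradruid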

Blue-Green Deployment Implementation

Blue-green deployments provide zero-downtime updates for critical Druid clusters:

# Blue deployment
metadata:
  name: iuneradruid-blue
  labels:
    deployment: blue
    app.kubernetes.io/version: "29.0.1"
---
# Green deployment
metadata:
  name: iuneradruid-green
  labels:
    deployment: green
    app.kubernetes.io/version: "29.0.2"

Service selectors can be updated to switch traffic between blue and green deployments, enabling instant rollbacks if issues arise.
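
A dedicated Service in front of the active color makes the switch a one-line change; the selector below is a sketch assuming the deployment labels shown above:

apiVersion: v1
kind: Service
metadata:
  name: druid-router-active
  namespace: druid
spec:
  selector:
    deployment: blue        # flip to "green" to cut traffic over
  ports:
    - name: https
      port: 9088
      targetPort: 9088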

Capacity Planning and Resource Allocation

Production Druid clusters require careful capacity planning based on expected workloads and growth projections.

Component Resource Recommendations

| Component | Small Cluster | Medium Cluster | Large Cluster |
|---|---|---|---|
| Broker | 8Gi memory, 2 CPU | 16Gi memory, 4 CPU | 32Gi memory, 8 CPU |
| Historical | 16Gi memory, 2 CPU | 32Gi memory, 4 CPU | 64Gi memory, 8 CPU |
| Coordinator | 4Gi memory, 1 CPU | 8Gi memory, 2 CPU | 16Gi memory, 4 CPU |
| Overlord | 2Gi memory, 1 CPU | 4Gi memory, 2 CPU | 8Gi memory, 4 CPU |

Health Checks and Monitoring Implementation

Production deployments require comprehensive health monitoring beyond basic liveness probes:

readinessProbe:
  initialDelaySeconds: 60
  periodSeconds: 10
  failureThreshold: 30
  httpGet:
    path: /druid/broker/v1/readiness
    port: 8282
    scheme: HTTPS

livenessProbe:
  initialDelaySeconds: 1800
  periodSeconds: 5
  failureThreshold: 3
  httpGet:
    path: /status/health
    port: 8283
    scheme: HTTPS

The extended initialDelaySeconds for Historical nodes accounts for segment loading time during startup.

Backup and Disaster Recovery Planning

Metadata Backup Strategy

# Automated PostgreSQL metadata backup; wrap the command in sh -c so the redirect
# runs inside the container (supply PGPASSWORD via a Secret and mount /backup in a real setup)
kubectl create cronjob postgres-backup \
  --image=postgres:15 \
  --schedule="0 2 * * *" \
  --restart=OnFailure \
  -- sh -c 'pg_dump -h postgres-postgresql-hl -U postgres postgres > /backup/druid-metadata-$(date +%Y%m%d).sql'

Deep Storage Backup Verification

# S3 backup verification job
apiVersion: batch/v1
kind: CronJob
metadata:
  name: druid-backup-verification
spec:
  schedule: "0 4 * * 0"  # Weekly
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never  # Job pods require Never or OnFailure
          containers:
          - name: backup-verify
            image: minio/mc:latest
            command: ["/bin/sh", "-c"]
            args: ["mc mirror --dry-run s3/druid-segments s3/druid-segments-backup"]

Integration with GitOps Workflow

Advanced FluxCD Configuration

Building on the GitOps foundation from Part 2, production Druid deployments benefit from sophisticated FluxCD configurations that handle complex dependency management and multi-cluster deployments.

The complete GitOps workflow integrates several FluxCD resources:

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: druid-cluster-config
  namespace: flux-system
spec:
  interval: 5m0s
  ref:
    branch: main
  secretRef:
    name: flux-system
  url: ssh://git@github.com/iunera/druid-cluster-config
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: druid-cluster
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./kubernetes/druid/druidcluster
  prune: true
  sourceRef:
    kind: GitRepository
    name: druid-cluster-config
  dependsOn:
  - name: druid-operator
  - name: druid-secrets

Configuration Drift Detection

GitOps provides automatic detection of configuration drift—when running configurations diverge from the Git repository state:

spec:
  interval: 5m0s  # Check for drift every 5 minutes
  prune: true     # Remove resources not in Git
  force: true     # Recreate resources that cannot be updated in place (e.g., immutable field changes)

This configuration ensures your production Druid cluster always matches the declared state in your Git repository.
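
With the flux CLI you can preview exactly what a pending change would do before merging it:

# Show the server-side diff for the druid-cluster Kustomization against the local path
flux diff kustomization druid-cluster --path ./kubernetes/druid/druidcluster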

Multi-Cluster Deployment Patterns

Enterprise environments often require deploying Druid across multiple Kubernetes clusters for high availability or geographic distribution:

# Cluster-specific kustomization
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: druid-us-east-1
  namespace: flux-system
spec:
  path: ./clusters/us-east-1/druid
  postBuild:
    substitute:
      cluster_region: "us-east-1"
      s3_endpoint: "s3.us-east-1.amazonaws.com"

Rollback Strategies and Version Control

GitOps enables sophisticated rollback strategies through Git operations:

# Immediate rollback to previous commit
git revert HEAD --no-edit
git push origin main

# Rollback to specific version
git reset --hard <commit-hash>
git push --force-with-lease origin main

FluxCD automatically detects these changes and applies the rollback to your Kubernetes cluster.
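
After pushing a revert, you can force reconciliation immediately instead of waiting for the next interval:

# Refresh the Git source and apply the rolled-back revision right away
flux reconcile kustomization druid-cluster --with-source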

Advanced Integration with Conversational AI

For enterprises leveraging advanced analytics capabilities, consider integrating with Apache Druid MCP server for conversational AI capabilities. This integration enables natural language querying of your time series data, making Druid accessible to non-technical stakeholders.

Deployment Patterns Comparison

| Pattern | Traditional | Blue-Green | Canary | Rolling |
|---|---|---|---|---|
| Downtime | High | None | Minimal | None |
| Risk | High | Low | Very Low | Medium |
| Resource Usage | Low | High | Medium | Low |
| Complexity | Low | Medium | High | Low |
| Rollback Speed | Slow | Instant | Fast | Medium |
| Testing Capability | Limited | Full | Partial | Limited |

Frequently Asked Questions

What are the advantages of MM-less Druid deployment?

You eliminate always-on MiddleManagers, run each indexing task as its own Kubernetes Pod/Job, right‑size resources per task, improve isolation and observability, and unlock native autoscaling. Troubleshooting is also simpler because each task has its own pod logs and lifecycle.

Do I still need Zookeeper?

No. When you enable Kubernetes-native discovery (druid.discovery.type=k8s) and disable ZK (druid.zk.service.enabled=false), Druid uses the Kubernetes API for discovery and announcements. Ensure the druid-kubernetes-extensions extension is loaded and that RBAC for Pods/Endpoints/Services/Leases is in place (see “Zookeeper-less Installation”).

How should I size resources for production?

Start from the guidance in this article: Brokers 16–32Gi/4–8 CPU, Historicals 32–64Gi with 200–500Gi ephemeral for segment cache, Coordinators 8–16Gi/2–4 CPU, Overlord 2–4Gi/1–2 CPU in MM‑less setups. Adjust based on query concurrency, segment size, and MSQ usage; validate with load tests.

How do I configure TLS with Java keystores?

Mount keystores at /druid/jks, pass passwords via Secrets/env, set client/server TLS properties in common.runtime.properties, and disable plaintext ports (druid.enablePlaintextPort=false). Use TLSv1.2+ and approved cipher suites. Update probes and Services to use HTTPS schemes/ports.

What’s the recommended Backup/DR approach?

Back up the metadata store (e.g., PostgreSQL CronJobs), verify deep storage replication, and run periodic restore drills. Automate verification jobs (e.g., S3 mirror dry‑runs) and keep encryption keys separate. Use GitOps history for deterministic rollbacks.

How can I troubleshoot task execution in MM-less mode?

Check Overlord logs for scheduling and Kubernetes runner messages, verify service account RBAC, and inspect individual task pods with kubectl logs/kubectl describe. Confirm the task template ConfigMap is mounted and the customTemplateAdapter path is correct.

Which production metrics matter most?

Query latency/throughput, task success rate, segment loading/dropping, JVM heap and direct memory, cache hit rate, disk usage for segment cache, and inter-component network I/O. Alert on query failures, slow segment loading, and resource saturation.

How do I rotate certificates without downtime?

Update the TLS Secret, trigger rolling restarts, and rely on readiness/liveness probes using HTTPS to validate. Consider cert‑manager for automated renewals. Use GitOps so rotations are auditable and revertible.

How do I migrate from an existing Zookeeper/MM cluster to Kubernetes‑native?

Stage a parallel cluster with K8s discovery and MM‑less runner, mirror ingestion, validate queries/latency, then flip traffic using blue‑green or router/service selector switches. Keep deep storage shared (or replicate) to minimize re‑ingestion when feasible.

Is pac4j required?

No. This guide uses pac4j (OIDC) for human SSO and basic auth for service users, but you can start with basic auth/escalator only and add pac4j later. Authorization is handled via the basic authorizer in this setup.

Conclusion

You now have a production‑grade, TLS‑only, MM‑less Apache Druid cluster running natively on Kubernetes and managed via GitOps. The patterns in this guide—Kubernetes service discovery, pod‑templated task execution, hardened RBAC, and environment‑based configuration—give you a repeatable, auditable path from dev to prod.

Go‑live checklist

  • TLS everywhere with secret rotation procedure validated
  • RBAC and service accounts least‑privilege, including task pods
  • Health/readiness probes using HTTPS and realistic initial delays
  • S3 deep storage and index logs configured with lifecycle/retention
  • Backups and a restore drill completed for the metadata store
  • Blue‑green or rollback plan documented and tested
  • Dashboards and alerts for query latency, task success, segment loading, JVM/direct memory, and cache hit rate

What’s next
In the next part of the series we implement authentication and authorization, combining pac4j (OIDC) for human users with Druid’s basic authorizer for service‑to‑service access, and we harden roles, users, and secret management to meet enterprise requirements.