This article is part of a series on deploying a production‑ready Apache Druid cluster on Kubernetes. In Part 2, we focus on deployment preparation: selecting and installing the right Druid operator and implementing end‑to‑end TLS. Other parts cover the infrastructure foundation, cluster configuration and tuning, authentication and authorization, and day‑2 operations. See Part 1 for the infrastructure foundation.

Introduction
Deploying Apache Druid on Kubernetes is a production engineering task, not a container exercise. The objective is an enterprise‑grade, secure, and operable analytics platform. Success depends on upfront preparation—most notably, selecting a maintained Druid operator aligned with Kubernetes‑native patterns and implementing robust TLS across all traffic.
This guide details the preparation steps that reliably reduce operational risk in enterprise Druid environments. Whether you support big data time‑series applications or other high‑throughput analytics, decisions made here directly influence stability, security, and total cost of ownership.
As Adheip Singh has shown in operator‑driven Druid deployments, the right foundation turns ongoing operations from reactive maintenance into predictable automation. Proper preparation is essential.
Summary
This guide covers two critical steps to prepare a production‑ready Apache Druid deployment on Kubernetes: selecting and installing the right Druid operator and implementing end‑to‑end TLS/SSL for both north‑south and east‑west traffic. We explain why the datainfrahq operator is the preferred choice, provide step‑by‑step installation guidance, and show how to generate and package Java keystore/truststore artifacts for secure operation.
Key outcomes:
- Install the datainfrahq Apache Druid operator on Kubernetes and verify it using a minimal cluster.
- Set up GitOps with FluxCD for consistent, auditable deployments.
- Generate Java keystores/truststores and package them as encrypted Kubernetes Secrets using SOPS.
- Enable TLS for internal Druid services and user‑facing endpoints.
Prerequisites:
- A Kubernetes cluster (v1.25+) and kubectl configured; cluster‑admin permissions.
- FluxCD installed (or ready to install) for GitOps workflows. (See Part 1 – Infrastructure Setup for Enterprise Apache Druid on Kubernetes)
- SOPS with AGE keys configured (see Part 1) and access to a secrets repository.
- Java keytool and OpenSSL available locally; pwgen is optional (openssl can substitute for password generation).
- Dedicated Kubernetes namespaces for Druid and for the operator (e.g. druid, druid-operator-system).
- Access to a Git repository for druid‑cluster‑config, such as https://github.com/iunera/druid-cluster-config.
Table of contents
- Introduction
- Summary
- 1) Install the Apache Druid Operator (datainfrahq)
- 2) Configure TLS: Certificates, Keystore, and Truststore
- Comparison: Druid Operators
- Comparison: TLS certificate management approaches
- Comparison: Manual vs. automated certificate deployment
- Comparison: Security considerations by method
- Conclusion
- FAQ
1) Install the Apache Druid Operator (datainfrahq)
Why datainfrahq? In our production reference, it aligns with Kubernetes‑native Druid patterns (e.g., no ZooKeeper for service discovery, Kubernetes Job‑based ingestion, autoscaling patterns), provides current CRDs, and exhibits controller behavior suitable for enterprise clusters.
Note: The similarly named druid‑io operator has historically lagged in maintenance and feature coverage and may not align with the Kubernetes‑native Druid patterns used here. For production deployments, prefer the datainfrahq operator.
Installation via Manifest (GitOps‑friendly)
The repository includes a compiled manifest that creates the CRD and installs the controller into druid-operator-system. In the following examples we refer to the https://github.com/iunera/druid-cluster-config repository.
```shell
# 0) Create namespace idempotently (safe if it already exists)
kubectl create namespace druid-operator-system --dry-run=client -o yaml | kubectl apply -f -

# 1) Apply the operator bundle (CRDs + controller)
kubectl apply -f druid-cluster-config/kubernetes/druid-operator-system/druid-operator.manifest.yaml
```
Verify the Druid operator installation
```shell
# CRD present?
kubectl get crd druids.druid.apache.org

# Controller pod(s) running?
kubectl -n druid-operator-system get pods -l app.kubernetes.io/name=druid-operator -o wide

# Controller rollout healthy?
kubectl -n druid-operator-system rollout status deploy -l app.kubernetes.io/name=druid-operator

# Inspect logs (tail and follow)
kubectl -n druid-operator-system logs -l app.kubernetes.io/name=druid-operator --tail=200 -f
```
Test the operator with a tiny cluster
To confirm the operator works correctly, deploy a minimal Druid cluster using the example from the repository, verify it’s running, and then clean it up:
```shell
# Create druid namespace if it doesn't exist
kubectl create namespace druid --dry-run=client -o yaml | kubectl apply -f -

# Deploy the tiny cluster example
kubectl apply -f https://raw.githubusercontent.com/datainfrahq/druid-operator/master/examples/tiny-cluster-mmless.yaml

# Wait for the cluster to be ready (may take a few minutes)
kubectl -n druid get druid tiny-cluster -w

# Verify pods are running
kubectl -n druid get pods -l druid_cr=tiny-cluster

# Check cluster status
kubectl -n druid describe druid tiny-cluster

# Clean up the test cluster
kubectl delete -f https://raw.githubusercontent.com/datainfrahq/druid-operator/master/examples/tiny-cluster-mmless.yaml
```
This test confirms that the operator can successfully create and manage Druid clusters. The tiny cluster uses minimal resources and demonstrates basic functionality without requiring external dependencies like deep storage.
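Before deleting the test cluster, you can optionally smoke‑test the Druid HTTP API through a port‑forward. The service name below is an assumption based on the operator's `druid-<cluster>-<nodetype>` naming convention; confirm the actual name with `kubectl -n druid get svc`:

```shell
# Forward the router service locally (service name is an assumption; verify
# with: kubectl -n druid get svc)
kubectl -n druid port-forward svc/druid-tiny-cluster-routers 8088:8088 &
PF_PID=$!
sleep 3

# Druid's health endpoint returns 'true' once the process is up
curl -s http://localhost:8088/status/health

# Stop the port-forward again
kill $PF_PID
```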
Common troubleshooting
- CRD missing after apply: re‑run apply, then `kubectl describe crd druids.druid.apache.org` to surface errors.
- Namespace mismatch: controller resources expect the druid-operator-system namespace. Ensure it exists and that ServiceAccount/RoleBindings point there.
- Admission webhooks failing: verify certs in the manifest were installed; check `kubectl -n druid-operator-system get validatingwebhookconfigurations`.
- Permissions: run `kubectl auth can-i --list --namespace druid-operator-system` to confirm RBAC for the operator's ServiceAccount.
Optional operator configuration examples
Limit the operator’s scope to your Druid namespace(s) by setting a watch namespace on the Deployment (common for kubebuilder controllers):
```shell
# Constrain operator to watch only the 'druid' namespace (example)
kubectl -n druid-operator-system set env deployment/druid-operator WATCH_NAMESPACE=druid

# Verify env is set
kubectl -n druid-operator-system describe deploy druid-operator | sed -n '/Environment:/,$p' | head -n 20
```
Combine these manifests in your GitOps repository for final deployment.
GitOps with FluxCD: Dedicated Apache Druid Repository
For production environments, we recommend separating the Druid cluster configuration into a dedicated repository that’s managed via https://fluxcd.io/flux/. This approach provides better isolation, security, and change management for your Druid deployment.
Setting up the druid-cluster-config Repository
The GitOps approach uses a separate repository (e.g., `druid-cluster-config`) that contains all Druid-specific configurations, manifests, and secrets. This separation allows for:
- Isolated access control: Different teams can manage infrastructure vs. Druid configurations
- Independent release cycles: Druid updates don’t require changes to the main infrastructure repository
- Enhanced security: Druid-specific secrets and configurations are contained in a dedicated repository
FluxCD Configuration for Druid Repository
Create a GitRepository source that points to your druid-cluster-config repository:
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: druid-cluster-config
  namespace: flux-system
spec:
  interval: 1m0s
  ref:
    branch: main
  timeout: 60s
  url: ssh://git@github.com/iunera/druid-cluster-config
```
Then create a Kustomization that deploys the Druid configuration:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: druid-cluster-config
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./kubernetes/
  prune: true
  sourceRef:
    kind: GitRepository
    name: druid-cluster-config
```
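Because the repository will also hold SOPS‑encrypted Secrets, the Kustomization spec can additionally carry a `decryption` block so Flux decrypts them on apply. This is a sketch that assumes the AGE private key from Part 1 is stored in a Secret named `sops-age` in the flux-system namespace; adjust the name to your setup:

```yaml
# Addition to the Kustomization spec (sketch; the Secret name 'sops-age'
# is an assumption from the Part 1 setup)
spec:
  decryption:
    provider: sops
    secretRef:
      name: sops-age
```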
Repository Structure
The druid-cluster-config repository should follow this structure:
```
kubernetes/
├── druid/
│   ├── druidcluster/
│   │   └── iuneradruid-cluster.yaml
│   ├── secrets/
│   │   ├── keystores-secret.enc.yaml
│   │   └── postgres-secret.enc.yaml
│   └── kustomization.yaml
├── druid-operator-system/
│   ├── druid-operator.manifest.yaml
│   └── kustomization.yaml
└── kustomization.yaml
```
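The per‑directory kustomization.yaml files simply enumerate the resources Flux should apply. For the druid/ directory, a minimal version might look like this (file names taken from the structure above):

```yaml
# kubernetes/druid/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - druidcluster/iuneradruid-cluster.yaml
  - secrets/keystores-secret.enc.yaml
  - secrets/postgres-secret.enc.yaml
```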
If you prefer not to store encrypted secrets in this repository, you can keep them in a separate secrets repository, as we did in the first part of this series.
This GitOps setup ensures that your Druid deployment is fully managed through code, providing the reliability and traceability required for enterprise production environments.
2) Configure TLS: Certificates, Keystore, and Truststore
Enterprise Druid requires TLS for internal traffic (brokers, coordinators, historicals, ingestors, routers) and for user access to the web console. Strong defaults plus automation make audits simpler and reduce operational risk.
Generate keystore and truststore (Java JKS)
Commands aligned with this repository’s reference (Java keytool + strong password):
```shell
# Generate keystore with a long random password (requires pwgen).
# The password is also saved to a file for later use.
keytool -keystore keystore.jks \
  -storepass $(pwgen 64 -n1 -s | tr -d '\n' | tee keystorepassword) \
  -keypass $(cat keystorepassword) \
  -genkey -alias druid -keyalg RSA -keysize 4096 -validity 3650 \
  -dname "CN=druid" -storetype JKS

# Export the certificate in PEM format
keytool -export -alias druid -keystore keystore.jks -rfc -file druid.cert \
  -storepass $(cat keystorepassword)

# OPTIONAL: base truststore from Java defaults
cp -v $JAVA_HOME/lib/security/cacerts ./truststore.jks

# Trust the new Druid cert (default cacerts password is 'changeit')
keytool -import -file druid.cert -storepass changeit -alias druid \
  -keystore truststore.jks -noprompt -trustcacerts -storetype JKS
```
Notes:
- If `pwgen` is unavailable, use `openssl rand -base64 64 | tr -d '\n' | tee keystorepassword`.
- Prefer a CN/SAN that matches how services actually connect (FQDN or Kubernetes service name), or have the keystore issued by your enterprise PKI.
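As an illustration of the SAN point, a self‑signed certificate whose SAN matches an in‑cluster service name can be produced with OpenSSL (1.1.1+ for `-addext`) and then imported into a keystore. The DNS names below are placeholders; substitute your actual service FQDNs:

```shell
# Hypothetical service FQDN and external name; adjust to your cluster.
SAN="DNS:druid-tiny-cluster-routers.druid.svc.cluster.local,DNS:druid.example.com"

# Generate a self-signed certificate carrying the SAN entries that
# Druid clients will actually use when connecting.
openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 -nodes \
  -keyout druid-san.key -out druid-san.cert \
  -subj "/CN=druid" -addext "subjectAltName=${SAN}"

# Confirm the SAN made it into the certificate
openssl x509 -in druid-san.cert -noout -ext subjectAltName
```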
Package TLS into a Kubernetes Secret
Store artifacts in the druid namespace as a single secret that workloads can mount or reference.
```shell
# Apply directly
kubectl --namespace=druid create secret generic keystores \
  --from-file=keystore.jks \
  --from-file=keystorepassword \
  --from-file=truststore.jks

# Or: render YAML for GitOps (encrypt this file with SOPS before committing)
kubectl --namespace=druid create secret generic keystores \
  --from-file=keystore.jks \
  --from-file=keystorepassword \
  --from-file=truststore.jks \
  --dry-run=client -o yaml \
  > druid-jks-keystores-secret.yaml

# Encrypt the secret file with SOPS (using the .sops.yaml config from Part 1)
sops -e druid-jks-keystores-secret.yaml > druid-jks-keystores-secret.enc.yaml

# Remove the unencrypted file for security
rm druid-jks-keystores-secret.yaml

# Move to your GitOps repository structure (adjust path as needed)
mv druid-jks-keystores-secret.enc.yaml kubernetes/sops-secrets/druid/
```
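To show how workloads consume this secret, here is a sketch of wiring it into a Druid custom resource. The field names follow the datainfrahq Druid CRD (`volumes`/`volumeMounts` and `common.runtime.properties`); the cluster name, mount path, and the exact property set are illustrative and will be refined in the next part of this series. Keystore passwords are omitted here; inject them via environment variables or Druid's dynamic config provider rather than committing them in plain text:

```yaml
# Sketch only: cluster name and mount path are hypothetical.
apiVersion: druid.apache.org/v1alpha1
kind: Druid
metadata:
  name: iuneradruid
  namespace: druid
spec:
  volumes:
    - name: keystores
      secret:
        secretName: keystores    # the Secret created above
  volumeMounts:
    - name: keystores
      mountPath: /druid/jks      # hypothetical mount path
      readOnly: true
  common.runtime.properties: |
    # Enable TLS and point Druid at the mounted keystore/truststore
    druid.enableTlsPort=true
    druid.server.https.keyStorePath=/druid/jks/keystore.jks
    druid.server.https.keyStoreType=jks
    druid.server.https.certAlias=druid
    druid.client.https.trustStorePath=/druid/jks/truststore.jks
    druid.client.https.trustStoreType=jks
```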
Validate your TLS artifacts
```shell
# Inspect keystore contents
keytool -list -v -keystore keystore.jks -storepass $(cat keystorepassword) | head -n 40

# Inspect the exported certificate
openssl x509 -in druid.cert -noout -text | head -n 40

# Confirm the secret exists and contains the expected keys
kubectl -n druid get secret keystores -o yaml | sed -n '1,80p'
```
Automation & security recommendations
- Put the JKS generation and Secret rendering behind a Make target or script; keep it idempotent and environment‑aware (dev/stage/prod).
- Use SOPS for at‑rest encryption of YAML secrets in Git.
- Rotate certificates on a schedule (every 6–12 months) and script the rollout with zero‑downtime restarts.
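A rotation run can be scripted along these lines. This is a sketch, assuming the keystore generation commands above have been re-run and that the Druid cluster is named `iuneradruid` in the `druid` namespace (adjust names and labels to your deployment):

```shell
# 1) Re-render the Secret from the freshly generated artifacts and encrypt it
#    in one step, then commit the encrypted file to the GitOps repository.
kubectl -n druid create secret generic keystores \
  --from-file=keystore.jks \
  --from-file=keystorepassword \
  --from-file=truststore.jks \
  --dry-run=client -o yaml \
  | sops -e /dev/stdin > druid-jks-keystores-secret.enc.yaml

# 2) After Flux has applied the updated Secret, restart Druid node groups
#    one at a time so queries keep flowing during the rollout.
for sts in $(kubectl -n druid get statefulsets -l druid_cr=iuneradruid -o name); do
  kubectl -n druid rollout restart "$sts"
  kubectl -n druid rollout status "$sts" --timeout=10m
done
```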
Comparison: Druid Operators
Aspect | datainfrahq operator | druid‑io operator |
---|---|---|
Production alignment | Kubernetes‑native Druid patterns, current CRDs | Divergent feature set and cadence |
Namespace | druid-operator-system (in this ref) | varies |
Verification | CRD: druids.druid.apache.org | different CRDs |
Packaging | Works with manifest or Helm; integrates with GitOps | differs |
Helm charts ecosystem | Iunera Helm Charts | — |
Comparison: TLS certificate management approaches
Approach | Pros | Cons | Typical use |
---|---|---|---|
Self‑signed JKS (per‑cluster) | Fast, fully offline, simple tooling | Requires managing truststore; rotation overhead | Internal clusters and POCs that still need TLS |
Corporate CA‑signed (internal PKI) | Enterprise trust, auditable issuance, centralized lifecycle | Requires PKI access, approval workflows | Regulated enterprises, prod |
ACME/Ingress termination | Automated issuance/renewal | Not ideal for pod‑to‑pod encryption; dependency on ingress | Public endpoints; combine with internal JKS for east‑west |
Comparison: Manual vs. automated certificate deployment
Deployment | Pros | Cons | Notes |
---|---|---|---|
Manual secret apply | Simple, no extra tooling | Human error, drift risk | Use only for tests |
GitOps + SOPS | Auditable, consistent, encrypted at rest | Requires GitOps setup | Preferred for prod |
External secret operator | Integrates vault/KMS dynamically | Extra controller to manage | Great for large orgs |
Comparison: Security considerations by method
Area | Manual | GitOps + SOPS | External secret operator |
---|---|---|---|
At‑rest secret encryption | Optional | Strong (SOPS) | Outsourced (vault/KMS) |
RBAC blast radius | Higher | Scoped via kustomizations | Scoped via external policies |
Rotation playbooks | Ad‑hoc | Documented in Git | Centralized with vault policies |
Conclusion
A production‑ready Apache Druid deployment on Kubernetes depends on two foundations: installing the maintained datainfrahq Druid operator and enabling end‑to‑end TLS from day one. With the operator running in druid-operator-system and a hardened JKS secret in the druid namespace, you have the prerequisites in place for a secure, enterprise‑grade cluster. In the next part of this series, we use these building blocks to deploy and validate the full cluster.
FAQ
Which Druid operator should I choose for production deployments?
Use the datainfrahq Druid operator. It aligns with Kubernetes‑native Druid patterns and the CRDs used in this guide.
How do I verify that the operator is working?
Check for the CRD (`druids.druid.apache.org`) and a healthy controller pod in `druid-operator-system`. Use the kubectl verification commands above.
kubectl apply succeeded but I don’t see the CRD—what’s wrong?
Re‑apply, then describe the CRD for errors. Ensure your kube‑RBAC allows the operator to create CRDs and that your kubectl context is correct.
How should I manage TLS certificate rotation?
Automate it. Regenerate keystores on a schedule, push the updated Secret YAML encrypted with SOPS, and roll your Druid pods with zero downtime (staggered restarts).
Can I terminate TLS at the ingress only?
You can, but for enterprise environments you should also encrypt east‑west (pod‑to‑pod) traffic with a keystore/truststore to protect internal data paths.