Recipes

Common production add-on combinations. Each recipe is a standalone main.tf — drop it into your infrastructure repo alongside your EKS cluster definition.


Recipe 1 — Minimal Production Cluster

What it installs: Metrics Server, Cluster Autoscaler, AWS Load Balancer Controller, External DNS, cert-manager, External Secrets

The smallest set of add-ons that makes an EKS cluster production-ready:

  • Workloads can scale (HPA + node autoscaling)
  • Services get load balancers and DNS records automatically
  • TLS certificates are provisioned and renewed automatically
  • Secrets are pulled from AWS Secrets Manager — no plaintext in manifests
module "eks_addons" {
  source = "git::https://github.com/clouddrove/terraform-aws-eks-addons.git?ref=0.0.7"

  eks_cluster_name    = module.eks.cluster_name
  data_plane_wait_arn = module.eks.data_plane_wait_arn

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }

  # HPA and kubectl top support
  metrics_server = true

  # Node group autoscaling
  cluster_autoscaler = true
  cluster_autoscaler_helm_config = {
    version = "9.29.0"
  }

  # ALB/NLB provisioning for Ingress and Service resources
  aws_load_balancer_controller = true

  # Automatic Route 53 records from Ingress/Service hostnames
  external_dns = true
  external_dns_iampolicy_json_content = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["route53:ChangeResourceRecordSets"]
        Resource = "arn:aws:route53:::hostedzone/${var.hosted_zone_id}"
      },
      {
        Effect   = "Allow"
        Action   = ["route53:ListHostedZones", "route53:ListResourceRecordSets"]
        Resource = "*"
      }
    ]
  })

  # Automatic TLS certs via Let's Encrypt
  certification_manager = true

  # Sync secrets from AWS Secrets Manager → Kubernetes Secrets
  external_secrets = true
  external_secrets_iampolicy_json_content = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret"]
        Resource = "arn:aws:secretsmanager:${var.region}:${var.account_id}:secret:${var.env}/*"
      }
    ]
  })
}
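Once the module is applied, two small manifests connect cert-manager and External Secrets to your workloads. A minimal sketch — the issuer email, the ClusterSecretStore name, and the secret paths are placeholders you will need to adapt:

```yaml
# clusterissuer.yaml — Let's Encrypt production issuer for cert-manager
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform@example.com              # placeholder — use a monitored address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: alb
---
# externalsecret.yaml — sync one Secrets Manager entry into a Kubernetes Secret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager                # assumed ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: app-db-credentials                 # resulting Kubernetes Secret
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: production/app/db               # must fall under the ${var.env}/* IAM scope
        property: password
```

Ingresses annotated with `cert-manager.io/cluster-issuer: letsencrypt-prod` then get certificates issued and renewed automatically.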

Recipe 2 — Full Observability Stack

What it installs: Prometheus, Grafana, Loki, FluentBit, Kube State Metrics, Node Termination Handler

Metrics, logs, and dashboards — all in-cluster:

  • Prometheus scrapes all pods and nodes
  • Loki aggregates logs from all pods via FluentBit
  • Grafana visualizes both metrics and logs with a single datasource config
  • Kube State Metrics provides Kubernetes object-level metrics (pod restarts, deployment status, etc.)
module "eks_addons" {
  source = "git::https://github.com/clouddrove/terraform-aws-eks-addons.git?ref=0.0.7"

  eks_cluster_name    = module.eks.cluster_name
  data_plane_wait_arn = module.eks.data_plane_wait_arn

  tags = { Environment = var.env, ManagedBy = "terraform" }

  # Required by Grafana for its ingress
  aws_load_balancer_controller = true
  metrics_server               = true

  # Metrics collection and alerting
  prometheus = true
  prometheus_helm_config = {
    version = "25.11.0"
    values = [
      <<-EOT
        alertmanager:
          enabled: true
        server:
          retention: "15d"
          persistentVolume:
            size: 50Gi
      EOT
    ]
  }

  # Dashboards — exposed via ALB Ingress
  grafana = true
  grafana_helm_config = {
    version = "7.2.5"
    values = [
      <<-EOT
        datasources:
          datasources.yaml:
            apiVersion: 1
            datasources:
              - name: Prometheus
                type: prometheus
                url: http://prometheus-server
                isDefault: true
              - name: Loki
                type: loki
                url: http://loki:3100
        grafana.ini:
          auth.anonymous:
            enabled: false
      EOT
    ]
  }

  # Log aggregation
  loki = true
  loki_helm_config = {
    version = "5.43.3"
    values = [
      <<-EOT
        loki:
          commonConfig:
            replication_factor: 1
          storage:
            type: filesystem
        singleBinary:
          replicas: 1
      EOT
    ]
  }

  # Log forwarding from every pod to Loki + CloudWatch
  fluent_bit = true
  fluent_bit_helm_config = {
    version = "0.43.0"
    values = [
      <<-EOT
        config:
          outputs: |
            [OUTPUT]
                Name              loki
                Match             *
                Host              loki.monitoring.svc.cluster.local
                Port              3100
                Labels            job=fluentbit
            [OUTPUT]
                Name              cloudwatch_logs
                Match             *
                region            ${var.region}
                log_group_name    /eks/${var.cluster_name}/application
                log_stream_prefix pod-
                auto_create_group true
      EOT
    ]
  }

  # Kubernetes object metrics for Prometheus
  kube_state_metrics = true

  # Alert on Spot interruptions
  aws_node_termination_handler = true

  # Pod restart alerting (Slack/PagerDuty notifications on OOMKilled, CrashLoopBackOff)
  k8s_pod_restart_info_collector = true
}
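Kube State Metrics feeds the restart and status series that alerting builds on. As an illustration, the Prometheus chart's `serverFiles` values can carry alert rules — a sketch of a frequent-restart alert to merge into `prometheus_helm_config` (the rule name and thresholds are arbitrary):

```yaml
# Extra Helm values for the prometheus chart — example alerting rule
serverFiles:
  alerting_rules.yml:
    groups:
      - name: pod-health
        rules:
          - alert: PodRestartingFrequently
            expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting frequently"
```

Alertmanager (enabled above) then routes the firing alert to whatever receivers you configure.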

Recipe 3 — Karpenter Autoscaling

What it installs: Karpenter, Metrics Server, AWS Load Balancer Controller

Karpenter replaces Cluster Autoscaler with a faster, more flexible node provisioner. It provisions the right instance type for each workload — no pre-defined node groups required.

module "eks_addons" {
  source = "git::https://github.com/clouddrove/terraform-aws-eks-addons.git?ref=0.0.7"

  eks_cluster_name    = module.eks.cluster_name
  data_plane_wait_arn = module.eks.data_plane_wait_arn

  tags = { Environment = var.env, ManagedBy = "terraform" }

  metrics_server               = true
  aws_load_balancer_controller = true

  # Karpenter — node provisioner
  karpenter = true
  karpenter_helm_config = {
    version = "0.35.0"
    values = [
      <<-EOT
        settings:
          clusterName: ${var.cluster_name}
          clusterEndpoint: ${var.cluster_endpoint}
          interruptionQueue: ${var.karpenter_interruption_queue}
      EOT
    ]
  }

  # Custom IAM policy — restrict which instance types Karpenter can launch
  karpenter_iampolicy_json_content = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "ec2:RunInstances", "ec2:TerminateInstances",
          "ec2:DescribeInstances", "ec2:DescribeInstanceTypes",
          "ec2:DescribeSubnets", "ec2:DescribeSecurityGroups",
          "ec2:DescribeLaunchTemplates", "ec2:CreateLaunchTemplate",
          "ec2:DeleteLaunchTemplate", "ec2:CreateFleet",
          "ec2:CreateTags", "iam:PassRole",
          "ssm:GetParameter"
        ]
        Resource = "*"
      }
    ]
  })
}

After applying, create a Karpenter NodePool and EC2NodeClass manifest:

# karpenter-nodepool.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: KarpenterNodeRole-${cluster_name}
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${cluster_name}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${cluster_name}
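To verify provisioning works, a common smoke test is a throwaway deployment whose CPU requests exceed spare capacity — a sketch (names and image are illustrative):

```yaml
# inflate.yaml — throwaway workload to trigger Karpenter node provisioning
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: "1"          # 1 vCPU per replica forces new nodes at scale
```

Scale it up with `kubectl scale deployment inflate --replicas=10` and new nodes should appear within a minute or two; scale back to zero and the `WhenUnderutilized` consolidation policy removes them again.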

Recipe 4 — Service Mesh with Istio

What it installs: AWS Load Balancer Controller, Istio Ingress, Kiali, Calico

Mutual TLS between services, traffic management, and observability with Kiali. Calico provides NetworkPolicy enforcement.

module "eks_addons" {
  source = "git::https://github.com/clouddrove/terraform-aws-eks-addons.git?ref=0.0.7"

  eks_cluster_name    = module.eks.cluster_name
  data_plane_wait_arn = module.eks.data_plane_wait_arn

  tags = { Environment = var.env, ManagedBy = "terraform" }

  metrics_server = true

  # Required by Istio Ingress and Grafana
  aws_load_balancer_controller = true

  # Istio ingress gateway
  istio_ingress = true
  istio_ingress_helm_config = {
    version = "1.20.0"
  }

  # Paths to Istio Gateway and VirtualService manifests
  istio_manifests = {
    istio_ingress_manifest_file_path = ["${path.module}/manifests/istio-ingress.yaml"]
    istio_gateway_manifest_file_path = ["${path.module}/manifests/istio-gateway.yaml"]
  }

  # Kiali service mesh dashboard (requires istio_ingress = true)
  kiali_server = true
  kiali_manifests = {
    kiali_virtualservice_file_path = "${path.module}/manifests/kiali-virtualservice.yaml"
  }

  # Calico network policy (eBPF dataplane)
  calico_tigera = true
  calico_tigera_helm_config = {
    version = "3.27.0"
  }

  # Reload pods when ConfigMaps or Secrets change
  reloader = true
}
Namespace labelling

For Istio sidecar injection, label your application namespaces:

kubectl label namespace my-app istio-injection=enabled
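With sidecars injected, Istio allows plaintext traffic by default (permissive mode). To require mutual TLS mesh-wide, apply a PeerAuthentication in the Istio root namespace — a sketch, assuming the default `istio-system` root namespace:

```yaml
# peer-authentication.yaml — enforce mTLS for all workloads in the mesh
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system      # root namespace → applies mesh-wide
spec:
  mtls:
    mode: STRICT
```

Roll this out after all workloads carry sidecars; pods without a proxy cannot speak mTLS and will be rejected under STRICT.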

Recipe 5 — Secrets + Backup

What it installs: External Secrets, Velero, Reloader

For teams that need audit-compliant secret management and cluster disaster recovery:

module "eks_addons" {
  source = "git::https://github.com/clouddrove/terraform-aws-eks-addons.git?ref=0.0.7"

  eks_cluster_name    = module.eks.cluster_name
  data_plane_wait_arn = module.eks.data_plane_wait_arn

  tags = { Environment = var.env, ManagedBy = "terraform" }

  # Sync secrets from Secrets Manager → Kubernetes Secrets
  external_secrets = true

  # Backup cluster resources + PVs to S3
  velero = true
  velero_helm_config = {
    version = "6.0.0"
    values = [
      <<-EOT
        configuration:
          backupStorageLocation:
            - name: default
              provider: aws
              bucket: ${var.velero_bucket}
              config:
                region: ${var.region}
          volumeSnapshotLocation:
            - name: default
              provider: aws
              config:
                region: ${var.region}
        initContainers:
          - name: velero-plugin-for-aws
            image: velero/velero-plugin-for-aws:v1.9.0
            volumeMounts:
              - mountPath: /target
                name: plugins
      EOT
    ]
  }

  # Automatically restart pods when secrets are updated by External Secrets
  reloader = true
}

Velero IAM policy — the module creates an IRSA role with a default policy. To supply a custom policy instead, set:

velero_iampolicy_json_content = jsonencode({
  Version = "2012-10-17"
  Statement = [
    {
      Effect = "Allow"
      Action = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"]
      Resource = [
        "arn:aws:s3:::${var.velero_bucket}",
        "arn:aws:s3:::${var.velero_bucket}/*"
      ]
    },
    {
      Effect = "Allow"
      Action = [
        "ec2:CreateSnapshot", "ec2:DeleteSnapshot", "ec2:DescribeSnapshots",
        "ec2:CreateTags", "ec2:DescribeVolumes"
      ]
      Resource = "*"
    }
  ]
})
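Backups are usually driven by a Velero Schedule resource rather than ad-hoc `velero backup create` calls. A sketch of a nightly full backup — the namespace, cron time, and 30-day retention are assumptions to adapt:

```yaml
# daily-backup.yaml — nightly backup of all namespaces, retained 30 days
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-full
  namespace: velero            # assumed Velero install namespace
spec:
  schedule: "0 3 * * *"        # 03:00 UTC daily
  template:
    ttl: 720h                  # 30-day retention
    includedNamespaces:
      - "*"
    storageLocation: default   # the backupStorageLocation configured above
```

Restore from any resulting backup with `velero restore create --from-backup <backup-name>`.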

Variable Reference

Variable                       | Type     | Default | Description
eks_cluster_name               | string   | ""      | EKS cluster name
data_plane_wait_arn            | string   | ""      | ARN to wait on before installing
manage_via_gitops              | bool     | false   | Skip Helm installs; create IRSA only
tags                           | map(any) | {}      | Tags for IAM resources
irsa_iam_role_path             | any      | {}      | IAM role path for IRSA roles
irsa_iam_permissions_boundary  | any      | {}      | Permissions boundary for IRSA roles

Reference: terraform-aws-eks-addons →