Recipes
Common production add-on combinations. Each recipe is a standalone main.tf — drop it into your infrastructure repo alongside your EKS cluster definition.
Recipe 1 — Minimal Production Cluster
What it installs: Metrics Server, Cluster Autoscaler, AWS Load Balancer Controller, External DNS, cert-manager, External Secrets
The smallest set of add-ons that makes an EKS cluster production-ready:
- Workloads can scale (HPA + node autoscaling)
- Services get load balancers and DNS records automatically
- TLS certificates are provisioned and renewed automatically
- Secrets are pulled from AWS Secrets Manager — no plaintext in manifests
module "eks_addons" {
source = "git::https://github.com/clouddrove/terraform-aws-eks-addons.git?ref=0.0.7"
eks_cluster_name = module.eks.cluster_name
data_plane_wait_arn = module.eks.data_plane_wait_arn
tags = {
Environment = "production"
ManagedBy = "terraform"
}
# HPA and kubectl top support
metrics_server = true
# Node group autoscaling
cluster_autoscaler = true
cluster_autoscaler_helm_config = {
version = "9.29.0"
}
# ALB/NLB provisioning for Ingress and Service resources
aws_load_balancer_controller = true
# Automatic Route 53 records from Ingress/Service hostnames
external_dns = true
external_dns_iampolicy_json_content = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = ["route53:ChangeResourceRecordSets"]
Resource = "arn:aws:route53:::hostedzone/${var.hosted_zone_id}"
},
{
Effect = "Allow"
Action = ["route53:ListHostedZones", "route53:ListResourceRecordSets"]
Resource = "*"
}
]
})
# Automatic TLS certs via Let's Encrypt
certification_manager = true
# Sync secrets from AWS Secrets Manager → Kubernetes Secrets
external_secrets = true
external_secrets_iampolicy_json_content = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = ["secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret"]
Resource = "arn:aws:secretsmanager:${var.region}:${var.account_id}:secret:${var.env}/*"
}
]
})
}
Recipe 2 — Full Observability Stack
What it installs: Prometheus, Grafana, Loki, FluentBit, Kube State Metrics, Node Termination Handler
Metrics, logs, and dashboards — all in-cluster:
- Prometheus scrapes all pods and nodes
- Loki aggregates logs from all pods via FluentBit
- Grafana visualizes both metrics and logs with a single datasource config
- Kube State Metrics provides Kubernetes object-level metrics (pod restarts, deployment status, etc.)
module "eks_addons" {
source = "git::https://github.com/clouddrove/terraform-aws-eks-addons.git?ref=0.0.7"
eks_cluster_name = module.eks.cluster_name
data_plane_wait_arn = module.eks.data_plane_wait_arn
tags = { Environment = var.env, ManagedBy = "terraform" }
# Required by Grafana for its ingress
aws_load_balancer_controller = true
metrics_server = true
# Metrics collection and alerting
prometheus = true
prometheus_helm_config = {
version = "25.11.0"
values = [
<<-EOT
alertmanager:
enabled: true
server:
retention: "15d"
persistentVolume:
size: 50Gi
EOT
]
}
# Dashboards — exposed via ALB Ingress
grafana = true
grafana_helm_config = {
version = "7.2.5"
values = [
<<-EOT
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
url: http://prometheus-server
isDefault: true
- name: Loki
type: loki
url: http://loki:3100
grafana.ini:
auth.anonymous:
enabled: false
EOT
]
}
# Log aggregation
loki = true
loki_helm_config = {
version = "5.43.3"
values = [
<<-EOT
loki:
commonConfig:
replication_factor: 1
storage:
type: filesystem
singleBinary:
replicas: 1
EOT
]
}
# Log forwarding from every pod to Loki + CloudWatch
fluent_bit = true
fluent_bit_helm_config = {
version = "0.43.0"
values = [
<<-EOT
config:
outputs: |
[OUTPUT]
Name loki
Match *
Host loki.monitoring.svc.cluster.local
Port 3100
Labels job=fluentbit
[OUTPUT]
Name cloudwatch_logs
Match *
region ${var.region}
log_group_name /eks/${var.cluster_name}/application
log_stream_prefix pod-
auto_create_group true
EOT
]
}
# Kubernetes object metrics for Prometheus
kube_state_metrics = true
# Alert on Spot interruptions
aws_node_termination_handler = true
# Pod restart alerting (Slack/PagerDuty notifications on OOMKilled, CrashLoopBackOff)
k8s_pod_restart_info_collector = true
}
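With the Prometheus chart's default scrape configuration, pods opt in to metrics collection through annotations on the pod template. A sketch, assuming the default annotation-based discovery; the port and path values are examples:

```yaml
# Pod template metadata — opts the pod into Prometheus scraping
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"      # example — the port your app serves metrics on
    prometheus.io/path: "/metrics"  # example — the metrics endpoint path
```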
Recipe 3 — Karpenter Autoscaling
What it installs: Karpenter, Metrics Server, AWS Load Balancer Controller
Karpenter replaces Cluster Autoscaler with a faster, more flexible node provisioner. It provisions the right instance type for each workload — no pre-defined node groups required.
module "eks_addons" {
source = "git::https://github.com/clouddrove/terraform-aws-eks-addons.git?ref=0.0.7"
eks_cluster_name = module.eks.cluster_name
data_plane_wait_arn = module.eks.data_plane_wait_arn
tags = { Environment = var.env, ManagedBy = "terraform" }
metrics_server = true
aws_load_balancer_controller = true
# Karpenter — node provisioner
karpenter = true
karpenter_helm_config = {
version = "0.35.0"
values = [
<<-EOT
settings:
clusterName: ${var.cluster_name}
clusterEndpoint: ${var.cluster_endpoint}
interruptionQueue: ${var.karpenter_interruption_queue}
EOT
]
}
# Custom IAM policy — restrict which instance types Karpenter can launch
karpenter_iampolicy_json_content = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"ec2:RunInstances", "ec2:TerminateInstances",
"ec2:DescribeInstances", "ec2:DescribeInstanceTypes",
"ec2:DescribeSubnets", "ec2:DescribeSecurityGroups",
"ec2:DescribeLaunchTemplates", "ec2:CreateLaunchTemplate",
"ec2:DeleteLaunchTemplate", "ec2:CreateFleet",
"ec2:CreateTags", "iam:PassRole",
"ssm:GetParameter"
]
Resource = "*"
}
]
})
}
After applying, create Karpenter NodePool and EC2NodeClass manifests:
```yaml
# karpenter-nodepool.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: KarpenterNodeRole-${cluster_name}
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${cluster_name}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${cluster_name}
```
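To verify that Karpenter provisions nodes, deploy a workload with explicit resource requests that exceeds current capacity. A sketch of a hypothetical test deployment (the name and replica count are arbitrary):

```yaml
# inflate.yaml — hypothetical test workload to trigger node provisioning
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 5
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: pause
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: "1"    # forces Karpenter to launch nodes once existing capacity is exhausted
```

Watch the controller logs (assuming Karpenter runs in the `karpenter` namespace) with `kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter` to see the launch decisions.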
Recipe 4 — Service Mesh with Istio
What it installs: AWS Load Balancer Controller, Istio Ingress, Kiali, Calico
Mutual TLS between services, traffic management, and observability with Kiali. Calico provides NetworkPolicy enforcement.
module "eks_addons" {
source = "git::https://github.com/clouddrove/terraform-aws-eks-addons.git?ref=0.0.7"
eks_cluster_name = module.eks.cluster_name
data_plane_wait_arn = module.eks.data_plane_wait_arn
tags = { Environment = var.env, ManagedBy = "terraform" }
metrics_server = true
# Required by Istio Ingress and Grafana
aws_load_balancer_controller = true
# Istio ingress gateway
istio_ingress = true
istio_ingress_helm_config = {
version = "1.20.0"
}
# Paths to Istio Gateway and VirtualService manifests
istio_manifests = {
istio_ingress_manifest_file_path = ["${path.module}/manifests/istio-ingress.yaml"]
istio_gateway_manifest_file_path = ["${path.module}/manifests/istio-gateway.yaml"]
}
# Kiali service mesh dashboard (requires istio_ingress = true)
kiali_server = true
kiali_manifests = {
kiali_virtualservice_file_path = "${path.module}/manifests/kiali-virtualservice.yaml"
}
# Calico network policy (eBPF dataplane)
calico_tigera = true
calico_tigera_helm_config = {
version = "3.27.0"
}
# Reload pods when ConfigMaps or Secrets change
reloader = true
}
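With Calico enforcing NetworkPolicy, a per-namespace default-deny ingress policy is a common baseline: nothing reaches pods in the namespace unless a later policy allows it. A sketch, with an illustrative namespace name:

```yaml
# default-deny-ingress.yaml — illustrative baseline policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-app        # illustrative — apply per application namespace
spec:
  podSelector: {}          # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
```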
For Istio sidecar injection, label your application namespaces:
```shell
kubectl label namespace my-app istio-injection=enabled
```
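To require mutual TLS for all sidecar-injected workloads, a mesh-wide PeerAuthentication in Istio's root namespace (`istio-system` by default) is one option; this is a sketch, not something the module creates:

```yaml
# peer-authentication.yaml — enforce mTLS mesh-wide
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system  # Istio root namespace — policy applies mesh-wide
spec:
  mtls:
    mode: STRICT           # reject plaintext traffic between mesh workloads
```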
Recipe 5 — Secrets + Backup
What it installs: External Secrets, Velero, Reloader
For teams that need audit-compliant secret management and cluster disaster recovery:
module "eks_addons" {
source = "git::https://github.com/clouddrove/terraform-aws-eks-addons.git?ref=0.0.7"
eks_cluster_name = module.eks.cluster_name
data_plane_wait_arn = module.eks.data_plane_wait_arn
tags = { Environment = var.env, ManagedBy = "terraform" }
# Sync secrets from Secrets Manager → Kubernetes Secrets
external_secrets = true
# Backup cluster resources + PVs to S3
velero = true
velero_helm_config = {
version = "6.0.0"
values = [
<<-EOT
configuration:
backupStorageLocation:
- name: default
provider: aws
bucket: ${var.velero_bucket}
config:
region: ${var.region}
volumeSnapshotLocation:
- name: default
provider: aws
config:
region: ${var.region}
initContainers:
- name: velero-plugin-for-aws
image: velero/velero-plugin-for-aws:v1.9.0
volumeMounts:
- mountPath: /target
name: plugins
EOT
]
}
# Automatically restart pods when secrets are updated by External Secrets
reloader = true
}
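Velero backups are on-demand unless you schedule them. A recurring backup can be declared with a Schedule resource; the cron expression, TTL, and name here are examples to adjust:

```yaml
# daily-backup.yaml — illustrative recurring backup
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup       # illustrative name
  namespace: velero
spec:
  schedule: "0 3 * * *"    # every day at 03:00 UTC (example)
  template:
    ttl: 720h              # retain backups for 30 days (example)
    includedNamespaces:
      - "*"
```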
Velero IAM policy — the module creates an IRSA role. To use a custom policy:
```hcl
velero_iampolicy_json_content = jsonencode({
  Version = "2012-10-17"
  Statement = [
    {
      Effect = "Allow"
      Action = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"]
      Resource = [
        "arn:aws:s3:::${var.velero_bucket}",
        "arn:aws:s3:::${var.velero_bucket}/*"
      ]
    },
    {
      Effect = "Allow"
      Action = [
        "ec2:CreateSnapshot", "ec2:DeleteSnapshot", "ec2:DescribeSnapshots",
        "ec2:CreateTags", "ec2:DescribeVolumes"
      ]
      Resource = "*"
    }
  ]
})
```
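On the consumption side, an ExternalSecret maps a Secrets Manager entry to a Kubernetes Secret that Reloader can then watch. A sketch, assuming a ClusterSecretStore named `aws-secrets-manager` exists; all names and keys here are illustrative:

```yaml
# external-secret.yaml — illustrative mapping
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials          # illustrative name
spec:
  refreshInterval: 1h           # re-sync from Secrets Manager hourly
  secretStoreRef:
    name: aws-secrets-manager   # assumes a ClusterSecretStore with this name
    kind: ClusterSecretStore
  target:
    name: db-credentials        # Kubernetes Secret to create/update
  data:
    - secretKey: password
      remoteRef:
        key: production/db      # Secrets Manager secret name (example)
        property: password
```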
Variable Reference
| Variable | Type | Default | Description |
|---|---|---|---|
| `eks_cluster_name` | string | `""` | EKS cluster name |
| `data_plane_wait_arn` | string | `""` | ARN to wait on before installing add-ons |
| `manage_via_gitops` | bool | `false` | Skip Helm installs; create IRSA roles only |
| `tags` | map(any) | `{}` | Tags for IAM resources |
| `irsa_iam_role_path` | any | `{}` | IAM role path for IRSA roles |
| `irsa_iam_permissions_boundary` | any | `{}` | Permissions boundary for IRSA roles |
Reference: terraform-aws-eks-addons