
Troubleshooting

Common issues and how to resolve them.


NetworkPolicy has no effect

Symptom: Pods can reach namespaces they shouldn't be able to.

Cause: The CNI plugin does not support NetworkPolicy enforcement.

Fix: Check which CNI is installed and whether network policy is enabled:

```shell
# EKS (VPC CNI)
kubectl get daemonset aws-node -n kube-system -o jsonpath='{.spec.template.spec.containers[*].env}' \
  | jq '.[] | select(.name=="ENABLE_NETWORK_POLICY")'

# AKS
az aks show --name <cluster> --resource-group <rg> \
  --query networkProfile.networkPolicy --output tsv

# GKE Standard
gcloud container clusters describe <cluster> --region <region> \
  --format="value(networkConfig.enableNetworkPolicy)"
```

See the platform-specific guides for how to enable enforcement: EKS · AKS · GKE.
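Even when the CNI reports that enforcement is enabled, it is worth probing end to end. A quick way (sketch; `web` and `team-alpha` are hypothetical service and namespace names — substitute your own) is to run a throwaway pod in a namespace the policy should block:

```shell
# Launch a one-off pod outside the tenant namespace and try to reach
# a service inside it. BusyBox wget uses -T for the timeout in seconds.
kubectl run np-probe --rm -it --restart=Never -n default \
  --image=busybox:1.36 -- \
  wget -qO- -T 3 http://web.team-alpha.svc.cluster.local
```

If enforcement is working, the request times out; if it returns content, the CNI is not enforcing NetworkPolicy.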


Pods can't reach the database

Symptom: Connection timeout to RDS, Cloud SQL, Azure Database, or ElastiCache/Memorystore.

Cause: networkPolicy.vpcCidr doesn't match the actual VPC/VNet address space, so the egress rule blocks database traffic.

Fix: Find your actual CIDR and update values.yaml:

```shell
# EKS
aws ec2 describe-vpcs --query 'Vpcs[*].CidrBlock'

# AKS
az network vnet list --query '[*].addressSpace.addressPrefixes'

# GKE
gcloud compute networks subnets list --format="table(name,ipCidrRange)"
```
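To sanity-check the value before upgrading, you can verify that your database endpoint's private IP actually falls inside the CIDR you plan to set. A minimal check (plain Python; the addresses below are hypothetical):

```python
import ipaddress

def cidr_covers(vpc_cidr: str, ip: str) -> bool:
    """Return True if ip (e.g. your database's private IP) falls inside
    the CIDR configured as networkPolicy.vpcCidr."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(vpc_cidr)

print(cidr_covers("10.0.0.0/16", "10.0.42.7"))    # True: egress rule allows it
print(cidr_covers("10.0.0.0/16", "172.31.5.10"))  # False: traffic is blocked
```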

Then upgrade:

```shell
helm upgrade tenants k8s-multitenant/k8s-multitenant \
  -f values.yaml --reuse-values
```

RBAC group binding has no effect

Symptom: Users in the group can't access the tenant namespace even though they're listed in rbac.subjects.

Diagnosis:

```shell
# Check what RoleBindings exist and who they bind to
kubectl get rolebinding -n <tenant-name> -o yaml
```

Cause A (EKS): IAM role not mapped to a Kubernetes group. The group name in rbac.subjects must exactly match the group configured in aws-auth or the EKS access entry. See the EKS Guide.

Cause B (AKS): wrong Object ID. The name for an Azure AD group subject must be the group's Object ID (a UUID), not its display name:

```shell
az ad group show --group "team-alpha-admins" --query id --output tsv
```

Cause C (GKE): group not under the security-group parent. The Google Group must be a member of the [email protected] parent group, and that parent must be configured on the cluster. See the GKE Guide.
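For any of these causes, you can test the binding directly by impersonating a member of the group (requires impersonation permissions, e.g. cluster-admin; the user and group below are hypothetical — use values from your rbac.subjects):

```shell
# Ask the API server whether a member of the group could list pods.
kubectl auth can-i list pods -n team-alpha \
  --as alice@example.com --as-group team-alpha-admins
```

A `yes` means the RoleBinding itself works and the problem is upstream (identity mapping); a `no` points at a group-name mismatch in the binding.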


Helm install fails with schema validation error

Symptom:

```
Error: values don't meet the specifications of the schema(s) in the following chart(s):
k8s-multitenant: ...
```

Cause: A value in your values.yaml doesn't match the JSON schema in the chart (wrong type, missing required field, invalid CIDR format, etc.).

Fix: Read the error message; it points to the exact field. Common mistakes:

| Error | Fix |
| --- | --- |
| `tenants[0].name` doesn't match pattern | Tenant name must be lowercase alphanumeric with hyphens, e.g. `team-alpha` |
| `networkPolicy.vpcCidr` doesn't match pattern | Must be a valid CIDR like `10.0.0.0/8` |
| `rbac.subjects[0].kind` not in enum | Must be `User`, `Group`, or `ServiceAccount` (case-sensitive) |
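The chart's values.schema.json is authoritative, but the checks amount to patterns roughly like these (hypothetical approximations, for illustration only):

```python
import re

# Hypothetical patterns approximating the chart's schema checks.
TENANT_NAME = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")  # e.g. team-alpha
CIDR = re.compile(r"^(\d{1,3}\.){3}\d{1,3}/\d{1,2}$")         # e.g. 10.0.0.0/8
SUBJECT_KINDS = {"User", "Group", "ServiceAccount"}           # case-sensitive enum

print(bool(TENANT_NAME.match("team-alpha")))   # passes
print(bool(TENANT_NAME.match("Team_Alpha")))   # rejected: uppercase/underscore
print(bool(CIDR.match("10.0.0.0/8")))          # passes
print("group" in SUBJECT_KINDS)                # rejected: must be "Group"
```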

ResourceQuota blocks pod scheduling

Symptom: Pods in a tenant namespace stay `Pending` with `exceeded quota` in events.

Diagnosis:

```shell
kubectl describe resourcequota -n <tenant-name>
kubectl describe pod <pod-name> -n <tenant-name>
```

Fix: Either increase the quota for that tenant (using a per-tenant resourceQuota override) or reduce the pod's resource requests.

Per-tenant quota increase in values.yaml:

```yaml
tenants:
  - name: data-platform
    resourceQuota:
      requests.cpu: "8"
      limits.cpu: "16"
      requests.memory: 16Gi
      limits.memory: 32Gi
      pods: "50"
```
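The scheduler's check is simple arithmetic: a pod fits only if its requests plus current usage stay within every hard limit. A sketch of that check, with hypothetical numbers (real quotas use Kubernetes quantity strings, simplified here to plain numbers):

```python
def fits_quota(used: dict, hard: dict, request: dict) -> bool:
    """True if the pod's requests fit in the remaining quota headroom."""
    return all(used.get(k, 0) + v <= hard.get(k, float("inf"))
               for k, v in request.items())

used = {"requests.cpu": 7.5, "pods": 49}   # from: kubectl describe resourcequota
hard = {"requests.cpu": 8.0, "pods": 50}

print(fits_quota(used, hard, {"requests.cpu": 1.0, "pods": 1}))  # False: CPU over
print(fits_quota(used, hard, {"requests.cpu": 0.5, "pods": 1}))  # True: fits exactly
```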

LimitRange rejects pods

Symptom: Pod creation fails with `maximum cpu usage per Container is 1, but limit is 2`.

Cause: The pod's resource limit exceeds the LimitRange max for that namespace.

Fix: Use a per-tenant limitRange override to raise the maximum:

```yaml
tenants:
  - name: data-platform
    limitRange:
      type: Container
      max:
        cpu: "4"
        memory: 8Gi
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 200m
        memory: 256Mi
```
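With that override applied, a container may declare limits up to the new maximum. For example, this (hypothetical) container spec, which the old LimitRange would have rejected, is now admitted:

```yaml
containers:
  - name: worker
    image: registry.example.com/worker:latest  # hypothetical image
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "2"       # within the raised max of 4
        memory: 4Gi    # within the raised max of 8Gi
```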

Helm upgrade removes a tenant namespace

Symptom: After removing a tenant from the tenants list and running helm upgrade, the namespace (and all its workloads) is deleted.

This is expected behaviour. Helm manages the full lifecycle of resources it owns.

Prevention: Before removing a tenant from values.yaml, annotate the namespace with Helm's keep policy so the upgrade leaves it in place:

```shell
kubectl annotate namespace <tenant-name> helm.sh/resource-policy=keep
```

After that, helm upgrade drops the namespace from the release without deleting it or its workloads.