
Troubleshooting

Common issues and how to resolve them.


NetworkPolicy has no effect

Symptom: Pods can reach namespaces they shouldn't be able to.

Cause: The CNI plugin does not support NetworkPolicy enforcement.

Fix: Check which CNI is installed and whether network policy is enabled:

```shell
# EKS (VPC CNI)
kubectl get daemonset aws-node -n kube-system -o jsonpath='{.spec.template.spec.containers[*].env}' \
  | jq '.[] | select(.name=="ENABLE_NETWORK_POLICY")'

# AKS
az aks show --name <cluster> --resource-group <rg> \
  --query networkProfile.networkPolicy --output tsv

# GKE Standard
gcloud container clusters describe <cluster> --region <region> \
  --format="value(networkConfig.enableNetworkPolicy)"
```

See the platform-specific guides for how to enable enforcement: EKS · AKS · GKE.
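Even when the CNI reports that enforcement is enabled, it is worth probing end to end. A quick way (sketch; `web` and `team-alpha` are hypothetical service and namespace names — substitute your own) is to run a throwaway pod in a namespace the policy should block:

```shell
# Launch a one-off pod outside the tenant namespace and try to reach
# a service inside it. BusyBox wget uses -T for the timeout in seconds.
kubectl run np-probe --rm -it --restart=Never -n default \
  --image=busybox:1.36 -- \
  wget -qO- -T 3 http://web.team-alpha.svc.cluster.local
```

If enforcement is working, the request times out; if it returns content, the CNI is not enforcing NetworkPolicy.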


Pods can't reach the database

Symptom: Connection timeout to RDS, Cloud SQL, Azure Database, or ElastiCache/Memorystore.

Cause: networkPolicy.vpcCidr doesn't match the actual VPC/VNet address space, so the egress rule blocks database traffic.

Fix: Find your actual CIDR and update values.yaml:

```shell
# EKS
aws ec2 describe-vpcs --query 'Vpcs[*].CidrBlock'

# AKS
az network vnet list --query '[*].addressSpace.addressPrefixes'

# GKE
gcloud compute networks subnets list --format="table(name,ipCidrRange)"
```
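To sanity-check the value before upgrading, you can verify that your database endpoint's private IP actually falls inside the CIDR you plan to set. A minimal check (plain Python; the addresses below are hypothetical):

```python
import ipaddress

def cidr_covers(vpc_cidr: str, ip: str) -> bool:
    """Return True if ip (e.g. your database's private IP) falls inside
    the CIDR configured as networkPolicy.vpcCidr."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(vpc_cidr)

print(cidr_covers("10.0.0.0/16", "10.0.42.7"))    # True: egress rule allows it
print(cidr_covers("10.0.0.0/16", "172.31.5.10"))  # False: traffic is blocked
```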

Then upgrade:

```shell
helm upgrade tenants k8s-multitenant/k8s-multitenant \
  -f values.yaml --reuse-values
```

RBAC group binding has no effect

Symptom: Users in the group can't access the tenant namespace even though they're listed in rbac.subjects.

Diagnosis:

```shell
# Check what RoleBindings exist and who they bind to
kubectl get rolebinding -n <tenant-name> -o yaml
```

Cause A (EKS): IAM role not mapped to a Kubernetes group. The group name in rbac.subjects must exactly match the group configured in aws-auth or the EKS access entry. See the EKS Guide.

Cause B (AKS): wrong Object ID. The name for an Azure AD group subject must be the group's Object ID (a UUID), not its display name:

```shell
az ad group show --group "team-alpha-admins" --query id --output tsv
```

Cause C (GKE): group not under the security-group parent. The Google Group must be a member of the [email protected] parent group, and that parent must be configured on the cluster. See the GKE Guide.
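For any of these causes, you can test the binding directly by impersonating a member of the group (requires impersonation permissions, e.g. cluster-admin; the user and group below are hypothetical — use values from your rbac.subjects):

```shell
# Ask the API server whether a member of the group could list pods.
kubectl auth can-i list pods -n team-alpha \
  --as alice@example.com --as-group team-alpha-admins
```

A `yes` means the RoleBinding itself works and the problem is upstream (identity mapping); a `no` points at a group-name mismatch in the binding.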


Helm install fails with schema validation error

Symptom:

```
Error: values don't meet the specifications of the schema(s) in the following chart(s):
k8s-multitenant: ...
```

Cause: A value in your values.yaml doesn't match the JSON schema in the chart (wrong type, missing required field, invalid CIDR format, etc.).

Fix: Read the error message; it points to the exact field. Common mistakes:

| Error | Fix |
| --- | --- |
| `tenants[0].name` doesn't match pattern | Tenant name must be lowercase alphanumeric with hyphens, e.g. `team-alpha` |
| `networkPolicy.vpcCidr` doesn't match pattern | Must be a valid CIDR like `10.0.0.0/8` |
| `rbac.subjects[0].kind` not in enum | Must be `User`, `Group`, or `ServiceAccount` (case-sensitive) |
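The chart's values.schema.json is authoritative, but the checks amount to patterns roughly like these (hypothetical approximations, for illustration only):

```python
import re

# Hypothetical patterns approximating the chart's schema checks.
TENANT_NAME = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")  # e.g. team-alpha
CIDR = re.compile(r"^(\d{1,3}\.){3}\d{1,3}/\d{1,2}$")         # e.g. 10.0.0.0/8
SUBJECT_KINDS = {"User", "Group", "ServiceAccount"}           # case-sensitive enum

print(bool(TENANT_NAME.match("team-alpha")))   # passes
print(bool(TENANT_NAME.match("Team_Alpha")))   # rejected: uppercase/underscore
print(bool(CIDR.match("10.0.0.0/8")))          # passes
print("group" in SUBJECT_KINDS)                # rejected: must be "Group"
```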

ResourceQuota blocks pod scheduling

Symptom: Pods in a tenant namespace stay `Pending` with `exceeded quota` in events.

Diagnosis:

```shell
kubectl describe resourcequota -n <tenant-name>
kubectl describe pod <pod-name> -n <tenant-name>
```

Fix: Either increase the quota for that tenant (using a per-tenant resourceQuota override) or reduce the pod's resource requests.

Per-tenant quota increase in values.yaml:

```yaml
tenants:
  - name: data-platform
    resourceQuota:
      requests.cpu: "8"
      limits.cpu: "16"
      requests.memory: 16Gi
      limits.memory: 32Gi
      pods: "50"
```
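The scheduler's check is simple arithmetic: a pod fits only if its requests plus current usage stay within every hard limit. A sketch of that check, with hypothetical numbers (real quotas use Kubernetes quantity strings, simplified here to plain numbers):

```python
def fits_quota(used: dict, hard: dict, request: dict) -> bool:
    """True if the pod's requests fit in the remaining quota headroom."""
    return all(used.get(k, 0) + v <= hard.get(k, float("inf"))
               for k, v in request.items())

used = {"requests.cpu": 7.5, "pods": 49}   # from: kubectl describe resourcequota
hard = {"requests.cpu": 8.0, "pods": 50}

print(fits_quota(used, hard, {"requests.cpu": 1.0, "pods": 1}))  # False: CPU over
print(fits_quota(used, hard, {"requests.cpu": 0.5, "pods": 1}))  # True: fits exactly
```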

LimitRange rejects pods

Symptom: Pod creation fails with `maximum cpu usage per Container is 1, but limit is 2`.

Cause: The pod's resource limit exceeds the LimitRange max for that namespace.

Fix: Use a per-tenant limitRange override to raise the maximum:

```yaml
tenants:
  - name: data-platform
    limitRange:
      type: Container
      max:
        cpu: "4"
        memory: 8Gi
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 200m
        memory: 256Mi
```
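With that override applied, a container may declare limits up to the new maximum. For example, this (hypothetical) container spec, which the old LimitRange would have rejected, is now admitted:

```yaml
containers:
  - name: worker
    image: registry.example.com/worker:latest  # hypothetical image
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "2"       # within the raised max of 4
        memory: 4Gi    # within the raised max of 8Gi
```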

Helm upgrade removes a tenant namespace

Symptom: After removing a tenant from the tenants list and running helm upgrade, the namespace (and all its workloads) is deleted.

This is expected behaviour. Helm manages the full lifecycle of resources it owns.

Prevention: Before removing a tenant from values.yaml, annotate the namespace with Helm's keep policy so the upgrade leaves it in place:

```shell
kubectl annotate namespace <tenant-name> helm.sh/resource-policy=keep
```

After that, helm upgrade drops the namespace from the release without deleting it or its workloads.