Troubleshooting
This section will help you troubleshoot common issues that can arise during the installation of Pangea's Private Cloud services.
Pre-Installation Checklist
Before starting the installation, ensure your environment is properly configured. Here's what to check and why:
1. Required Tools
- AWS CLI: Required for ECR access and image pulls.
- Helm: Needed for package management and deployment.
brew list awscli >/dev/null 2>&1 || brew install awscli
brew list helm >/dev/null 2>&1 || brew install helm
2. AWS Configuration
- Verify that AWS credentials are set up:
aws configure list
3. Kubernetes Access
- Check that the cluster is responsive:
kubectl cluster-info
4. pangeacluster.yml Check
- Ensure pangeacluster.yml is not in your current directory to avoid interference with the installation.
- Suggested action: remove only that file. Avoid rm *.yml, which deletes every YAML file in the directory.
rm pangeacluster.yml
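The tool checks in this checklist can be scripted so that a missing binary is caught before the installer runs. The following is a minimal sketch, assuming a POSIX shell; the function name missing_tools is illustrative and is not part of the Pangea installer.

```shell
# Print each tool from the argument list that is not found on PATH.
# Returns non-zero if at least one tool is missing.
# (missing_tools is a hypothetical helper, not a Pangea command.)
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example call:
#   missing_tools aws helm kubectl || echo "install the tools listed above"
```

A non-zero return only means a binary is absent; it does not validate AWS credentials or cluster access, which still need the aws configure list and kubectl cluster-info checks.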
Common Installation Issues
Namespace Management
Problem
- Installation fails due to namespace conflicts.
Symptoms
- Error about a resource that already exists and cannot be imported into the current release:
ERROR: INSTALLATION FAILED: Unable to continue with install: ClusterRole "manager-role" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-namespace" must equal "pangea-private-beta1": current value is "pangea-private-beta1"
- Namespace mismatch errors:
ERROR: the namespace from the provided object "pangea-private-beta2" does not match the namespace "pangea-private-beta1". You must pass "--namespace=pangea-private-beta1" to perform this operation."
Resolution Steps
- Check existing namespaces:
kubectl get namespaces
- Clean up existing installation:
Warning: Use the uninstall command cautiously, as it will remove the entire namespace.
./pangea-private-cloud.sh -n <namespace> -u
- Verify namespace cleanup:
kubectl get all -n <namespace>
- Start fresh installation:
./pangea-private-cloud.sh -n <namespace>
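The first error above comes from the Helm ownership annotation on a cluster-scoped resource. To see which release namespace currently owns that resource, the annotation can be read directly. This is a sketch that assumes the ClusterRole name from the error message (manager-role); the function name is hypothetical.

```shell
# Print the Helm release namespace recorded on a ClusterRole.
# $1: ClusterRole name, e.g. manager-role (taken from the error message).
# (clusterrole_release_namespace is a hypothetical helper.)
clusterrole_release_namespace() {
  kubectl get clusterrole "$1" \
    -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-namespace}'
}

# Example call:
#   clusterrole_release_namespace manager-role
```

If the printed namespace differs from the one passed to the installer with -n, the resource likely belongs to an earlier installation that should be uninstalled first.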
Best Practices
- Use one deployment per team.
- Use descriptive names (e.g., pangea-team-dev).
- Document namespace assignments.
Service Fails to Install
Problem
- Pangea cluster pods (authn/embargo/gateway) are not in a running state.
Symptoms
- No "pangea-cluster" pods running:
kubectl get pods -n <namespace>
Resolution Steps
- Monitor pod status:
kubectl get pods -n <namespace>
- Identify failing jobs:
kubectl get job -n <namespace>
- Delete the failed job:
kubectl delete job pangea-cluster-authn-<hex string>-<hex string> -n <namespace>
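Finding the failed job by eye can be tedious when many jobs exist. The following sketch lists only jobs whose status reports at least one failed pod; it assumes kubectl's jsonpath output and awk are available, and the function name failed_jobs is illustrative.

```shell
# List the names of jobs in a namespace that report one or more failed pods.
# $1: namespace
# (failed_jobs is a hypothetical helper, not a Pangea command.)
failed_jobs() {
  kubectl get jobs -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.failed}{"\n"}{end}' |
    awk -F '\t' '$2 + 0 > 0 { print $1 }'
}

# Example call:
#   failed_jobs <namespace>
```

Each printed name can then be passed to kubectl delete job as in the step above.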
File Intel Initial Database Sync
Problem
- File-intel service takes significant time to sync (10-20 hours).
Resolution Steps
- Check maintenance jobs:
kubectl get cronjobs -n <namespace>
- Verify job status:
kubectl get pods -n <namespace>
- If jobs are missing, uninstall and then reinstall:
./pangea-private-cloud.sh -n <namespace> -u
./pangea-private-cloud.sh -n <namespace>
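Whether the maintenance jobs are missing can be checked without reading the table output. This sketch counts the cronjobs in a namespace; it does not know the expected job names, which this guide does not list, and the function name cronjob_count is hypothetical.

```shell
# Print the number of cronjobs found in a namespace.
# $1: namespace
# (cronjob_count is a hypothetical helper, not a Pangea command.)
cronjob_count() {
  # -o name prints one "cronjob.batch/<name>" line per cronjob.
  kubectl get cronjobs -n "$1" -o name 2>/dev/null | grep -c .
}

# Example call:
#   cronjob_count <namespace>
```

A count of 0 suggests the maintenance jobs were never created, which is the case the reinstall step addresses.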
Database Initialization Delays
Problem
- Postgres operator initialization takes up to 10 minutes.
Symptoms
- Script appears stuck at "Waiting for pod postgres-0 to run..."
- Database connection failures.
- Services failing to start.
Resolution Steps
- Monitor Postgres pod status:
kubectl get pods -n <namespace> | grep postgres
- Check pod events for issues:
kubectl describe pod postgres-0 -n <namespace>
- View Postgres logs:
kubectl logs postgres-0 -n <namespace>
Note: Postgres initialization may require patience. Avoid interrupting the script during this phase.
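Instead of polling kubectl get pods by hand, kubectl wait can block until the pod reports Ready. This is a sketch that assumes the pod name postgres-0 from the installer message and uses a 600-second default to match the 10-minute figure above; the function name is illustrative.

```shell
# Block until postgres-0 is Ready or the timeout expires.
# $1: namespace
# $2: optional timeout (default 600s, matching the ~10 minute initialization)
# (wait_for_postgres is a hypothetical helper, not a Pangea command.)
wait_for_postgres() {
  kubectl wait --for=condition=Ready pod/postgres-0 \
    -n "$1" --timeout="${2:-600s}"
}

# Example call:
#   wait_for_postgres <namespace>
```

Note that kubectl wait may return an error immediately if the pod has not been created yet, so it is most useful once postgres-0 appears in kubectl get pods.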
Resource Investigation
When services aren't working as expected, investigate their state and logs.
Pod Details Investigation
- Reveals resource constraints, configuration problems, and node assignment issues.
kubectl describe pod <pod-name> -n <namespace>
Service Log Analysis
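- Shows application-level errors, connection issues, and initialization problems. The --previous flag returns logs from the prior container instance, which helps after a crash or restart; omit it to see logs from the current container.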
kubectl logs <pod-name> -n <namespace> --previous
Event Timeline Review
- Shows chronological order of issues and reveals cascading failures.
kubectl get events -n <namespace> --sort-by='.metadata.creationTimestamp'
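The three investigation commands above can be captured in one pass so the output can be compared or shared later. The following is a minimal sketch; the function name and output file names are assumptions chosen for illustration.

```shell
# Save describe output, current and previous logs, and namespace events
# for one pod into a directory.
# $1: namespace  $2: pod name  $3: output directory
# (collect_pod_diagnostics and the file names are hypothetical.)
collect_pod_diagnostics() {
  diag_ns="$1"
  diag_pod="$2"
  diag_out="$3"
  mkdir -p "$diag_out" || return 1
  kubectl describe pod "$diag_pod" -n "$diag_ns" >"$diag_out/describe.txt" 2>&1
  kubectl logs "$diag_pod" -n "$diag_ns" >"$diag_out/logs.txt" 2>&1
  # --previous fails if the container has never restarted; that is expected.
  kubectl logs "$diag_pod" -n "$diag_ns" --previous \
    >"$diag_out/logs-previous.txt" 2>&1 || true
  kubectl get events -n "$diag_ns" --sort-by='.metadata.creationTimestamp' \
    >"$diag_out/events.txt" 2>&1
}

# Example call:
#   collect_pod_diagnostics <namespace> <pod-name> ./diagnostics
```

Error output is written to the same files, so a failed command leaves its message in place of the expected content.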
Configuration Verification
List Secrets
kubectl get secrets -n <namespace>
List Custom Resources
kubectl get crd | grep pangea
Clean Uninstallation Process
1. Uninstall Services
- Triggers graceful shutdown of services:
./pangea-private-cloud.sh -n <namespace> -u
2. Verify Resource Cleanup
- Check that all resources and secrets are removed:
kubectl get all -n <namespace>
kubectl get secrets -n <namespace>
3. Force Cleanup (if needed)
- Forcefully remove namespace:
kubectl delete namespace <namespace> --force
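The verification in step 2 can be scripted so that leftover resources are easy to spot before deciding on a forced cleanup. This sketch combines the two kubectl get commands into one call; the function names are hypothetical, and kubectl get all does not cover every resource type, so an empty result is a strong hint rather than proof.

```shell
# Print the names of remaining workloads and secrets in a namespace.
# $1: namespace
# (remaining_resources and namespace_is_clean are hypothetical helpers.)
remaining_resources() {
  kubectl get all,secrets -n "$1" -o name 2>/dev/null
}

# Succeed only when nothing is left in the namespace.
namespace_is_clean() {
  [ -z "$(remaining_resources "$1")" ]
}

# Example call:
#   namespace_is_clean <namespace> || remaining_resources <namespace>
```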