What should I do if the installation fails?
Installation can fail for various reasons. The log/error.log file stores detailed messages when an error occurs, so check this file first to identify the issue.
If the installation fails during the Terraform stages, you can resume it by running sudo python3 manager.py redeploy. If it fails before the Terraform stages, re-run the install command.
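As a quick way to inspect the log, the sketch below prints the last lines of the file, which is usually where the most recent failure appears. It assumes the command is run from the directory that contains the log folder. The function name show_recent_errors and the 50-line default are illustrative and not part of Paladin Cloud.

```shell
# Hypothetical helper: print the tail of the installer error log.
# Assumes the current directory contains log/error.log, as named above.
show_recent_errors() {
  log_file="${1:-log/error.log}"
  line_count="${2:-50}"
  if [ ! -f "$log_file" ]; then
    echo "log file not found: $log_file"
    return 1
  fi
  # The most recent failure is normally near the end of the file.
  tail -n "$line_count" "$log_file"
}
```

Calling show_recent_errors with no arguments reads log/error.log; a different path or line count can be passed as the first and second argument.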
Check the following points before you proceed further:
- Does your installer machine have at least 8 GB of RAM?
The installer machine should have at least 8 GB of RAM to install Paladin Cloud. We recommend using an instance type larger than t2.large.
- Is the Maven build failing?
The Maven build may fail if the installation is run from the user’s home directory. We recommend installing from the /opt/ directory.
- Are you getting the error "Can't connect to MySQL server on 'paladincloud-data.xxxxx.rds.amazonaws.com:3306'"?
The installer machine should be in the same VPC as the resources it creates, or a VPC peering connection should allow it to reach them. This is required because the installation script needs to access MySQL to import the initial data from the SQL file.
- Does your installer machine have enough disk space?
We recommend at least 20 GB of free disk space so that the Docker build can create the images.
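The RAM and disk checks above can be sketched as a small pre-flight script. This is a minimal sketch for a Linux installer machine, not part of the Paladin Cloud installer: the function names are hypothetical, and the RAM threshold is set slightly below 8 GB on the assumption that the kernel reports a little less than the nominal memory size.

```shell
# Hypothetical pre-flight check based on the minimums listed above.
# MemTotal is usually a little below the nominal size, so the RAM
# threshold leaves some margin (an assumption; adjust as needed).
MIN_RAM_KB=7500000                  # roughly 8 GB of installed RAM
MIN_DISK_KB=$((20 * 1024 * 1024))   # 20 GB of free disk space

# Succeeds when the first value (kB) is at least the second.
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Total physical memory in kB (Linux only, reads /proc/meminfo).
total_ram_kb() {
  awk '/^MemTotal:/ {print $2}' /proc/meminfo
}

# Available space in kB on the filesystem holding the given path.
free_disk_kb() {
  df -Pk "$1" | awk 'NR == 2 {print $4}'
}

# Report both checks; returns non-zero if either one fails.
preflight() {
  target_dir="${1:-/opt}"
  status=0
  if meets_minimum "$(total_ram_kb)" "$MIN_RAM_KB"; then
    echo "RAM: ok"
  else
    echo "RAM: below the recommended 8 GB"
    status=1
  fi
  if meets_minimum "$(free_disk_kb "$target_dir")" "$MIN_DISK_KB"; then
    echo "Disk: ok"
  else
    echo "Disk: less than 20 GB free on $target_dir"
    status=1
  fi
  return "$status"
}
```

Running preflight /opt checks the directory recommended above for the installation; any other path can be passed instead.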
Batch jobs are stuck in the runnable state and do not move to the running state. Why?
Batch jobs can remain in the runnable state for a number of reasons, and poor network configuration is a common one. For batch jobs to run, the instances need external network connectivity. Since the resources have no public IP address, their subnets must route outbound traffic through a NAT gateway or NAT instance.
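One way to check the routing is to look at the default route of the subnet used by the batch instances. The sketch below assumes the AWS CLI is installed and configured; the function names and the subnet ID are placeholders. It only finds route tables explicitly associated with the subnet, so a subnet that relies on the VPC main route table returns nothing.

```shell
# Classify the target of a default route by its AWS resource ID prefix.
classify_route_target() {
  case "$1" in
    nat-*) echo "nat-gateway" ;;
    igw-*) echo "internet-gateway" ;;
    i-*)   echo "nat-instance" ;;
    *)     echo "unknown" ;;
  esac
}

# Print the target IDs of the 0.0.0.0/0 route for the given subnet.
# Fields that are not set are printed as "None" by the AWS CLI.
default_route_targets() {
  aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=$1" \
    --query "RouteTables[].Routes[] | [?DestinationCidrBlock=='0.0.0.0/0'].[NatGatewayId,GatewayId,InstanceId]" \
    --output text
}

# Usage (needs AWS credentials; the subnet ID is a placeholder):
#   for id in $(default_route_targets subnet-xxxxxxxx); do
#     [ "$id" = "None" ] || classify_route_target "$id"
#   done
```

For a private subnet running batch jobs, the expected result is nat-gateway or nat-instance. A result of internet-gateway means the subnet is public, which does not help instances that have no public IP address.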
Check the following links for more details:
I have created an internet-facing (public) ALB, but the application still does not load or is very slow. Why?
An internet-facing ALB should have an internet gateway attached to its subnet(s). If the internet gateway is not configured correctly for the subnet, the internet and the VPC may not be able to communicate. You can check this by opening the load balancer and editing its subnets; any warnings are shown while you edit.
I have created an internet-facing (public) ALB, but APIs are failing. Why?
If the ALB is internet-facing and the internet gateway is correctly configured for its subnets, API failures are most likely caused by the security group inbound rules. You should either allow access from anywhere or identify the container IPs and add each of them to the security group. This is required because every API service except the config service first contacts the config service to get its configuration properties. The other API containers can reach the config service only if their IPs are allowed in the security group.
I got disconnected from the installer machine before the install/destroy command completed. What should I do now?
We recommend always running the install or destroy command inside a Linux screen session (https://linuxize.com/post/how-to-use-linux-screen/).
If you get disconnected from the installer machine after starting the install/destroy command, the process keeps running in the background. Wait at least 30 minutes and then rerun the command. You may receive a warning saying that another process is running. Check whether any processes named ‘terraform’ are running. If so, wait until they finish. If not, delete the lock file at installer/data/terraform/.terraform.lock.info and rerun the command.
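The process check and lock cleanup can be sketched as follows. This is a minimal sketch: the function names are hypothetical, the lock path is the one given above and is assumed to be relative to the repository root, and pgrep is assumed to be available on the installer machine.

```shell
# Lock file path from the FAQ, assumed relative to the repository root.
LOCK_FILE="installer/data/terraform/.terraform.lock.info"

# Succeeds if any process with "terraform" in its name is running.
terraform_running() {
  pgrep -i terraform >/dev/null 2>&1
}

# Remove the lock file only when no terraform process is alive.
clear_stale_lock() {
  lock="${1:-$LOCK_FILE}"
  if terraform_running; then
    echo "terraform is still running; wait for it to finish"
    return 1
  fi
  if [ -f "$lock" ]; then
    rm -f "$lock" && echo "removed stale lock: $lock"
  else
    echo "no lock file at $lock"
  fi
}
```

Checking for the process first matters: deleting the lock while Terraform is still working could let a second run modify the same state.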
Is it necessary to keep the installer machine running at all times?
It is not necessary to run the installer machine continuously. You can stop the instance once you have completed the installation. When a new version is released, start the machine again, pull the latest Paladin Cloud code, and redeploy. You can stop the instance again after redeployment.
My installer machine got terminated accidentally. How can I redeploy when the latest version gets released?
The required state files are saved in S3. Follow the steps below to redeploy:
- Create a new instance under the same VPC. Check the detailed steps here
- Get the latest stable release of the Paladin Cloud repo into the /opt directory.
- In S3, find the Paladin Cloud bucket, which contains a zip file named paladincloud-terraform-installer-backup.zip. Download the file and extract it in the installer directory to replace the /installer/data directory.
- Edit the local.py file so that it contains all your configurations.
- Run the install command followed by the redeploy command.
The destroy command timed out. What should I do now?
The destroy command can fail and end with a timeout error. Please wait 30-60 minutes and run the destroy command again.
How can I verify if the Docker service is running on my EC2 instance during the installation process?
To check whether the Docker service is running, open a terminal on the EC2 instance and run the following command:
sudo systemctl status docker
If Docker is running, the output shows ‘active (running)’. If Docker is not running, run the following command:
sudo systemctl start docker
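The check and the start can also be combined into one step. The sketch below assumes a systemd-based instance; the function name ensure_docker_running is hypothetical.

```shell
# Start Docker only if systemd reports that the service is not active.
ensure_docker_running() {
  if systemctl is-active --quiet docker; then
    echo "docker is active"
  else
    echo "docker is not active; starting it"
    sudo systemctl start docker
  fi
}
```

Calling ensure_docker_running before the install command avoids a build failure caused by a stopped Docker service.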