Hello and thanks in advance for any assistance.
I’ve been running a private WebPageTest setup with autoscaling for over a year. CloudFormation builds the stack, and Chef configures the server. For autoscaling I use the AMIs provided. I previously used Windows agents, but switched to the Linux AMI (ami-a88c20d5) late last year, around December. Since then, we’ve had two separate events where the agents could no longer communicate with the server. In both cases the server access logs contained no getwork requests.
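For anyone who wants to repeat the access log check, here is a rough sketch of how I would count getwork requests per client. It assumes the agents poll /work/getwork.php and that the log is in the common/combined format with the client address as the first field; the function name and sample lines are just for illustration.

```python
import re
from collections import Counter

# Assumption: agents poll /work/getwork.php on the wpt server.
GETWORK = re.compile(r'"(?:GET|POST) /work/getwork\.php')


def count_getwork_by_client(lines):
    """Count getwork requests per client address in access log lines."""
    counts = Counter()
    for line in lines:
        if GETWORK.search(line):
            # Assumption: the client address is the first space-separated field.
            client = line.split(" ", 1)[0]
            counts[client] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, not taken from my real logs.
    sample = [
        '10.0.0.5 - - [22/May/2018:10:00:00 +0000] "GET /work/getwork.php?f=json HTTP/1.1" 200 0',
        '10.0.0.9 - - [22/May/2018:10:00:01 +0000] "GET /index.php HTTP/1.1" 200 512',
    ]
    for client, hits in count_getwork_by_client(sample).items():
        print(client, hits)
```

An empty result for the period after the new agents launched is what I mean by the agents not polling.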
I run two WebPageTest environments, one in staging (stag) and one in prod. The first event occurred in early April, when newly created instances were passed user data containing different values for wpt_server. We don’t use DNS for the wpt server in our stag environment, so there wpt_server was set to the wpt server’s IP address. In our prod environment, wpt_server was set to our DNS entry. Previously this value was set to the stack’s ELB in both environments. To work around the issue, I removed the prod wpt server from DNS so that wpt_server could use the IP address. It’s not an ideal fix, but I needed to get this running and planned to return to the root cause as time permitted.
As of May 22, autoscaling has terminated all the agents, and the newly spawned instances can no longer poll the wpt server for work. Nothing changed in the environment apart from the termination and spawning of agents. This time I’m seeing the issue in both stag and prod.
All the usual AWS settings seem appropriate. I’ve verified that the files in the settings directory are as expected, which they should be since they’re managed by Chef. The ec2/logs show that the same user_data has been passed in ever since the first event I mentioned earlier. So far I have tried using CloudFormation to spin up a new wpt server instance, thinking that the server code might be out of alignment with the new agents.
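To rule out a name resolution problem with the wpt_server value, this is the kind of check I have in mind. It is only a sketch: it assumes the user data is a string of space-separated key=value pairs, and the sample values (including the wpt_loc key and the address) are made up.

```python
import socket


def parse_user_data(user_data):
    """Split a space-separated key=value string into a dict."""
    settings = {}
    for token in user_data.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that have no '='
            settings[key] = value
    return settings


def can_resolve(host):
    """Return True if the host name (or IP literal) resolves locally."""
    try:
        socket.getaddrinfo(host, 80)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Hypothetical user data, not my real values.
    sample = "wpt_server=192.0.2.10 wpt_loc=Example"
    server = parse_user_data(sample).get("wpt_server")
    print(server, "resolves" if server and can_resolve(server) else "does not resolve")
```

Running this from a machine in the same subnet as the agents, once with the DNS name and once with the IP address, should at least show whether resolution differs between the two.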
I have a few questions:
Were any code changes added around this time that may have caused the issue?
What is the preferred way to log into the Linux agents to troubleshoot? For instance, the Windows agents use a username and password.
What is the link for the official autoscaling documentation?
Is it possible to build the Linux agents ourselves and run them in our data center? If so, is this the right list of agent requirements? https://github.com/WPO-Foundation/wptagent/blob/master/docs/install.md