How are people detecting if a test agent is up and still running?
I’ve thought about having a script queue up a test periodically and raise an alert if the test isn’t completed within a specified period. But this strikes me as impractical, since there could be many tests already in the queue ahead of it.
I ask because in the last few days I’ve had the webpagetest driver and url blaster crash for unknown reasons and I’d like some sort of alert when this happens.
There is also a checktesters.php script in the work directory that you can call from cron. It will send an email for any location that hasn’t connected to the server in the last hour (meaning all agents for that location have died).
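If you want a similar check in your own monitoring, the idea can be sketched roughly as below. This is not the code in checktesters.php, just an illustration of the "no connection in the last hour" rule. The function name `stale_locations`, the location names, and the `last_seen` mapping are all made up for the example; how you obtain the last-connection times is up to your setup.

```python
from datetime import datetime, timedelta

def stale_locations(last_seen, now, max_age=timedelta(hours=1)):
    """Return the locations whose most recent connection is older than max_age.

    last_seen is a hypothetical dict of location name -> datetime of the
    last time any agent for that location connected to the server.
    """
    return sorted(
        location
        for location, seen in last_seen.items()
        if now - seen > max_age
    )

# Example with invented data: one healthy location, one silent for two hours.
now = datetime(2024, 1, 1, 12, 0)
last_seen = {
    "Office_IE": datetime(2024, 1, 1, 11, 45),
    "Lab_IE": datetime(2024, 1, 1, 10, 0),
}
for location in stale_locations(last_seen, now):
    print(f"ALERT: no agent for {location} has connected in the last hour")
```

Checking per location rather than per agent means a single crashed agent won't trigger an alert as long as another agent for the same location is still polling.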
It may also be worthwhile to grab the latest agent binaries. There have been a couple of crash fixes (usually in the optimization checks), and I also added a watchdog process that automatically restarts the agents if they stop running without exiting on purpose.
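If you can't update right away, a stopgap wrapper along the same lines might look like the sketch below. This is not how the built-in watchdog is implemented. It assumes that an exit code of 0 means the process exited on purpose and anything else is a crash, and `run_with_watchdog` and its parameters are names invented for the example.

```python
import subprocess
import sys
import time

def run_with_watchdog(cmd, max_restarts=3, delay=0.0):
    """Run cmd and restart it whenever it exits with a non-zero status.

    Assumption: exit code 0 means an intentional exit, so no restart.
    Returns the number of restarts that were performed.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return restarts  # clean exit, treated as intentional
        if restarts >= max_restarts:
            return restarts  # give up so a broken agent doesn't loop forever
        restarts += 1
        time.sleep(delay)  # brief pause before relaunching

if __name__ == "__main__":
    # Stand-in command that always fails; replace with the agent's command line.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print("restarts:", run_with_watchdog(crashing, max_restarts=2))
```

The restart cap is there so that an agent that crashes immediately on startup gets noticed instead of being relaunched endlessly.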