I currently have a working private instance and am looking to harden the Amazon AMI.
What are the recommended steps to harden the server and agent in AWS?
I have looked around, but did not see any obvious list of steps. So here are my current attempts:
I have edited the security group to block inbound SSH traffic except from some trusted IPs (a rough CLI version of this is sketched after this list).
I have restricted the inbound HTTP traffic, but when the agent restarts I am not sure how to add its new IP to the inbound rule (or whether I can assign a known IP to the restarted agent).
I have added basic auth to the server, but when the agent connects it gets blocked, so I need to dig into how to send the auth credentials from the agent.
I will be adding a self-signed cert to the server, but I suspect I will need to edit the agent to add the trusted cert / allow invalid certs.
I haven’t explored whether there is a way to use security groups in a more effective manner.
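For reference, the AWS CLI equivalent of the SSH rule is something like this (the group ID and trusted CIDR are placeholders for my real values):

# drop the default wide-open SSH rule, then allow only a trusted block
aws ec2 revoke-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 22 --cidr 203.0.113.0/24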
For the agents, you can block all inbound traffic; you should never need to connect to them. For the server, restricting SSH to a few IPs is good.
For HTTP access on the server, you can configure the agents to connect to the server’s private IP, which allows access within your VPC but not externally. Then you can restrict external HTTP access to only those IPs that need it (an office IP block, for example).
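As a rough sketch (assuming a VPC CIDR of 10.0.0.0/16 and an office block of 198.51.100.0/24, with a placeholder group ID), the server’s security group then only needs:

# agents reach the server over its private IP, so allow HTTP from inside the VPC
aws ec2 authorize-security-group-ingress --group-id sg-0feedface0000000 --protocol tcp --port 80 --cidr 10.0.0.0/16
# external HTTP only from the office block
aws ec2 authorize-security-group-ingress --group-id sg-0feedface0000000 --protocol tcp --port 80 --cidr 198.51.100.0/24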
If you don’t mind, I have two follow up questions:
Out of the box, the WebPageTest agent terminates, so any changes I make to it will be lost. Are you suggesting that the agent should stay running?
If so, where do I make that change? On the agent (add key pair, ssh in presumably)?
How often do you run apt-get update etc.?
Right now I see: 272 packages can be updated and 193 updates are security updates. I am totally used to updating/maintaining Ubuntu, but is there any (major) risk to functionality if I run the updates?
What changes do you make to the agents? They’re generally meant to be disposable and auto-install the latest OS updates as well as browsers when they start. They are configured through user data, so they can be scaled up and down as needed. I tend to use persistent spot requests to keep costs down, and it’s no big deal if the instances get terminated and re-spawned.
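A persistent spot request is just the regular spot API with the type set to persistent, so EC2 re-launches the agent automatically if the instance gets reclaimed. Roughly (the launch spec file name is a placeholder):

# the launch spec JSON carries the AMI, instance type, security group and user data
aws ec2 request-spot-instances --type persistent --instance-count 1 --launch-specification file://agent-launch-spec.json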
The agent waits 30 seconds after starting and then runs /home/ubuntu/agent.sh in a screen session. The shell script runs update, dist-upgrade, autoremove and clean before doing a git update of the agent.
Every hour it exits the agent and does a git update of the agent code, and every 24 hours it stops completely and reboots.
The main thing it doesn’t do is reboot before starting the agent for the first time after updating, so a kernel update won’t get applied, but I’m usually not as worried about that. If it is a concern, I can tweak the default images to reboot the first time they start after running the updates to make sure the kernel gets updated as well.
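In rough shell terms, the flow looks like this (just a sketch of the sequence described above, not the actual agent.sh; the checkout path and agent command are placeholders):

#!/bin/bash
cd /home/ubuntu/wptagent || exit 1

# OS updates before the agent starts (no reboot here, per the kernel caveat above)
sudo apt-get update
sudo apt-get -y dist-upgrade
sudo apt-get -y autoremove
sudo apt-get clean

deadline=$(( $(date +%s) + 86400 ))      # stop completely and reboot after 24 hours
while [ "$(date +%s)" -lt "$deadline" ]; do
    git pull                             # hourly git update of the agent code
    timeout 3600 python3 wptagent.py     # placeholder for the real agent invocation
done
sudo reboot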
For the apt-get update/upgrade/dist-upgrade/autoremove/clean, I was considering the server rather than the agent. Any issues updating the server?
From your previous reply:
“For the agents you can block all inbound traffic. You should never need to connect to them.”
I was under the (mis)understanding that the agents needed to be reachable from the server; is this not the case? Is access from the server to the agent implied, or would you block all inbound traffic to the agent, including from the server?
So for this to work, I would need to make the agent long-lived, and I would have to ssh to the agent and update it to use the server’s private IP?
Dumb question: the screen session is on the agent side, right?
Sorry for all the questions, just trying to get my head around this.
The agents do not need to be reachable. They poll the server for work and post the results (all over HTTP or HTTPS through the web interface). I usually set them up with the default ACL that blocks all inbound external traffic.
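In practice that just means a security group with no inbound rules at all; a freshly created group already looks like that (the names and IDs here are placeholders):

# a new security group starts with no ingress rules, which is all the agents need
aws ec2 create-security-group --group-name wpt-agents --description "WebPageTest agents, no inbound" --vpc-id vpc-0123456789abcdef0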
The agents don’t need to be long-lived but the server does. In the agent’s user data you provide the server’s private network IP (or name), and whenever the agent starts it adopts that configuration. You don’t need to ssh in to configure it; it reads everything from the instance user data, i.e. a user data string like the one below (the key and location are placeholders):
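wpt_server=www.webpagetest.org wpt_loc=ec2-us-east wpt_key=XXXXXXXXXXXX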
That is the same format as the actual user data used for the EC2 instances in Virginia that connect to the public WebPageTest (minus the key, which was changed). If you are using the private IP of the server, you just pass that as the wpt_server= parameter.
And yes, the screen session is for the agent. The server doesn’t run anything in screen; it just runs the nginx daemon and cron for all of its work.
There’s always a chance that an update will break something, but it hasn’t been a problem in practice, and I should notice really quickly since it would take out most of WebPageTest’s testing.
The bigger impact is that you will always have the latest browsers, and they will update automatically whenever a new version is released, so if you investigate a regression you may need to check browser versions to see whether it might have come from a browser change.
I'm working on adding whitelisting to the WebPagetest server to help with hardening.
I need to add the WebPagetest agent IP address to the whitelist (for incoming HTTP requests); obviously, when the agent gets terminated and recreated, a new IP is assigned.
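Assuming the whitelist ends up as nginx allow/deny rules, what I have in mind is roughly this (the CIDRs are placeholders, with the VPC range there so agents hitting the server’s private IP get through):

location / {
    allow 198.51.100.0/24;   # office block
    allow 10.0.0.0/16;       # VPC CIDR covering the agents' private IPs
    deny all;
}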
I realize this is more of an AWS question, but in the settings.ini there is:
; Per-location Security Group and Subnet IDs to enable launching into VPCs
; (note that this will pin your instances to the availability zone associated with
; the subnet). This is required only if you do not have a default VPC.
;EC2.us-west-2.securityGroup=sg-a0011b223,sg-b1122c334
;EC2.us-west-2.subnetId=subnet-aaa0011bc1
If I set up a WebPagetest agent security group, add that security group to the server whitelist, and then add it to settings.ini (and restart nginx), that should work, right?
Do I need to also set subnetId, or can that be left undefined?
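In other words, I'd end up with something like this in settings.ini (placeholder group ID, region adjusted to mine, and no subnetId on the assumption that a default VPC makes it optional):

EC2.us-east-1.securityGroup=sg-0123456789abcdef0
; subnetId left unset since we have a default VPC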