So other than the CPU and the 2.16 version, there are no differences between these, right?
[hr]
FYI, I’m using an m1.medium on AWS to generate that private instance graph.
The public nodes all vary quite a bit but the “Dulles” VMs that you are comparing against are:
Server:
Single-socket Xeon E3, ~2.3-2.5GHz (generations vary across the 5 physical servers that run the 34 VMs; all have 4 cores with HT)
32GB RAM
5-7 VMs per server (usually one VM per processor thread + 1 spare for the hypervisor)
256-512GB Samsung 850 Pro SSD (crazy IOPS, removes storage as a bottleneck)
VMware ESXi 5.5U2
Each VM is allocated 2GB of RAM and 1 processor core.
When I deploy a new server I’ll configure more VMs than I think I’ll be able to run on it and then submit a whole bunch of tests of high-CPU sites with video capture, tcpdump and timeline enabled. I’ll watch the overall server CPU utilization on the host hypervisor and make sure it’s around 60% with peaks that don’t go over 80%.
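As a rough sketch of that headroom check (not the actual tooling used on the Dulles hosts), something like the following could be run over CPU utilization samples collected from the hypervisor during a burst of heavy tests. The function name and the 5-point tolerance used to interpret "around 60%" are assumptions made for illustration; only the 60% average and 80% peak figures come from the description above.

```python
def host_has_headroom(cpu_samples, avg_target=60.0, avg_tolerance=5.0, peak_limit=80.0):
    """Check host CPU utilization samples (percent) against the sizing rule.

    avg_target and peak_limit come from the post; avg_tolerance is an
    assumed interpretation of "around 60%".
    """
    if not cpu_samples:
        raise ValueError("need at least one CPU utilization sample")
    average = sum(cpu_samples) / len(cpu_samples)
    peak = max(cpu_samples)
    # Pass only if the average is near the target and no peak exceeds the limit.
    return average <= avg_target + avg_tolerance and peak <= peak_limit


if __name__ == "__main__":
    # Hypothetical samples taken while running high-CPU tests on every VM.
    print(host_has_headroom([50, 60, 58, 75]))
```

If the check fails, the obvious response under this approach is to power off a VM and repeat the run until the host settles under the limits.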
You don’t get that kind of control on EC2, though, so you’re generally just looking for stable and consistent results. You can try the m3.medium instances, which provide more CPU than the m1 and also have SSD storage, if you want to get closer to the Dulles VMs.
Also, make sure you run a LOT more than one test to see how consistent the results are and to make sure you are looking at representative results.
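One simple way to judge consistency across many runs is to look at the median and the spread of a metric such as load time. This is only a sketch: the function name, the choice of coefficient of variation as the spread measure, and the sample numbers are assumptions, not something WebPageTest reports in this form.

```python
import statistics


def summarize_runs(load_times_ms):
    """Summarize repeated test runs of the same page on the same agent."""
    if len(load_times_ms) < 2:
        raise ValueError("need at least two runs to measure consistency")
    mean = statistics.mean(load_times_ms)
    stdev = statistics.stdev(load_times_ms)  # sample standard deviation
    return {
        "runs": len(load_times_ms),
        "median_ms": statistics.median(load_times_ms),
        # Coefficient of variation: spread relative to the mean (lower is steadier).
        "cv": stdev / mean,
    }


if __name__ == "__main__":
    # Hypothetical load times (ms) from repeated runs.
    print(summarize_runs([2310, 2450, 2290, 3900, 2380, 2410, 2350]))
```

Comparing the median rather than a single run keeps one outlier from deciding which instance type looks faster.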
The gold standard is the Dulles Thinkpad testers, which are physical laptops with Intel GPUs and 2-core i5 processors (with Hyper-Threading). They aren’t MUCH faster than the Dulles VMs (because of all of the hoops I jump through above) but you can see the CPU has a LOT more headroom: http://www.webpagetest.org/result/150103_PZ_568534d3959a482d49c4061b20fb157d/
The m3.2xlarge comes close to the specs of the physical server that the VMs run on, not the specs of the test VMs themselves. The m3.medium is actually spec’d slightly higher than any of my individual VMs, with one 2.5GHz core, 3.75GB of RAM and SSD storage.
It looks like there are a variety of factors on the EC2 instance, though. The DNS times are a lot longer (though that may be due to CPU use as well). On paper the m3.medium should have more CPU available than the Dulles VMs. In practice it looks like EC2 may be playing fast-and-loose with their CPU equivalents.
Nope, the HTTP Archive is also a custom install but it is WAY overloaded. It was originally not spec’d to measure performance, so it is running 32 VMs on each of the 2-socket Xeon servers (I run the testing infrastructure for that as well). It is way over-stacked and is completely CPU-bound when running a crawl.
If you’re running just a couple of tests per hour, on small HTTP sites, a t2.micro will be okay ($13/month).
If you’re running just a couple of tests per hour, on large or secure sites, you’ll need to use a t2.medium ($52/month).
If you’re running lots of tests per hour, you can’t use t2s; the most efficient agent will be a c3.large ($135/month).
Feel free to let me know if you’ve found otherwise, but I’ve been running c3.large instances for quite some time and am super happy with the consistency.
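For anyone who wants those three rules in one place, here is a minimal sketch of them as a lookup. The prices and instance names are the ones quoted above; the function name and the cutoff of 2 tests per hour for "a couple" are assumptions for illustration.

```python
# Monthly prices per agent as quoted in the post (USD).
MONTHLY_COST_USD = {"t2.micro": 13, "t2.medium": 52, "c3.large": 135}


def pick_agent_instance(tests_per_hour, large_or_secure_site, low_volume_cutoff=2):
    """Return (instance_type, monthly_cost_usd) following the three rules above.

    low_volume_cutoff is an assumed reading of "a couple of tests per hour".
    """
    if tests_per_hour < 0:
        raise ValueError("tests_per_hour must be non-negative")
    if tests_per_hour > low_volume_cutoff:
        instance = "c3.large"   # lots of tests per hour: t2s are ruled out
    elif large_or_secure_site:
        instance = "t2.medium"  # low volume, large or secure sites
    else:
        instance = "t2.micro"   # low volume, small HTTP sites
    return instance, MONTHLY_COST_USD[instance]


if __name__ == "__main__":
    print(pick_agent_instance(2, large_or_secure_site=True))
    print(pick_agent_instance(40, large_or_secure_site=False))
```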
Great post, thanks for slogging through the research. I’m really disappointed to see how puny EC2’s “vCPUs” are compared to what I get on VMware on a physical box.
Azure is on my list to tackle in the next week or two and it will be interesting to see if their Hyper-V (I assume) infrastructure scales better for Windows than Amazon’s Xen.
I’m super excited to see how Azure performs as well. At first glance, a VM with comparable specs to a c3.large is $20 cheaper per month, so if performance is the same it’d be worth using if you’ve got lots of agents. Perhaps performance will be much better and we can downgrade even further.
Would love to rerun those tests on Azure once the images are ready, will be keeping an eye out. Thanks Patrick!