Wireless AP setup for mobile traffic shaping

I was able to talk with Pat a bit at Velocity about Private Instances and mobile devices. I figure others would be interested in some of that discussion.
You should be aware that the public mobile instances (at least Dulles) are not actually using cellular connections. It's not surprising given the additional costs, and the debate over the benefits of consistency versus the drawback of artificiality isn't really relevant for a free service. Rather, Pat has configured a wireless AP with traffic shaping to mimic a cellular connection and uses that for multiple devices.
Pat said he would document the setup of the AP (perhaps here or elsewhere in the docs). Until then, is anyone else using this approach? Do you care to share what's worked for you in terms of setups and throttling settings? Were you able to benchmark against a real cellular connection? Was there anything that you weren't able to mimic to your satisfaction?

Thanks for your thoughts.

I’m writing up the documentation right now. So far I just have the diagram, but I should have it mostly complete, including the traffic shaping, by the end of the day: https://sites.google.com/a/webpagetest.org/docs/system-design/mobile-testing

FWIW, we did a LOT of testing at Google with actual networks and dummynet. With dummynet we could simulate a given set of conditions just about exactly (with more consistency) but it’s worth noting that it’s exactly that - one set of conditions. In the case of the default 3G profile I have set up right now, it’s basically modeled around the 60-70th percentile of measured connections globally so it doesn’t match an exact market - some will be faster and some will be slower.

The main things that you can’t duplicate with traffic shaping are:

Radio start-up delay - given a few seconds of idle, most phones will shift the radio into low power modes and it can take anywhere from a few dozen ms to several seconds before the radio comes back and is able to transmit data. It’s more important for multi-page navigations where the radio can turn off between each step but it could also add completely dead time to the beginning of a page load (but that case is completely out of the control of the site owner). In really extreme cases it’s possible for the radio to turn off while waiting for a server response - but if your TTFB is that long you have serious issues to look into anyway.

Network architecture - on traffic-shaped WiFi you will still be going out through your normal ISP’s network (Verizon Fios in the case of Dulles). You won’t go through any carrier proxies, your traffic won’t be routed back to a central carrier egress point, and your requests won’t come from a carrier’s mobile IP block, in case there is any server logic tied to it.

The agents themselves work perfectly fine on actual networks, and the test results come back over the wired connection from the tethered host that is driving the agents, so the data use is just for the actual browsing and isn’t really why I chose to use WiFi. The main issue is the insane variability of an actual carrier network. The performance varies quite significantly from day to day and even within a day, which makes it REALLY hard to compare results to each other.

ok, documentation has been updated: https://sites.google.com/a/webpagetest.org/docs/system-design/mobile-testing

Let me know if there are any areas where more detail would help.

I recently configured a 3G mobile network emulator using a fixed traffic shaping profile, as described by Pat in the documentation of dummynet. Along with his 3G specs of 1.6Mbps down, 768Kbps up, and 300ms RTT, I also added a packet loss factor of 0.2%. I chose this loss factor based on the paper “Characterizing 4G and 3G Networks” by Yung-Chih Chen, et al.
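
As a quick sanity check on those numbers, here is a minimal sketch (not part of my actual script) that turns an RTT and loss figure into netem arguments. The helper name netem_args is made up for illustration, and splitting the RTT evenly between the two directions is an assumption, not something taken from Pat's documentation.

```sh
#!/bin/sh
# Hypothetical helper: split a round-trip time evenly across both directions
# and print the matching netem arguments. The even split is an assumption.
netem_args() {
  rtt_ms=$1    # target round-trip time in milliseconds
  loss_pct=$2  # packet loss percentage, e.g. 0.2
  one_way_ms=$((rtt_ms / 2))
  echo "delay ${one_way_ms}ms loss ${loss_pct}%"
}

# 300ms RTT with 0.2% loss, as in the profile above
netem_args 300 0.2
```

With the 3G profile this prints "delay 150ms loss 0.2%", which is the argument string used on the netem line of the script.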

A wireless router that can be flashed with OpenWrt’s 14.07 “Barrier Breaker” firmware is the only hardware needed for my setup. Although this is a relatively cheap approach, one misses out on higher-level software that would allow a much cleaner and more elegant script. The software required to run my script, along with some well-documented terminology about the OpenWrt approach to network emulation, can be found here.

The script:

#!/bin/sh /etc/rc.common
#Default bucket for any IPs that do not fall in the range X.X.X.{2..254}
DEFAULT=9999

start() {
  #Download: Root qdisc; unmatched traffic defaults to class 1:9999 (defined at the end, limited to 10mbit with no added latency)
  tc qdisc add dev br-lan root handle 1: htb default $DEFAULT
  #Attach a 100mbit parent class to the root so we can create per-IP classes/filters under it
  tc class add dev br-lan parent 1: classid 1:1 htb rate 100mbit burst 15k

  #Upload: Root qdisc; default points at a nonexistent class (30), so unmatched traffic is not shaped
  tc qdisc add dev eth1 root handle 1: htb default 30

  for i in $(seq 2 254)
  do
    DBUCKET=$(($i+10))
    UBUCKET=$(($i+10+255))
    #Download: Create a bucket from parent class we defined above to limit download to 1.6mbit
    tc class add dev br-lan parent 1:1 classid 1:$DBUCKET htb rate 1.6mbit burst 15k
    #Latency: Attach a netem qdisc as a leaf of the class created above; adds 150ms delay and 0.2% loss to download traffic
    tc qdisc add dev br-lan parent 1:$DBUCKET handle $DBUCKET: netem delay 150ms loss 0.2%
    #Download: Place a filter for each IP in the subnet that sends its traffic to the class created above
    tc filter add dev br-lan protocol ip parent 1:0 prio 1 u32 match ip dst 192.168.1.$i/32 flowid 1:$DBUCKET
    #Upload: Create a bucket from parent class defined before loop to limit upload speed
    tc class add dev eth1 parent 1: classid 1:$UBUCKET htb rate 768kbit
    #Upload: Create a filter for each ip in subnet that will refer to the class created in the line above
    tc filter add dev eth1 protocol ip parent 1:0 prio 1 handle $UBUCKET fw classid 1:$UBUCKET
    #Upload: Have iptables link filter to ip
    iptables -t mangle -I PREROUTING -s 192.168.1.$i/32 -j MARK --set-mark $UBUCKET
  done

  #Default limit 10mbit handler
  tc class add dev br-lan parent 1:1 classid 1:$DEFAULT htb rate 10mbit burst 15k
  tc qdisc add dev br-lan parent 1:$DEFAULT handle $DEFAULT: sfq perturb 10
}

stop() {
  #Delete br-lan and eth1 rules. Reset iptables
  tc qdisc del dev br-lan root
  tc qdisc del dev eth1 root
  /etc/init.d/firewall restart
}
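
To make the ID scheme in the loop easier to follow, here is a small sketch that prints the download class, upload class, and firewall mark the script assigns for a given last octet. bucket_ids is a hypothetical helper, not part of the init script. One thing to keep in mind: as far as I know, tc reads the minor part of a class ID as hexadecimal while the iptables mark and the fw filter handle are read as decimal. The script still lines up because the same digit string is used when creating and when referencing each class.

```sh
#!/bin/sh
# Hypothetical helper mirroring the arithmetic in the loop above.
bucket_ids() {
  i=$1                        # last octet of the device IP (2..254)
  dbucket=$((i + 10))         # download class on br-lan
  ubucket=$((i + 10 + 255))   # upload class on eth1, also used as the mark
  echo "192.168.1.$i download=1:$dbucket upload=1:$ubucket mark=$ubucket"
}

bucket_ids 2
bucket_ids 254
```

For the first and last addresses this prints download=1:12 upload=1:267 mark=267 and download=1:264 upload=1:519 mark=519, so the upload IDs never collide with the download IDs.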

One main issue I have with my configuration is the technique used to set up the network emulator. tc is very clean when it comes to setting a bandwidth limit across an entire network (which is shared by all devices), but can become very tricky to work with when limiting bandwidth on a per-device basis. The solution I came to was using tc to create separate rules for every IP address in the subnet (192.168.1.2 through 192.168.1.254). When a device connects to the network, the IP that gets assigned to it will already have rules bound to it. Other issues I had were very similar to the ones Pat ran into, involving emulating the randomness that comes with browsing over a cellular service. netem attempts to address this by letting you add jitter to the delay, drawn from a uniform or normal distribution, but I chose consistency over randomness in my network for testing purposes.
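
For anyone who does want the randomness, here is a rough sketch of what that could look like for a single device. It only prints the tc command rather than running it. jitter_cmd is a hypothetical helper, the 30ms jitter value is an arbitrary assumption, and on a stripped-down OpenWrt image the netem distribution tables may not be installed, so treat this as untested on my router.

```sh
#!/bin/sh
# Hypothetical helper: print (not run) the tc command that would switch one
# device's netem qdisc from a fixed 150ms delay to a normally distributed one.
jitter_cmd() {
  i=$1           # last octet of the device IP
  jitter_ms=$2   # assumed jitter around the 150ms base delay
  bucket=$((i + 10))
  echo "tc qdisc change dev br-lan parent 1:$bucket handle $bucket: netem delay 150ms ${jitter_ms}ms distribution normal loss 0.2%"
}

# Device at 192.168.1.50 with an illustrative 30ms of jitter
jitter_cmd 50 30
```

Piping the output to sh on the router would apply it to that one device and leave every other device on the fixed profile.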

The benefit of configuring my network emulator on a per-IP basis is that each device that connects to my network can be given custom rules. This means that if I assigned static IPs instead of relying on DHCP, I could run 4G, 3G, or some variant between the two on any device in the network without affecting the configuration of other devices currently operating within the network.
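
As a sketch of how a per-device profile switch might look, the helper below prints the tc commands that would re-rate one device's download and upload classes. profile_cmds is a hypothetical name, the 3g numbers are the ones from my script, and the 4g numbers are placeholders I picked for illustration, not measured values.

```sh
#!/bin/sh
# Hypothetical helper: print (not run) the tc commands that would move one
# device to a different bandwidth profile.
profile_cmds() {
  i=$1         # last octet of the device IP
  profile=$2   # profile name: 3g or 4g
  case "$profile" in
    3g) down=1.6mbit; up=768kbit ;;   # values from the script above
    4g) down=9mbit; up=9mbit ;;       # placeholder values, an assumption
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac
  dbucket=$((i + 10))
  ubucket=$((i + 10 + 255))
  echo "tc class change dev br-lan parent 1:1 classid 1:$dbucket htb rate $down burst 15k"
  echo "tc class change dev eth1 parent 1: classid 1:$ubucket htb rate $up"
}

# Move the device at 192.168.1.50 to the placeholder 4g profile
profile_cmds 50 4g
```

This only touches bandwidth. Changing latency or loss for that device would need a matching change to its netem qdisc.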

Please let me know if you have any questions about this approach, or you see areas that could use improvement.

I’ve just been using a WiFi connection but have found even that to be all over the place (20k+ Speed Index values are fairly common). The results are stable on the old iOS BZAgent, but the NodeJS agent isn’t quite as stable. It might make sense to show how to use the USB connection for network access, since these results are so weird.

Android or iOS? You can reverse-tether the Android devices over USB, which will provide more stability if you don’t have a clean WiFi environment. I tried doing it once but gave up rather quickly, though we do run some devices in a lab that are configured that way. Other than the network dropping out and having to be reset every now and then, it works very well.