Capturing Dev Tools Timeline hangs at fetching the result for some pages

Hi Pat / all,

I’ve been experiencing an issue where capturing the timeline can cause some tests to not return correctly. For example:

This:
webpagetest test https://www.carters.com/ -f -R --location=labclient2_wptdriver -s --poll 1 --timeout 30

returns a nice JSON result, whereas this (just adding -M):

webpagetest test https://www.carters.com/ -f -R -M --location=labclient2_wptdriver -s --poll 1 --timeout 30
returns:
{
  "error": {
    "error": {
      "code": "TIMEOUT",
      "testId": "160617_FX_14W",
      "message": "timeout"
    }
  }
}

(also tried setting timeout to 60)

In the second case, I see that the test actually completes successfully within 5-7 seconds, but the agent does not seem to be able to retrieve the results afterwards.

The same thing works fine for other test targets, e.g.
webpagetest test https://www.forever21.com/ -f -R -M --location=labclient2_wptdriver -s --poll 1 --timeout 30
returns JSON just fine, while other targets sometimes time out and sometimes return a result (so it looks like the particular page structure triggers the problem more often, or always, in some cases).

I verified that this is not an issue with the npm wrapper: when I start the same tests from the web interface with the same arguments, they display results fine when not capturing the timeline, but they hang at “waiting” when capturing it. Meanwhile, I can see that the test actually completes fine on the test machine and the driver goes back to waiting for more work.

I also verified that manually exporting a timeline using the same Chrome build on the same test machine against the same target page works fine.

What would you say is the best way to troubleshoot this to see why capturing timelines sometimes leads to timeouts when trying to fetch the test results?

Thanks!

I think I have an idea what’s up: it seems something was indeed hanging while trying to fetch the timeline.

I saw the following in /var/log/apache2/error.log:
"PHP Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 72 bytes) in /var/www/webpagetest/devtools.inc.php on line 1008"

Adding
ini_set('memory_limit', '256M');
at the top of devtools.inc.php seems to have fixed it. I’m not 100% sure this is the right solution or whether there’s an underlying issue, but the problem did seem to happen more often for larger pages (>130 requests, hence larger timelines), so it does seem like just a matter of needing more memory than the default 128M. I didn’t snoop around the code much to see if using 128M is justified, but I was convinced that it was when I manually fetched a timeline for one of the pages I was testing and saw that it was 62M (!)
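
For scale, the byte count in the error line can be converted by hand. The snippet below is only a back-of-the-envelope check, written in Python for convenience (it is not WebPageTest code). It assumes the "62M" timeline figure means 62 MiB and uses PHP's convention that the "M" shorthand is 1,048,576 bytes:

```python
MIB = 1024 * 1024  # PHP's "M" shorthand means 1,048,576 bytes


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return n_bytes / MIB


limit_in_log = 67108864   # from the "Allowed memory size" error line
timeline_size = 62 * MIB  # assumption: the "62M" timeline file is 62 MiB

print(to_mib(limit_in_log))                  # 64.0
print(to_mib(limit_in_log - timeline_size))  # 2.0
print(256 * MIB)                             # 268435456
```

So the limit actually hit in that log line was 64 MiB, and the raw timeline file alone would come within about 2 MiB of it before any decoded structures are counted.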

It might be possible to improve this in the future by parsing the trace events one at a time instead of loading the whole file, but not currently (the file needs to be parsed because the timeline is a subset of all trace events and uses a slightly different file format).

Recent changes to the mobile agents write the events out one per line to reduce the memory overhead when parsing. Once that change is also made to the desktop agents, we can update the timeline-downloading code to parse the file one line at a time (falling back to full-file parsing for older tests).
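
To make the line-at-a-time idea concrete, here is a rough sketch in Python. It is not the actual WebPageTest PHP code, and several things in it are assumptions for illustration: the function names, the exact one-event-per-line layout (optional `[` / `]` lines and trailing commas), and selecting timeline events by looking for "devtools.timeline" in each event's "cat" field.

```python
import json
from typing import Iterator, List, TextIO


def is_timeline_event(event: dict) -> bool:
    # Assumption: timeline events have "devtools.timeline" in their category.
    return "devtools.timeline" in str(event.get("cat", ""))


def iter_events(stream: TextIO) -> Iterator[dict]:
    """Yield events from a file that holds one JSON event per line."""
    for line in stream:
        text = line.strip().rstrip(",")
        if text in ("", "[", "]"):
            continue  # skip blank lines and the array brackets
        event = json.loads(text)  # raises ValueError on multi-line JSON
        if not isinstance(event, dict) or "traceEvents" in event:
            # The whole document sits on one line: not the per-line format.
            raise ValueError("not a one-event-per-line trace")
        yield event


def timeline_events(path: str) -> List[dict]:
    """Return only the timeline events, streaming the file when possible."""
    try:
        with open(path, encoding="utf-8") as stream:
            return [e for e in iter_events(stream) if is_timeline_event(e)]
    except ValueError:
        # Fallback for older tests: load the whole file at once.
        with open(path, encoding="utf-8") as stream:
            data = json.load(stream)
        events = data["traceEvents"] if isinstance(data, dict) else data
        return [e for e in events
                if isinstance(e, dict) and is_timeline_event(e)]
```

With the streaming path, only one decoded event plus the events that are kept are in memory at a time, so peak usage follows the size of the timeline subset rather than the full trace.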

Thanks, makes sense.

(FWIW, I’ve encountered pages for which this sometimes breaks even after increasing the allowed memory to 256M, but I agree that throwing more memory at it isn’t a pretty solution to begin with)