Issues with wptdriver and complex pages

I’ve got a setup with two wptdriver clients (Windows 8 64-bit Norwegian and Windows 7 64-bit Norwegian) running on two HP EliteBook 2540p laptops. The server is running Ubuntu 12.04 and most things work as expected.
All simple pages work flawlessly, but when we run the front page of http://www.vg.no/, or the company site (…), it fails most of the time, especially with Chrome. The test starts but stalls after a random amount of time, usually while the browser says “resolving host”. It may or may not have rendered some part of the page. Sometimes it runs the first view flawlessly and sometimes it even finishes, but that’s rare. When it happens, the client machine freezes up for several minutes… This only happens when run from wptdriver. Surfing the same page manually with Chrome works flawlessly. Any suggestions? Does wptdriver log somewhere?


Audun

If you grab a debug build then it will dump a bunch of information to the debug interface (visible with DebugView).

Debug builds: http://www.webpagetest.org/releases/debug/
DebugView: see the DebugView page at Windows Sysinternals (Microsoft Docs)

You may have to run DebugView as an administrator, and you’ll need to enable “Capture Global Win32”.

Hm. I’ve tried with the debug version now.

Running Chrome from the command line the same way wptdriver does seems to work with no issues:

Launching: "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --load-extension="C:\webpagetest\extension" --user-data-dir="C:\Users\wpt\AppData\Roaming\webpagetest_profiles\Chrome" --no-proxy-server --enable-experimental-extension-apis --ignore-certificate-errors --disable-background-networking --no-default-browser-check --no-first-run --process-per-tab --new-window --disable-translate --disable-desktop-notifications --allow-running-insecure-content --enable-npn http://127.0.0.1:8888/blank.html

Except that http://127.0.0.1:8888/blank gets connection refused.

But once I try it through WebPagetest, the machine grinds to a halt: 100% CPU usage, and I usually have to hard-reset the laptop.

It only happens on www.vg.no and other very heavy sites…

I’ve attached two logs, one from www.schibsted.no that works with no issues and the one from www.vg.no that needed a hard-reset in the end.

Working log (www.schibsted.no)

Failure and eventually hard reset (www.vg.no)

I can’t see anything in those logs that would explain the problem.

Hm. I’m still trying to debug this. I have 7 identical HP 2540p laptops, and I’ve tried several different versions of Windows on them. To keep things as simple as possible, I’ve now standardized on Windows 7 Pro 32-bit. On a machine that still has IE 9, urlblast works like a charm. But anything that uses wptdriver (including IE 9) will grind to a halt on http://www.vg.no in 8 out of 10 runs.

I also have trouble getting useful logging from the debug builds and DebugView. As you can see from my prior post, I managed it earlier (on a Windows 8 box), but I can’t manage to configure it correctly on Windows 7 Pro. No matter what settings I try, I only get the default output that also goes to the wptdriver itself:

“Checking for work” “Waiting for work” etc.

As for the crash in my prior log, it seems that it’s doing this:

00000312 16:42:54 [2760] [wpthook] TestState::ActivityDetected()
00000313 16:42:54 [2760] [wpthook] - Request::DataIn(len=1001)
00000314 16:42:54 [2760] [wpthook] - Requests::DataIn(socket_id=18, len=1001)
00000315 16:42:54 [2760] [wpthook] TestState::ActivityDetected()
00000316 16:42:54 [2760] [wpthook] - Request::DataIn(len=2048)
00000317 16:42:54 [2760] [wpthook] - Requests::DataIn(socket_id=14, len=2048)
00000318 16:42:54 [2760] [wpthook] - TrackSockets::Connected(2812) - Client port: 49293
00000319 16:42:54 [2760] [wpthook] TestState::ActivityDetected()
00000320 16:42:54 [2760] [wpthook] - Request::DataIn(len=4096)
00000321 16:42:54 [2760] [wpthook] - Requests::DataIn(socket_id=14, len=4096)
00000322 16:42:54 [2760] [wpthook] HTTP Request: /event/request_data
00000323 16:42:54 [2760] [wpthook] HTTP Query String: (null)
00000324 16:42:54 [2760] [wpthook] TestState::ActivityDetected()
00000325 16:42:54 [2760] [wpthook] TestState::ActivityDetected()
00000326 16:42:54 [2760] [wpthook] - Request::DataIn(len=7450)
00000327 16:42:54 [2760] [wpthook] - Requests::DataIn(socket_id=14, len=7450)
00000328 16:42:54 [2760] [wpthook] TestState::ActivityDetected()
00000329 16:42:54 [2760] [wpthook] - new request on socket 100
00000330 16:42:54 [2760] [wpthook] TestState::ActivityDetected()
00000331 16:42:54 [2760] [wpthook] - Request::DataIn(len=1024)
00000332 16:42:54 [2760] [wpthook] - Requests::DataIn(socket_id=19, len=1024)
00000333 16:42:54 [2760] [wpthook] TestState::ActivityDetected()

<… millions of lines here …>

00002362 16:43:16 [2760] [wpthook] TestState::ActivityDetected()
00002363 16:43:16 [2760] [wpthook] - TrackSockets::Connect(3252)
00002364 16:43:16 [2760] [wpthook] - TrackSockets::Connect: Warning: IPv6 unsupported!
00002365 16:43:21 [2760] [wpthook] - TrackSockets::Connect(3664)
00002366 16:43:31 [2760] [wpthook] - TestState::IsDone() Test: 22704ms, load: 0ms, inactive: 16ms, test timeout:120000, navigating:1, navigated: 0

<5 minutes of nothing, the machine is unresponsive>

00002367 16:48:50 [2760] [wpthook] - TrackSockets::Connect: Warning: IPv6 unsupported!
00002368 16:48:50 [2760] [wpthook] HTTP Query String: (null)
00002369 16:48:50 [2760] [wpthook] HTTP Request: /task
00002370 16:48:50 [2760] [wpthook] HTTP Query String: (null)
00002371 16:48:50 [2760] [wpthook] - WptTest::GetNextTask

Any ideas?
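
For anyone reading a long DebugView capture like the one above, a small script can point out where the output goes quiet. This is only a rough sketch: it assumes every log line has the "sequence time [pid] message" layout shown in the excerpt, the function name find_gaps and the 60-second threshold are made up for illustration, and it does not handle a capture that runs past midnight.

```python
import re
from datetime import datetime

# Matches DebugView lines shaped like the excerpt above, e.g.
# "00002365 16:43:21 [2760] [wpthook] - TrackSockets::Connect(3664)"
LINE_RE = re.compile(r"^(\d+)\s+(\d{2}:\d{2}:\d{2})\s+\[(\d+)\]\s+(.*)$")


def find_gaps(lines, threshold_seconds=60):
    """Return (gap_seconds, line_before, line_after) for each long pause."""
    gaps = []
    prev_time = None
    prev_line = None
    for raw in lines:
        line = raw.strip()
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank lines and hand-written notes
        current = datetime.strptime(match.group(2), "%H:%M:%S")
        if prev_time is not None:
            delta = (current - prev_time).total_seconds()
            if delta > threshold_seconds:
                gaps.append((int(delta), prev_line, line))
        prev_time = current
        prev_line = line
    return gaps


# Lines copied from the log excerpt in this post.
sample = [
    "00002365 16:43:21 [2760] [wpthook] - TrackSockets::Connect(3664)",
    "00002366 16:43:31 [2760] [wpthook] - TestState::IsDone() Test: 22704ms, "
    "load: 0ms, inactive: 16ms, test timeout:120000, navigating:1, navigated: 0",
    "< 5 minutes of nothing , the machine is unresponsive>",
    "00002367 16:48:50 [2760] [wpthook] - TrackSockets::Connect: Warning: IPv6 unsupported!",
]

for seconds, before, after in find_gaps(sample):
    print(f"{seconds}s of silence after: {before}")
```

On the sample lines it reports the single pause between 16:43:31 and 16:48:50, which is the window where the machine was unresponsive.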

Browsing through the forums, I think I found the issue.

It seems like wptdriver isn’t very happy running in a multi-core environment. All my test machines are physical laptops with quad-core CPUs. I used Task Manager to set the affinity of wptdriver.exe so it runs on only one core, and hey presto: all wptdriver issues vanish into thin air.

A race condition that gets worse with the number of cores on complex sites?

Creating a “wptdriver.bat” with this content:

start /affinity 1 wptdriver.exe

Linking that .bat file from Startup instead makes it permanent.
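
A note on the number in that command: the value after /affinity is a hexadecimal bitmask in which bit 0 stands for the first core, bit 1 for the second, and so on, so "1" means "first core only". The sketch below just shows the arithmetic; the helper names affinity_mask and cores_in_mask are invented for illustration and are not part of any WebPagetest tooling.

```python
def affinity_mask(cores):
    """Build the hex mask string for `start /affinity` from zero-based core indices."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices must be zero or greater")
        mask |= 1 << core  # set the bit that represents this core
    return format(mask, "X")


def cores_in_mask(hex_mask):
    """Return the zero-based core indices selected by a hex mask string."""
    mask = int(hex_mask, 16)
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


print(affinity_mask([0]))           # the single-core mask used in the batch file
print(affinity_mask([0, 1, 2, 3]))  # all four cores of a quad-core machine
print(cores_in_mask("1"))
```

So to pin the process to the first two cores instead of one, the mask would be 3.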

:smiley:

Hmm. Possible, though it doesn’t make much sense. wptdriver just launches the browser process, injects wpthook into the browser, and then waits for everything to finish. It is super-simplistic, and while it does thread a bit, any interaction with a complex page would be in wpthook, which runs inside of the browser process(es) and wouldn’t be affected by the affinity.

I’ll have a bit more spare time early next week and will be able to take a look then and see if it is something I can reproduce or track down. I run on everything from single-core VMs to quad-core laptops (with hyperthreading), so it should be easy enough to track down if I can reproduce it.

Well, I don’t know. But setting affinity to 1 definitely solves the problems. Not a single issue with it on: I did a few hundred tests and had no problems. Once I removed the affinity and ran wptdriver on all cores, complex pages like www.vg.no would hang 9 out of 10 times.

Urlblast is unaffected and works as expected.

If affinity is set on the wptdriver process and that process fires up Chrome, will Chrome and all other child processes also run on only one core?

OK, I think I may have finally nailed it (sorry it took so long). Can you try the latest agent binary (without the forced affinity)? - https://sites.google.com/a/webpagetest.org/docs/private-instances#TOC-Updating-Test-Agents

I was able to reproduce it locally and tracked down a couple of structures that were not protected by critical sections. Fixed it and it has been running rock solid on the machines where I was able to reproduce it.
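
To illustrate the general idea for anyone following along (this is not the actual agent code, which is not shown in this thread): a structure that several threads update needs a lock around every read and write. The sketch below uses Python's threading.Lock as a stand-in for a Windows critical section, and the RequestTracker class and its methods are hypothetical names chosen only for the example.

```python
import threading


class RequestTracker:
    """Hypothetical shared structure: bytes received per socket."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of a critical section
        self._bytes_by_socket = {}

    def data_in(self, socket_id, length):
        # The read-modify-write below must not interleave between threads.
        with self._lock:
            current = self._bytes_by_socket.get(socket_id, 0)
            self._bytes_by_socket[socket_id] = current + length

    def total_bytes(self):
        with self._lock:
            return sum(self._bytes_by_socket.values())


def run_workers(tracker, workers=4, calls_per_worker=10000):
    """Hammer the tracker from several threads that share two socket ids."""

    def worker(index):
        for _ in range(calls_per_worker):
            tracker.data_in(index % 2, 1)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


tracker = RequestTracker()
run_workers(tracker)
print(tracker.total_bytes())
```

With the lock in place the total always equals workers times calls_per_worker. Without it, two threads can read the same old value and one update gets lost, and that kind of bug tends to show up more often as more cores run the threads truly in parallel.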

YES!

It now works flawlessly without affinity. Good catch!