Hi all -
I recently updated a private WPT setup (one server, ~20 agents) to the latest docker images, and I’m seeing very unusual behavior. When I submit tests with multiple runs (both “first view only” and “first & repeat view”), the tests take extremely long or time out, and they only seem to report the last run. The test log below is an example:
2021/04/12 10:19:04 - Test Created
2021/04/12 10:19:45 - Extracting 723215 byte uploaded file '/tmp/phpQMAgYZ' to './results/21/04/12/FS/8c1c912e87d81e592a17db08c1f3dae0'
2021/04/12 10:19:45 - Test Run Complete. Run: 3, Cached: 0, Done: 1, Tester: wptagent001-10.x.x.x
2021/04/12 10:19:45 - 1 of 3 tests complete
2021/04/12 10:19:45 - Done Processing. Run: 3, Cached: 0, Done: 1, Tester: wptagent001-10.x.x.x
Notice that the agent has completed Run 3, but there is no output for Runs 1 and 2. At this stage, the test is in “being tested” status and stays there until it times out …
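For anyone who wants to check their own test logs for the same pattern, here is a minimal Python sketch that lists which runs never logged a "Test Run Complete" line. The function name `missing_runs` and its parameters are made up for illustration and are not part of WPT. It assumes the log line format shown above, and that `Cached: 0` marks a first view and `Cached: 1` a repeat view.

```python
import re

# Matches lines such as:
# "... - Test Run Complete. Run: 3, Cached: 0, Done: 1, Tester: wptagent001-10.x.x.x"
RUN_COMPLETE = re.compile(r"Test Run Complete\. Run: (\d+), Cached: (\d+)")


def missing_runs(log_text, expected_runs, repeat_view=False):
    """Return the sorted (run, cached) pairs that never logged 'Test Run Complete'.

    Hypothetical helper: assumes Cached: 0 = first view, Cached: 1 = repeat view.
    """
    seen = {(int(run), int(cached)) for run, cached in RUN_COMPLETE.findall(log_text)}
    cached_values = (0, 1) if repeat_view else (0,)
    expected = {
        (run, cached)
        for run in range(1, expected_runs + 1)
        for cached in cached_values
    }
    return sorted(expected - seen)


# Lines copied from the test log above (the upload line is omitted for brevity).
sample_log = """\
2021/04/12 10:19:04 - Test Created
2021/04/12 10:19:45 - Test Run Complete. Run: 3, Cached: 0, Done: 1, Tester: wptagent001-10.x.x.x
2021/04/12 10:19:45 - 1 of 3 tests complete
2021/04/12 10:19:45 - Done Processing. Run: 3, Cached: 0, Done: 1, Tester: wptagent001-10.x.x.x
"""

print(missing_runs(sample_log, expected_runs=3))  # [(1, 0), (2, 0)]
```

For the log above, this reports Runs 1 and 2 (first view) as never completed, which matches what I see in the UI.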
The end result is this:
I have tried different things like:
- running the agents “bare” on CentOS 7
- using older docker images (tried 20.01 and 20.05)
- toggling various parameters like --shaper none, --xvfb, --dockerized
… but all to no avail.
Here are some of the things I found in the agent logs:
From a Chrome test:
chrome: no process found
[87:87:0412/100005.574484:ERROR:browser_dm_token_storage_linux.cc(94)] Error: /etc/machine-id contains 0 characters (32 were expected).
[87:109:0412/100016.923430:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
DevTools listening on ws://127.0.0.1:9222/devtools/browser/d951c911-7f99-46f5-8799-135f140810fd
[87:121:0412/100017.080992:ERROR:bus.cc(393)] Failed to connect to the bus: Could not parse server address: Unknown address type (examples of valid types are "tcp" and on UNIX "unix")
[118:118:0412/100017.211604:ERROR:vaapi_wrapper.cc(573)] Could not get a valid VA display
[87:98:0412/100050.119091:ERROR:zygote_communication_linux.cc(276)] Failed to send GetTerminationStatus message to zygote
[0412/100050.115783:ERROR:nacl_helper_linux.cc(307)] NaCl helper process running without a sandbox!
Most likely you need to configure your SUID sandbox correctly
[87:87:0412/100050.129556:ERROR:zygote_communication_linux.cc(276)] Failed to send GetTerminationStatus message to zygote
chrome: no process found
10:00:50.605 - Uploading result
/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'license.webpagetest.org'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
From a Firefox test:
ffmpeg: no process found
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
Terminated
firefox: no process found
firefox-trunk: no process found
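Since the Chrome output is noisy, a small Python sketch like the one below can tally which source files emit the ERROR lines, which makes it easier to compare a failing agent against a working one. The helper name `count_error_sources` is hypothetical, and the regular expression only assumes the bracketed prefix format visible in the lines above (with or without the leading pid:tid pair). It says nothing about which of these errors, if any, is related to the missing runs.

```python
import re
from collections import Counter

# Matches Chrome log prefixes such as
# "[87:109:0412/100016.923430:ERROR:bus.cc(393)] ..." and
# "[0412/100050.115783:ERROR:nacl_helper_linux.cc(307)] ..."
CHROME_ERROR = re.compile(
    r"^\[(?:\d+:\d+:)?\d{4}/\d{6}\.\d+:ERROR:(?P<source>[\w.]+)\(\d+\)\]\s*(?P<message>.*)$"
)


def count_error_sources(log_lines):
    """Count Chrome ERROR lines per source file (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        match = CHROME_ERROR.match(line.strip())
        if match:
            counts[match.group("source")] += 1
    return counts


# Lines copied from the Chrome agent log above.
sample_lines = [
    "chrome: no process found",
    "[87:109:0412/100016.923430:ERROR:bus.cc(393)] Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory",
    "[118:118:0412/100017.211604:ERROR:vaapi_wrapper.cc(573)] Could not get a valid VA display",
    "[87:98:0412/100050.119091:ERROR:zygote_communication_linux.cc(276)] Failed to send GetTerminationStatus message to zygote",
    "[0412/100050.115783:ERROR:nacl_helper_linux.cc(307)] NaCl helper process running without a sandbox!",
    "[87:87:0412/100050.129556:ERROR:zygote_communication_linux.cc(276)] Failed to send GetTerminationStatus message to zygote",
]

for source, count in count_error_sources(sample_lines).most_common():
    print(f"{count:3d}  {source}")
```

Lines that do not carry the bracketed ERROR prefix, such as "chrome: no process found", are ignored by this sketch.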
Current system setup:
Docker version 20.10.5, build 55c4c88
OS: CentOS Linux release 7.9.2009 (Core)
Server install page shows green in all the important places
Server has 16GB of ram and plenty of free disk space, no swapping going on
Agents have 4GB and ~50GB of free disk, 15% CPU util, no swapping
Agents are started with the --dockerized agent option and the --shm-size=1g and --cap-add=NET_ADMIN docker flags
I don’t know where else I can look. Does anybody have any ideas?
Thank you so much!!