If you get a chance, can you try the (just released) 352 agent to see if it helps?
It should improve the situation if not completely fix it. It’s a bit of a game of whack-a-mole but I think I got to the main offender.
-
Microsoft has a background process that compiles the .net clr assemblies that runs at a low priority but can run for a very long time (30+ minutes). It looks to be the main issue that was causing the variability. It wasn’t discovered until the other work was done though and results were still variable with gaps so it could still be a combination of them.
-
I terminate and completely nuke the chrome updater now so there is no way it will run.
-
I also tweaked the startup idle check to make sure it waits for 2 seconds of idle time. Previously it could have started if there was as little as 100ms of idle. This will slow down thruput a little bit but hopefully not big enough that it matters and with improved consistency.
That said, I still managed to get a few waterfalls with gaps and long requests and watching the task manager on the machine when it was running there wasn’t anything but Chrome so there’s still some lingering issue with either Chrome or EC2.